guc: Expose raw access to GuC log over debugfs

John Harrison Tue, 14 May 2024 11:13:32 -0700

On 5/14/2024 07:58, Michal Wajdeczko wrote:

On 13.05.2024 18:53, John Harrison wrote:

On 5/12/2024 08:36, Michal Wajdeczko wrote:

We already provide the content of the GuC log in debugsfs, but it
is in a text format where each log dword is printed as hexadecimal
number, which does not scale well with large GuC log buffers.


To allow more efficient access to the GuC log, which could benefit
our CI systems, expose raw binary log data.  In addition to less
overhead in preparing text based GuC log file, the new GuC log file
in binary format is also almost 3x smaller.

Any existing script that expects the GuC log buffer in text format
can use command like below to convert from new binary format:

     hexdump -e '4/4 "0x%08x " "\n"'

but this shouldn't be the case as most decoders expect GuC log data
in binary format.

I strongly disagree with this.

Efficiency and file size is not an issue when accessing the GuC log via
debugfs on actual hardware.

to some extend it is as CI team used to refuse to collect GuC logs after
each executed test just because of it's size

I've never heard that argument. I've heard many different arguments butnot one about file size. The default GuC log size is pretty tiny. Sosize really is not an issue.

It is an issue when dumping via dmesg but
you definitely should not be dumping binary data to dmesg. Whereas,

not following here - this is debugfs specific, not a dmesg printer

Except that it is preferable to have common code for both if at allpossible.

dumping in binary data is much more dangerous and liable to corruption
because some tool along the way tries to convert to ASCII, or truncates
at the first zero, etc. We request GuC logs be sent by end users,
customer bug reports, etc. all doing things that we have no control over.

hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ?

Because someone then tries to email it, or attach it or copy it viaWindows or any number of other ways in which a file can get munged.

Converting the hexdump back to binary is trivial for those tools which
require it. If you follow the acquisition and decoding instructions on
the wiki page then it is all done for you automatically.

I'm afraid I don't know where this wiki page is, but I do know that hex
conversion dance is not needed for me to get decoded GuC log the way I
used to do

Look for the 'GuC Debug Logs' page on the developer wiki. It's prettyeasy to find.

These patches are trying to solve a problem which does not exist and are
going to make working with GuC logs harder and more error prone.

it at least solves the problem of currently super inefficient way of
generating the GuC log in text format.

it also opens other opportunities to develop tools that could monitor or
capture GuC log independently on  top of what driver is able to offer
today (on i915 there was guc-log-relay, but it was broken for long time,
not sure what are the plans for Xe)

also still not sure how it can be more error prone.

As already explained, the plan is move to LFD - an extensible,streamable, logging format. Any non-trivial effort that is not helpingto move to LFD is not worth the effort.

On the other hand, there are many other issues with GuC logs that it
would be useful to solves - including extra meta data, reliable output
via dmesg, continuous streaming, pre-sizing the debugfs file to not have
to generate it ~12 times for a single read, etc.

this series actually solves last issue but in a bit different way (we
even don't need to generate full GuC log dump at all if we would like to
capture only part of the log if we know where to look)

No, it doesn't solve it. Your comment below suggests it will be read in4KB chunks. Which means your 16MB buffer now requires 4096 separatereads! And you only doing partial reads of the section you think youneed is never going to be reliable on live system. Not sure why youwould want to anyway. It is just making things much more complex. Younow need an intelligent user land program to read the log out and decodeat least the header section to know what data section to read. You can'tjust dump the whole thing with 'cat' or 'dd'.


for reliable output via dmesg - see my proposal at [1]

[1] https://patchwork.freedesktop.org/series/133613/

Hmm. Actually, is this interface allowing the filesystem layers to issue
multiple read calls to read the buffer out in small chunks? That is also
going to break things. If the GuC is still writing to the log as the
user is reading from it, there is the opportunity for each chunk to not
follow on from the previous chunk because the data has just been
overwritten. This is already a problem at the moment that causes issues
when decoding the logs, even with an almost atomic copy of the log into
a temporary buffer before reading it out. Doing the read in separate
chunks is only going to make that problem even worse.

current solution, that converts data into hex numbers, reads log buffer
in chunks of 128 dwords, how proposed here solution that reads in 4K
chunks could be "even worse" ?

See above, 4KB chunks means 4096 separate reads for a 16M buffer. Andeach one of those reads is a full round trip to user land and back. Ifyou want to get at all close to an atomic read of the log then it needsto be done as a single call that copies the log into a locally allocatedkernel buffer and then allows user land to read out from that bufferrather than from the live log. Which can be trivially done with thecurrent method (at the expense of a large memory allocation) but wouldbe much more difficult with random access reader like this as you wouldneed to say the copied buffer around until the reads have all been done.Which would presumably mean adding open/close handlers to allocate andfree that memory.


and in case of some smart tool, that would understands the layout of the
GuC log buffer, we can even fully eliminate problem of reading stale
data, so why not to choose a more scalable solution ?

You cannot eliminate the problem of stale data. You read the header, youread the data it was pointing to, you re-read the header and find thatthe GuC has moved on. That is an infinite loop of continuously updatingpointers.


John.

John.

Signed-off-by: Michal Wajdeczko <michal.wajdec...@intel.com>
Cc: Lucas De Marchi <lucas.demar...@intel.com>
Cc: John Harrison <john.c.harri...@intel.com>
---
Cc: linux-fsde...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
---
   drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++
   1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c
b/drivers/gpu/drm/xe/xe_guc_debugfs.c
index d3822cbea273..53fea952344d 100644
--- a/drivers/gpu/drm/xe/xe_guc_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c
@@ -8,6 +8,7 @@
   #include <drm/drm_debugfs.h>
   #include <drm/drm_managed.h>
   +#include "xe_bo.h"
   #include "xe_device.h"
   #include "xe_gt.h"
   #include "xe_guc.h"
@@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = {
       {"guc_log", guc_log, 0},
   };
   +static ssize_t guc_log_read(struct file *file, char __user *buf,
size_t count, loff_t *pos)
+{
+    struct dentry *dent = file_dentry(file);
+    struct dentry *uc_dent = dent->d_parent;
+    struct dentry *gt_dent = uc_dent->d_parent;
+    struct xe_gt *gt = gt_dent->d_inode->i_private;
+    struct xe_guc_log *log = &gt->uc.guc.log;
+    struct xe_device *xe = gt_to_xe(gt);
+    ssize_t ret;
+
+    xe_pm_runtime_get(xe);
+    ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap,
log->bo->size);
+    xe_pm_runtime_put(xe);
+
+    return ret;
+}
+
+static const struct file_operations guc_log_ops = {
+    .owner        = THIS_MODULE,
+    .read        = guc_log_read,
+    .llseek        = default_llseek,
+};
+
   void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent)
   {
       struct drm_minor *minor = guc_to_xe(guc)->drm.primary;
@@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc,
struct dentry *parent)
       drm_debugfs_create_files(local,
                    ARRAY_SIZE(debugfs_list),
                    parent, minor);
+
+    debugfs_create_file("guc_log_raw", 0600, parent, NULL,
&guc_log_ops);
   }

Re: [PATCH 4/4] drm/xe/guc: Expose raw access to GuC log over debugfs

Reply via email to