On Tue, Aug 05, 2025 at 10:57:24AM -0400, Rodrigo Vivi wrote:
> On Mon, Jul 28, 2025 at 03:57:51PM +0530, Riana Tauro wrote:
> > Address the need for a recovery method (firmware flash on Firmware errors)
> > introduced in the later patches of Xe KMD.
> > Whenever XE KMD detects a firmware error, a firmware flash is required to
> > recover the device to normal operation.
> >
> > The initial proposal to use 'firmware-flash' as a recovery method was
> > not applicable to other drivers and could cause multiple recovery
> > methods specific to vendors to be added.
> > To address this a more generic 'vendor-specific' method is introduced,
> > guiding users to refer to vendor specific documentation and system logs
> > for detailed vendor specific recovery procedure.
> >
> > Add a recovery method 'WEDGED=vendor-specific' for such errors.
> > Vendors must provide additional recovery documentation if this method
> > is used.
> >
> > It is the responsibility of the consumer to refer to the correct vendor
> > specific documentation and usecase before attempting a recovery.
> >
> > For example: If driver is XE KMD, the consumer must refer
> > to the documentation of 'Device Wedging' under 'Documentation/gpu/xe/'.
> >
> > Recovery script contributed by Raag.
> >
> > v2: fix documentation (Raag)
> > v3: add more details to commit message (Sima, Rodrigo, Raag)
> >     add an example script to the documentation (Raag)
> > v4: use consistent naming (Raag)
> > v5: fix commit message
> >
> > Cc: André Almeida <andrealm...@igalia.com>
> > Cc: Christian König <christian.koe...@amd.com>
> > Cc: David Airlie <airl...@gmail.com>
> > Cc: Simona Vetter <simona.vet...@ffwll.ch>
> > Cc: Maxime Ripard <mrip...@kernel.org>
> > Co-developed-by: Raag Jadav <raag.ja...@intel.com>
> > Signed-off-by: Raag Jadav <raag.ja...@intel.com>
> > Signed-off-by: Riana Tauro <riana.ta...@intel.com>
> > Reviewed-by: Rodrigo Vivi <rodrigo.v...@intel.com>
> > ---
> >  Documentation/gpu/drm-uapi.rst | 42 ++++++++++++++++++++++++++++------
> >  drivers/gpu/drm/drm_drv.c | 2 ++
> >  include/drm/drm_device.h | 4 ++++
> >  3 files changed, 41 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> > index 843facf01b2d..5691b29acde3 100644
> > --- a/Documentation/gpu/drm-uapi.rst
> > +++ b/Documentation/gpu/drm-uapi.rst
> > @@ -418,13 +418,15 @@ needed.
> >  Recovery
> >  --------
> >
> > -Current implementation defines three recovery methods, out of which, drivers
> > +Current implementation defines four recovery methods, out of which, drivers
> >  can use any one, multiple or none. Method(s) of choice will be sent in the
> >  uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
> > -more side-effects. If driver is unsure about recovery or method is unknown
> > -(like soft/hard system reboot, firmware flashing, physical device replacement
> > -or any other procedure which can't be attempted on the fly), ``WEDGED=unknown``
> > -will be sent instead.
> > +more side-effects. If recovery method is specific to vendor
> > +``WEDGED=vendor-specific`` will be sent and userspace should refer to vendor
> > +specific documentation for the recovery procedure. As an example if the driver
> > +is 'Xe' then the documentation for 'Device Wedging' of Xe driver needs to be
> > +referred for the recovery procedure. If driver is unsure about recovery or
> > +method is unknown, ``WEDGED=unknown`` will be sent instead.
>
> What if instead of this we do something like:
>
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -441,6 +441,29 @@ following expectations.
>      unknown         consumer policy
>      =============== ========================================
>
> +Vendor-Specific Recovery
> +++++++++++++++++++++++++
> +
> +When ``WEDGED=vendor-specific`` is emitted, it indicates that the device requires a
> +recovery method that is *not standardized* and is specific to the hardware vendor.
> +
> +In this case, the vendor driver must provide detailed documentation describing
> +every single recovery possibilities and its processes. It needs to include:
> +
> +- Hints: Which of the following will be used to identify the
> +  specific device, and guide the administrator:
> +  + Sysfs, debugfs, tracepoints, or kernel logs (e.g., ``dmesg``)
> +- Explicit guidance: for any admin or userspace tools and scripts necessary
> +  to carry out recovery.
> +
> +**Example**:
> +  If the device uses the ``Xe`` driver, then administrators should consult the
> +  *"Device Wedging"* section of the Xe driver's documentation to determine
> +  the proper steps for recovery.
> +
> +Notes
> ++++++
> +
>  The only exception to this is ``WEDGED=none``, which signifies that the device
>
> ----------------------
>
> Maxime, is it any better?

Yes, it is. Thanks!

Maxime
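
For readers following the thread, here is a minimal sketch of how a userspace consumer might interpret the ``WEDGED=<method1>[,..,<methodN>]`` value discussed above. It is not the recovery script from the patch. The helper names `parse_wedged` and `next_step` and the returned hint strings are hypothetical, and treating ``none`` as "nothing to attempt" is an assumption based on the method's name, since the quoted text is cut off at that point.

```python
# Illustrative sketch only: helper names and hint strings are hypothetical,
# not taken from the patch or from any existing tool.

def parse_wedged(value):
    """Return the recovery methods from a WEDGED uevent value, in listed order."""
    # Accept either the bare value or the full "WEDGED=..." environment entry.
    if value.startswith("WEDGED="):
        value = value[len("WEDGED="):]
    return [part.strip() for part in value.split(",") if part.strip()]


def next_step(methods):
    """Map parsed methods to a coarse hint for the consumer."""
    if "vendor-specific" in methods:
        # The thread says userspace should refer to vendor documentation.
        return "consult vendor documentation"
    if not methods or "unknown" in methods:
        # The quoted table maps "unknown" to "consumer policy".
        return "apply consumer policy"
    if methods == ["none"]:
        # Assumption: "none" leaves nothing for the consumer to attempt.
        return "nothing to attempt"
    # Methods are sent in order of less to more side-effects, so start with the first.
    return "try " + methods[0]


if __name__ == "__main__":
    for entry in ("WEDGED=vendor-specific", "WEDGED=unknown"):
        print(entry, "->", next_step(parse_wedged(entry)))
```

A real consumer would receive the value from the uevent environment (for example via a udev rule) rather than from a hard-coded string, and the vendor-specific branch would point the administrator at the driver's own documentation instead of returning a hint.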