Add support to handle firmware reported errors. When CSC firmware errors
are encountered, an error interrupt is received by the GFX device as an
MSI interrupt.

The Device Source Control registers indicate CSC as the source of the
error. The HEC error status register indicates that the error is firmware
reported. Depending on the type of firmware error, the error cause is
written to the HEC Firmware error register.

On encountering such CSC firmware errors, the graphics device is
non-recoverable from driver context. The only way to recover from these
errors is a firmware flash.

Add a firmware flash method to the drm device wedged uevent. Send this
method along with the uevent to notify userspace of the wedged state and
the possible recovery method.

$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[754.709341] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=firmware-flash
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5973
MAJOR=226
MINOR=0

Bspec: 50875, 53073, 53074, 53075, 53076

Riana Tauro (4):
  drm: Add a firmware flash method to device wedged uevent
  drm/xe: Add a helper function to set recovery method
  drm/xe: Add support to handle hardware errors
  drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors

 Documentation/gpu/drm-uapi.rst             |   6 +-
 drivers/gpu/drm/drm_drv.c                  |   2 +
 drivers/gpu/drm/xe/Makefile                |   1 +
 drivers/gpu/drm/xe/regs/xe_gsc_regs.h      |   2 +
 drivers/gpu/drm/xe/regs/xe_hw_error_regs.h |  20 +++
 drivers/gpu/drm/xe/regs/xe_irq_regs.h      |   1 +
 drivers/gpu/drm/xe/xe_device.c             |  30 +++-
 drivers/gpu/drm/xe/xe_device.h             |   1 +
 drivers/gpu/drm/xe/xe_device_types.h       |   5 +
 drivers/gpu/drm/xe/xe_hw_error.c           | 171 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_hw_error.h           |  15 ++
 drivers/gpu/drm/xe/xe_irq.c                |   4 +
 include/drm/drm_device.h                   |   1 +
 13 files changed, 249 insertions(+), 10 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_hw_error_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_hw_error.c
 create mode 100644 drivers/gpu/drm/xe/xe_hw_error.h

--
2.47.1
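
For illustration only, here is a minimal userspace sketch of how a consumer
might react to the wedged uevent shown in the udevadm output. It is not part
of this series. The helper names (parse_uevent, wedged_methods,
needs_firmware_flash) are hypothetical, and it assumes WEDGED carries a
comma-separated list of recovery methods, as other wedged methods do.

```python
from __future__ import annotations

# Sample block copied from the udevadm output in this cover letter.
SAMPLE = """\
KERNEL[754.709341] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=firmware-flash
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5973
MAJOR=226
MINOR=0
"""


def parse_uevent(block: str) -> dict[str, str]:
    """Collect KEY=VALUE properties from one uevent block (hypothetical helper)."""
    props: dict[str, str] = {}
    for line in block.splitlines():
        line = line.strip()
        # Skip blank lines and the "KERNEL[...] change ..." header line.
        if not line or line.startswith("KERNEL[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key] = value
    return props


def wedged_methods(props: dict[str, str]) -> list[str]:
    """Split WEDGED into individual methods (assumed comma-separated)."""
    return [m for m in props.get("WEDGED", "").split(",") if m]


def needs_firmware_flash(props: dict[str, str]) -> bool:
    """True when a drm uevent lists firmware-flash as a recovery method."""
    return (props.get("SUBSYSTEM") == "drm"
            and "firmware-flash" in wedged_methods(props))


if __name__ == "__main__":
    event = parse_uevent(SAMPLE)
    if needs_firmware_flash(event):
        print(f"{event['DEVNAME']} is wedged; recovery requires a firmware flash")
```

A real consumer would more likely subscribe to drm uevents through libudev
than parse text, but the decision it makes on the WEDGED property would be
the same.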