Clear Error Counter: Add a clear-error-counter command to DRM RAS that
clears a specific error counter of a node. Implement the callback in the
XE driver to demonstrate usage.
Usage with both get-error-counter and clear-error-counter:
$ sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":1}'
[{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0},
{'error-id': 2, 'error-name': 'soc-internal', 'error-value': 3}]
$ sudo ynl --family drm_ras --do clear-error-counter --json \
'{"node-id":1, "error-id":2}'
None
$ sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":1}'
[{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0},
{'error-id': 2, 'error-name': 'soc-internal', 'error-value': 0}]
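The semantics shown in the transcript above can be sketched in a few lines of
userspace Python. This is only an illustrative model, not the kernel
implementation: the RasNode class and its method names are hypothetical and
do not appear in the series. It assumes that clearing resets only the selected
counter to zero and that an unknown error-id is rejected.

```python
# Hypothetical userspace model of the get/clear semantics; not kernel code.
class RasNode:
    def __init__(self, node_id, counters):
        self.node_id = node_id
        # Map error-id -> [error-name, error-value]; the list is mutable so
        # a single counter can be reset in place.
        self._counters = {eid: [name, value]
                          for eid, (name, value) in counters.items()}

    def get_error_counters(self):
        # Mirror the shape of the get-error-counter dump reply.
        return [{"error-id": eid, "error-name": name, "error-value": value}
                for eid, (name, value) in sorted(self._counters.items())]

    def clear_error_counter(self, error_id):
        # Assumption: an unknown error-id is an error, not a silent no-op.
        if error_id not in self._counters:
            raise KeyError(f"no such error-id: {error_id}")
        self._counters[error_id][1] = 0


node = RasNode(1, {1: ("core-compute", 0), 2: ("soc-internal", 3)})
node.clear_error_counter(2)
print(node.get_error_counters())
```

Only the counter named by error-id changes; the other counters of the node
are left untouched, matching the second dump in the transcript.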
Error Event Support: Introduce `error-event` support in DRM RAS to notify
userspace whenever an error occurs.
Each notification includes the node-id and error-id to identify
the source and type of the error. To receive notifications,
userspace must subscribe to the 'error-notify' multicast group:
$ sudo ynl --family drm_ras --subscribe error-notify
{'msg': {'error-id': 2, 'node-id': 1}, 'name': 'error-event'}
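A monitoring script could consume these notifications by parsing each line
that ynl prints. The sketch below assumes one Python-literal dict per line,
as in the sample output above. The helper name parse_error_event is
hypothetical and is not part of ynl or of this series.

```python
import ast


def parse_error_event(line):
    """Return (node_id, error_id) for an error-event line, else None."""
    # Assumption: ynl prints each notification as a Python dict literal.
    event = ast.literal_eval(line.strip())
    if not isinstance(event, dict) or event.get("name") != "error-event":
        return None
    msg = event.get("msg", {})
    return msg["node-id"], msg["error-id"]


sample = "{'msg': {'error-id': 2, 'node-id': 1}, 'name': 'error-event'}"
print(parse_error_event(sample))
```

The returned pair could then be used to issue a get-error-counter request
for the affected node.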
Riana Tauro (4):
drm/drm_ras: Add clear-error-counter netlink command to drm_ras
drm/xe/xe_drm_ras: Add support for clear-error-counter in XE DRM RAS
drm/drm_ras: Add DRM RAS netlink error event notification
drm/xe/xe_drm_ras: Add error-event support in XE DRM RAS
Documentation/gpu/drm-ras.rst | 17 +++++
Documentation/netlink/specs/drm_ras.yaml | 27 ++++++-
drivers/gpu/drm/drm_ras.c | 91 +++++++++++++++++++++++-
drivers/gpu/drm/drm_ras_nl.c | 19 +++++
drivers/gpu/drm/drm_ras_nl.h | 6 ++
drivers/gpu/drm/xe/xe_drm_ras.c | 52 +++++++++++++-
drivers/gpu/drm/xe/xe_drm_ras.h | 7 ++
drivers/gpu/drm/xe/xe_hw_error.c | 5 ++
include/drm/drm_ras.h | 13 ++++
include/uapi/drm/drm_ras.h | 4 ++
10 files changed, 237 insertions(+), 4 deletions(-)
--
2.47.1