On Fri, Jan 09, 2026 at 09:13:31AM -0500, Rodrigo Vivi wrote:
> On Fri, Jan 09, 2026 at 01:38:44PM +0530, Riana Tauro wrote:
> > Hi Raag
> >
> > Thank you for the review
> >
> > On 12/9/2025 1:52 PM, Raag Jadav wrote:
> > > On Fri, Dec 05, 2025 at 02:09:34PM +0530, Riana Tauro wrote:
> > > > Allocate correctable, nonfatal and fatal nodes per xe device.
> > > > Each node contains error classes, counters and respective
> > > > query counter functions.
> > > >
> > > > Add basic functionality to create and register drm nodes.
> > > > Below operations can be performed using Generic netlink DRM RAS
> > > > interface
...
> > > > Query Error counter:
> > > >
> > > > $ sudo ynl --family drm_ras --do query-error-counter --json
> > > > '{"node-id":1, "error-id":1}'
> > > > {'error-id': 1, 'error-name': 'Core Compute Error', 'error-value': 0}
> > >
> > > One more (sorry): So this means graphics will be a different id? Or do
> > > they overlap? How does it work?
> > >
> >
> > Did not get this question.
This gives the impression that it's specific to the compute engine, so I
was hoping for something more generic like "execution unit" or simply
"core", but I couldn't come up with anything better than this, so up to you.
> > > Also,
> > >
> > > [*] I'm not much informed about the history here but the 'error' term
> > > seems slapped onto almost everything. We already know it's RAS so perhaps
> > > we add it only where it makes sense and try to simplify some of the naming?
...
> > > > +/**
> > > > + * enum drm_xe_ras_error_class - Supported drm ras error classes.
> > > > + */
> > > > +enum drm_xe_ras_error_class {
> > > > + /** @DRM_XE_RAS_ERROR_CORE_COMPUTE: GT and Media Error */
> > > > + DRM_XE_RAS_ERROR_CORE_COMPUTE = 1,
> > > > + /** @DRM_XE_RAS_ERROR_SOC_INTERNAL: SOC Error */
> > > > + DRM_XE_RAS_ERROR_SOC_INTERNAL,
> > > > + /** @DRM_XE_RAS_ERROR_CLASS_MAX: Max Error */
> > > > + DRM_XE_RAS_ERROR_CLASS_MAX, /* non-ABI */
> > > > +};
> > >
> > > Also, all of the enums share the same DRM_XE_RAS_ERROR_* prefix, so let's
> > > try to have distinguishable naming. Perhaps [*] would be useful here as
> > > well ;)
> >
> > DRM_XE_RAS_ERROR_SEVERITY_* will cause longer names. Any suggestions?
Already mentioned above[*], the key is to not overuse 'error' ;)
DRM_XE_RAS_SEVERITY_*
DRM_XE_RAS_COMPONENT_*
and so on ...
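
To make the above concrete, here is a rough sketch of how the split
prefixes could look. This is only an illustration and not what the patch
has: the severity enumerators and their values are my assumption (taken
from the correctable/nonfatal/fatal nodes in the commit message), the
component values mirror the error class enum quoted above, and the name
table and ras_component_name() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naming sketch, not the actual uAPI from the patch. */
enum drm_xe_ras_severity {
	DRM_XE_RAS_SEVERITY_CORRECTABLE = 0,	/* values assumed */
	DRM_XE_RAS_SEVERITY_NONFATAL,
	DRM_XE_RAS_SEVERITY_FATAL,
	DRM_XE_RAS_SEVERITY_MAX,	/* non-ABI */
};

/* Same values as the quoted error class enum, only the prefix differs. */
enum drm_xe_ras_component {
	DRM_XE_RAS_COMPONENT_CORE_COMPUTE = 1,
	DRM_XE_RAS_COMPONENT_SOC_INTERNAL,
	DRM_XE_RAS_COMPONENT_MAX,	/* non-ABI */
};

/* IDs start at 1, as in the quoted enum; slot 0 stays unused. */
static_assert(DRM_XE_RAS_COMPONENT_CORE_COMPUTE == 1,
	      "component ids are expected to start at 1");

/* Hypothetical name table; the strings are placeholders. */
static const char *const ras_component_names[DRM_XE_RAS_COMPONENT_MAX] = {
	[DRM_XE_RAS_COMPONENT_CORE_COMPUTE] = "Core Compute",
	[DRM_XE_RAS_COMPONENT_SOC_INTERNAL] = "SoC Internal",
};

/* Returns the component name, or NULL for an unknown or unused id. */
static const char *ras_component_name(unsigned int id)
{
	if (id >= DRM_XE_RAS_COMPONENT_MAX)
		return NULL;
	return ras_component_names[id];
}
```

With the prefixes split like this, the enumerator itself tells you whether
it is a severity or a component, and none of the names need 'ERROR'.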
> Try this full version first and see how the outcome looks like...
> if we are still respecting the line limits without ugly cuts, then let's go
> with it.
> otherwise try something shorter ERR_SEV_ ... or something like that...
... which can be further shortened with this idea.
Side note: I'm already using these on my local branch.
Raag