On Fri, Jan 09, 2026 at 09:13:31AM -0500, Rodrigo Vivi wrote:
> On Fri, Jan 09, 2026 at 01:38:44PM +0530, Riana Tauro wrote:
> > Hi Raag
> > 
> > Thank you for the review
> > 
> > On 12/9/2025 1:52 PM, Raag Jadav wrote:
> > > On Fri, Dec 05, 2025 at 02:09:34PM +0530, Riana Tauro wrote:
> > > > Allocate correctable, nonfatal and fatal nodes per xe device.
> > > > Each node contains error classes, counters and respective
> > > > query counter functions.
> > > > 
> > > > Add basic functionality to create and register drm nodes.
> > > > Below operations can be performed using Generic netlink DRM RAS interface

...

> > > > Query Error counter:
> > > > 
> > > > $ sudo ynl --family drm_ras --do query-error-counter --json '{"node-id":1, "error-id":1}'
> > > > {'error-id': 1, 'error-name': 'Core Compute Error', 'error-value': 0}
> > > 
> > > One more (sorry): So this means graphics will be a different id? Or do they
> > > overlap? How does it work?
> > > 
> > 
> > Did not get this question.

This gives the impression that it's specific to the compute engine, so I was
hoping for something more generic like "execution unit" or simply "core",
but I couldn't come up with anything better than this, so it's up to you.

> > > Also,
> > > 
> > > [*] I'm not much informed about the history here but the 'error' term
> > > seems slapped onto almost everything. We already know it's RAS so perhaps
> > > we add it only where it makes sense and try to simplify some of the naming?

...

> > > > +/**
> > > > + * enum drm_xe_ras_error_class - Supported drm ras error classes.
> > > > + */
> > > > +enum drm_xe_ras_error_class {
> > > > +       /** @DRM_XE_RAS_ERROR_CORE_COMPUTE: GT and Media Error */
> > > > +       DRM_XE_RAS_ERROR_CORE_COMPUTE = 1,
> > > > +       /** @DRM_XE_RAS_ERROR_SOC_INTERNAL: SOC Error */
> > > > +       DRM_XE_RAS_ERROR_SOC_INTERNAL,
> > > > +       /** @DRM_XE_RAS_ERROR_CLASS_MAX: Max Error */
> > > > +       DRM_XE_RAS_ERROR_CLASS_MAX,     /* non-ABI */
> > > > +};
> > > 
> > > Also, all of the enums share the same DRM_XE_RAS_ERROR_* prefix, so let's try
> > > to have distinguishable naming. Perhaps [*] would be useful here as well ;)
> > 
> > DRM_XE_RAS_ERROR_SEVERITY_* will cause longer names. Any suggestions?

Already mentioned above[*], the key is to not overuse 'error' ;)

DRM_XE_RAS_SEVERITY_*
DRM_XE_RAS_COMPONENT_*

and so on ...
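
Purely as an illustration of the two prefixes above, a rough sketch of how
the enums might read. The component members are taken from the class names
in the patch and the severity members from the correctable/nonfatal/fatal
nodes in the commit message. The enum tag names, the severity values and
the *_MAX members are assumptions on my side, not the actual uAPI.

```c
/* Hypothetical naming sketch only; tags and values are assumptions. */
enum drm_xe_ras_severity {
	/* One entry per node type named in the commit message */
	DRM_XE_RAS_SEVERITY_CORRECTABLE = 1,
	DRM_XE_RAS_SEVERITY_NONFATAL,
	DRM_XE_RAS_SEVERITY_FATAL,
	DRM_XE_RAS_SEVERITY_MAX,	/* non-ABI */
};

enum drm_xe_ras_component {
	/* Same classes as drm_xe_ras_error_class, minus the 'ERROR' */
	DRM_XE_RAS_COMPONENT_CORE_COMPUTE = 1,
	DRM_XE_RAS_COMPONENT_SOC_INTERNAL,
	DRM_XE_RAS_COMPONENT_MAX,	/* non-ABI */
};
```

With that split, the prefix alone tells which enum a value belongs to, and
the longest name here still stays well under the line limit.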

> Try this full version first and see how the outcome looks like...
> if we are still respecting the line limits without ugly cuts, then let's go with it.
> otherwise try something shorter ERR_SEV_ ... or something like that...

... which can be further shortened with this idea.

Side note: I'm already using these on my local branch.

Raag
