Hi Dan and Michal,

I have posted an RFC patch series to implement the kernel-side interface for
this in libnvdimm, with an implementation in the papr-scm driver module, at [1].
Could you please take a look at the patch series and provide your inputs?

[1] https://lore.kernel.org/linux-nvdimm/[email protected]/

Thanks,
~ Vaibhav

Dan Williams <[email protected]> writes:

> On Fri, Oct 23, 2020 at 10:28 AM Michal Suchánek <[email protected]> wrote:
>>
>> Hello,
>>
>> On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <[email protected]> wrote:
>> >
>> > Thanks for taking the time to look into this, Dan.
>> >
>> > Agree with the points you have made earlier that I am summarizing below:
>> >
>> > * This is better done in ndctl rather than ipmctl.
>> > * Should only expose general performance metrics and not performance
>> >   counters. Performance counters should be exposed via perf.
>> > * Vendor specific metrics to be separated from generic performance
>> >   metrics.
>> >
>> > One way to split generic and vendor specific metrics might be to report
>> > generic performance metrics together with dimm health metrics such as
>> > "temperature_celsius" or "spares_percentage" that are already reported in
>> > the dimm health output.
>> >
>> > Vendor specific performance metrics can be reported as a separate object
>> > in the json output. Something similar to the output below:
>> >
>> > # ndctl list -DH --stats --vendor-stats
>> > [
>> >   {
>> >     "dev":"nmem0",
>> >     "health":{
>> >       "health_state":"ok",
>> >       "shutdown_state":"clean",
>> >       "temperature_celsius":48.00,
>> >       "spares_percentage":10,
>> >
>> >       /* Generic performance metrics/stats */
>> >       "TotalMediaReads": 18929,
>> >       "TotalMediaWrites": 0,
>> >       ....
>> >     },
>> >
>> >     /* Vendor specific stats for the dimm */
>> >     "vendor-stats": {
>> >       "Controller Reset Count": 10,
>> >       "Controller Reset Elapsed Time": 3600,
>> >       "Power-on Seconds": 3600
>>
>> How do you tell generic from vendor-specific stats, though?
>>
>> Controller reset count and power-on time may not be reported by some
>> controllers but sound pretty generic.
>>
>> Even if you declare that the stats reported by all controllers
>> available at this moment are generic, a later one may not report some of
>> these 'generic' statistics, or report them in a different way/units, or
>> may simply not report anything at all for some technical reason.
>>
>> Kernels that do not have this feature will not report anything at all
>> either.
>
> My expectation is that for a given json attribute name any vendor
> backend that supports it must convey it in a compatible way. If a
> given attribute does not make sense for a given vendor, or is not yet
> implemented, then leaving it unpopulated is indeed the expectation.
>
> The goal is to minimize vendor specific logic in infrastructure
> that consumes the ndctl json while at the same time balancing vendor
> needs. In other words, avoid "needless" differentiation as much as
> possible with a small amount of compat work across vendors.
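
To make the "leave it unpopulated" expectation concrete, here is a minimal
sketch of how a consumer of the json might cope with it. It is illustrative
only: the attribute names come from the output proposed earlier in this
thread and are not a settled interface, and SAMPLE, GENERIC_STATS, summarize()
and summarize_all() are hypothetical names invented for this example. It
parses a literal string rather than invoking ndctl.

```python
import json

# Hypothetical sample modeled on the output proposed earlier in this thread.
# Attribute names are assumptions taken from that proposal, not a final ABI.
SAMPLE = """
[
  {
    "dev": "nmem0",
    "health": {
      "health_state": "ok",
      "temperature_celsius": 48.0,
      "TotalMediaReads": 18929,
      "TotalMediaWrites": 0
    },
    "vendor-stats": {
      "Controller Reset Count": 10
    }
  },
  {
    "dev": "nmem1",
    "health": {
      "health_state": "ok"
    }
  }
]
"""

# Generic attributes this consumer understands (assumed names).
GENERIC_STATS = ("TotalMediaReads", "TotalMediaWrites")


def summarize(dimm):
    """Collect generic stats (None when unpopulated) and vendor stats as-is."""
    health = dimm.get("health", {})
    # A missing key means "not reported", which is distinct from a value of 0.
    generic = {name: health.get(name) for name in GENERIC_STATS}
    # Vendor stats are passed through untouched; no vendor logic lives here.
    vendor = dimm.get("vendor-stats", {})
    return {"dev": dimm.get("dev"), "generic": generic, "vendor": vendor}


def summarize_all(text):
    """Parse a json array of dimm objects and summarize each one."""
    return [summarize(dimm) for dimm in json.loads(text)]


if __name__ == "__main__":
    for entry in summarize_all(SAMPLE):
        print(entry)
```

The point of the sketch is that the consumer needs no per-vendor branches: a
stat that a backend (or an older kernel) does not report shows up as None,
while a reported value of 0 stays 0.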