On Fri, Jun 9, 2017 at 10:58 AM, Linda Knippers <[email protected]> wrote:
>
>
> On 6/9/2017 1:46 PM, Dan Williams wrote:
>>
>> On Fri, Jun 9, 2017 at 9:34 AM, Linda Knippers <[email protected]>
>> wrote:
>>>
>>> This patch adds a new interface to provide additional Health Status
>>> Detail.
>>> This information is reported as part of the Smart Health with the HPE1
>>> DSM family so the function for the other families is NULL by default. If
>>> the field is available, the ndctl --health option will decode
>>> the bits that make up the field. If the DSM family doesn't support
>>> this function, no additional information is provided.
>>>
>>> With this change a healthy NVDIMM-N that supports this information
>>> would report something like this:
>>>
>>> {
>>> "dev":"nmem6",
>>> "id":"802c-0f-1612-122eb278",
>>> "health":{
>>> "health_state":"ok",
>>> "temperature_celsius":25.000000,
>>> "spares_percentage":99,
>>> "alarm_temperature":false,
>>> "alarm_spares":false,
>>> "temperature_threshold":50.000000,
>>> "spares_threshold":20,
>>> "life_used_percentage":0,
>>> "shutdown_state":"clean",
>>> "health_status_detail":[
>>> "ok"
>>> ]
>>> }
>>> }
>>>
>>> An ailing NVDIMM-N could report one or more health status
>>> conditions, sometime like this:
>>>
>>> {
>>> "dev":"nmem6",
>>> "id":"802c-0f-1612-122eb278",
>>> "health":{
>>> "health_state":"ok",
>>> "temperature_celsius":25.000000,
>>> "spares_percentage":99,
>>> "alarm_temperature":false,
>>> "alarm_spares":false,
>>> "temperature_threshold":50.000000,
>>> "spares_threshold":20,
>>> "life_used_percentage":0,
>>> "shutdown_state":"clean",
>>> "health_status_detail":[
>>> "energy_source_error",
>>> "arm_error",
>>> ]
>>> }
>>> }
>>
>>
>> This format for health_detail makes the json harder to consume for
>> upper-layer applications. Every field should be associated with a
>> key-id even if it's just a flag.
>
>
> Ok. I thought we had discussed this format with my RFC patch a
> while back. The only change I made was adding underscores instead
> of spaces as requested.
Yeah, I thought I had brought up the array of json-boolean objects vs
json-string objects before.
> The items in the array are all associated with a single validation flag.
> There are some additional groups of fields that I plan to add as well,
> like a group of error counters, so do you want everything flat?
Yes, after writing some scripts to parse json I think flatter is
better unless the data itself is actually hierarchical.
>
>> I think I'd like to keep the json
>> topology flatter. We already have support for the NFIT health state
>> flags in this form:
>>
>> {
>> "provider":"nfit_test.1",
>> "dev":"ndbus3",
>> "dimms":[
>> {
>> "dev":"nmem5",
>> "id":"cdab-0a-07e0-fffffeff",
>> "flag_failed_save":true,
>> "flag_failed_arm":true,
>> "flag_failed_restore":true,
>> "flag_smart_event":true,
>> "flag_failed_flush":true
>> }
>> ]
>> }
>>
>> So, let's just add additional flag names and omit the ones that are
>> duplicates of the NFIT-level health state flags.
>
>
> None of them are duplicates because the NFIT state flags are static from
> boot
> time while the health status detail changes at runtime.
Yes, but it's a pain for applications that now have to deal with two
different flag names for similar indicators. So for example if
"flag_failed_arm" is false at startup, but "arm_error" triggers later
I would think that would just be reflected in "flag_failed_arm". In
other words override the static flag_failed_arm value with the dynamic
result so that monitor applications need only learn one json key for
the same data.
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm