On Fri, Jun 9, 2017 at 10:58 AM, Linda Knippers <[email protected]> wrote:
>
>
> On 6/9/2017 1:46 PM, Dan Williams wrote:
>>
>> On Fri, Jun 9, 2017 at 9:34 AM, Linda Knippers <[email protected]>
>> wrote:
>>>
>>> This patch adds a new interface to provide additional Health Status
>>> Detail.
>>> This information is reported as part of the Smart Health with the HPE1
>>> DSM family so the function for the other families is NULL by default.  If
>>> the field is available, the ndctl --health option will decode
>>> the bits that make up the field.  If the DSM family doesn't support
>>> this function, no additional information is provided.
>>>
>>> With this change a healthy NVDIMM-N that supports this information
>>> would report something like this:
>>>
>>> {
>>>     "dev":"nmem6",
>>>     "id":"802c-0f-1612-122eb278",
>>>     "health":{
>>>       "health_state":"ok",
>>>       "temperature_celsius":25.000000,
>>>       "spares_percentage":99,
>>>       "alarm_temperature":false,
>>>       "alarm_spares":false,
>>>       "temperature_threshold":50.000000,
>>>       "spares_threshold":20,
>>>       "life_used_percentage":0,
>>>       "shutdown_state":"clean",
>>>       "health_status_detail":[
>>>         "ok"
>>>       ]
>>>     }
>>>   }
>>>
>>> An ailing NVDIMM-N could report one or more health status
>>> conditions, sometime like this:
>>>
>>> {
>>>     "dev":"nmem6",
>>>     "id":"802c-0f-1612-122eb278",
>>>     "health":{
>>>       "health_state":"ok",
>>>       "temperature_celsius":25.000000,
>>>       "spares_percentage":99,
>>>       "alarm_temperature":false,
>>>       "alarm_spares":false,
>>>       "temperature_threshold":50.000000,
>>>       "spares_threshold":20,
>>>       "life_used_percentage":0,
>>>       "shutdown_state":"clean",
>>>       "health_status_detail":[
>>>         "energy_source_error",
>>>         "arm_error",
>>>       ]
>>>     }
>>>   }
>>
>>
>> This format for health_detail makes the json harder to consume for
>> upper-layer applications. Every field should be associated with a
>> key-id even if it's just a flag.
>
>
> Ok.  I thought we had discussed this format with my RFC patch a
> while back.  The only change I made was adding underscores instead
> of spaces as requested.

Yeah, I thought I had brought up the array of json-boolean objects vs
json-string objects before.

> The items in the array are all associated with a single validation flag.
> There are some additional groups of fields that I plan to add as well,
> like a group of error counters, so do you want everything flat?

Yes, after writing some scripts to parse json I think flatter is
better unless the data itself is actually hierarchical.

>
>> I think I'd like to keep the json
>> topology flatter. We already have support for the NFIT health state
>> flags in this form:
>>
>>         {
>>           "provider":"nfit_test.1",
>>           "dev":"ndbus3",
>>           "dimms":[
>>             {
>>               "dev":"nmem5",
>>               "id":"cdab-0a-07e0-fffffeff",
>>               "flag_failed_save":true,
>>               "flag_failed_arm":true,
>>               "flag_failed_restore":true,
>>               "flag_smart_event":true,
>>               "flag_failed_flush":true
>>             }
>>           ]
>>         }
>>
>> So, let's just add additional flag names and omit the ones that are
>> duplicates of the NFIT-level health state flags.
>
>
> None of them are duplicates because the NFIT state flags are static from
> boot
> time while the health status detail changes at runtime.

Yes, but it's a pain for applications that now have to deal with two
different flag names for similar indicators. So for example if
"flag_failed_arm" is false at startup, but "arm_error" triggers later
I would think that would just be reflected in "flag_failed_arm". In
other words override the static flag_failed_arm value with the dynamic
result so that monitor applications need only learn one json key for
the same data.
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm

Reply via email to