On 6/9/2017 3:06 PM, Dan Williams wrote:
On Fri, Jun 9, 2017 at 11:56 AM, Linda Knippers <[email protected]> wrote:
On 6/9/2017 2:10 PM, Dan Williams wrote:

This format for health_detail makes the json harder to consume for
upper-layer applications. Every field should be associated with a
key-id even if it's just a flag.



Ok.  I thought we had discussed this format with my RFC patch a
while back.  The only change I made was adding underscores instead
of spaces as requested.


Yeah, I thought I had brought up the array of json-boolean objects vs
json-string objects before.


Hmm...maybe I missed that, but ok.


The items in the array are all associated with a single validation flag.
There are some additional groups of fields that I plan to add as well,
like a group of error counters, so do you want everything flat?


Yes, after writing some scripts to parse json I think flatter is
better unless the data itself is actually hierarchical.


I think I'd like to keep the json
topology flatter. We already have support for the NFIT health state
flags in this form:

        {
          "provider":"nfit_test.1",
          "dev":"ndbus3",
          "dimms":[
            {
              "dev":"nmem5",
              "id":"cdab-0a-07e0-fffffeff",
              "flag_failed_save":true,
              "flag_failed_arm":true,
              "flag_failed_restore":true,
              "flag_smart_event":true,
              "flag_failed_flush":true
            }
          ]
        }

So, let's just add additional flag names and omit the ones that are
duplicates of the NFIT-level health state flags.



None of them are duplicates because the NFIT state flags are static from
boot
time while the health status detail changes at runtime.


Yes, but it's a pain for applications that now have to deal with two
different flag names for similar indicators. So for example if
"flag_failed_arm" is false at startup, but "arm_error" triggers later
I would think that would just be reflected in "flag_failed_arm". In
other words override the static flag_failed_arm value with the dynamic
result so that monitor applications need only learn one json key for
the same data.


But they're different things.  One represents boot time state and one
represents
runtime.  One is an NFIT flag, not even reported as health today, and one is
runtime health.  Even your commit message says they're distinct from SMART
health data.  Knowing that my NVDIMM came up healthy and then had an
 issue,
or vice versa, is important.

I agree, but I don't think it's ndctl's job to maintain that
distinction. The default case is "I want to know the present state of
my dimm", if an environment wants to know "has my state changed over
time" then it should be logging the health state snapshots and
comparing them.

If you want to know the state of the NVDIMM, that's the health information
for systems that provide this information.  Right now for the information
we're talking about, that's systems that support the HPE1 DSM family.

It is different from the NFIT device flag, which represents the state of
the NVDIMM at the point at which the FW created the NFIT table, which would
apply to all NFIT-commpliant platforms.

If you try to combine the two, it's not clear what state you're getting
because it's not obvious from ndctl whether it's static or dynamic data.
I believe that's why you correctly didn't call the state flags "health"
data.

Maybe the state flags should be renamed to reflect what they really are.
The "true" value for those is probably redundant too because I don't think

Yes, the 'true' value is redundant. The goal is to get the flag name
into a json-key rather than a json-value.

you
output anything if they're fault, although now I'm not sure those NFIT
flags are being displayed properly at all.  On my system, /sys../nfit/flags
shows "smart_notify", which I assume means I should see
"flag_smart_event":true,
but I don't.  I'll look at that separately.

The 'smart_notify' flag is whether the DIMM supports health state
notifications, the 'flag_smart_event' indicates if a smart event was
pending. So your system supports notifications and the NFIT says no
SMART event was pending.

Ok, so you display all the nfit flags but one.  The name is a bit unclear.
And yes, I should have reviewed the patch in April.  I'm happy to post a
patch to make the names a little clearer now.

-- ljk
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm

Reply via email to