This set of patches adds support for the HPE SMART DSM functions and enables
ndctl to report DIMM health data for HPE type N NVDIMMs.  The relevant
firmware interfaces are described in [1].

The first patch virtualizes the ndctl_cmd_smart*() family of libndctl
interfaces into a set of ndctl_smart_ops, allowing runtime implementation
differentiation depending on the firmware support provided by a DIMM.

The second and third patches add miscellaneous pieces needed for the
final patch:

The fourth patch adds a set of ndctl_smart_ops for the HPE1 DSM family,
based on the firmware interfaces defined in [1].  These ndctl_smart_ops
translate the HPE1 DSM output to match the interface of the existing
Intel DSM-inspired smart_ops.  This delivers health reporting parity for
HPE type N NVDIMMs, however:

When evaluating this ndctl_smart_ops approach, please consider our goal of
adding JSON exports for some of the additional health data defined in [1].
I expect this would entail adding additional accessor functions to
ndctl_smart_ops, but it's not clear whether or how to extend the existing
get_flags()/check flags/get_data() model used by util_dimm_health_to_json().

If you'd like to test these changes, note the following:

. Some of the DSM functions for HPE type N NVDIMMs, including the ones used
  by this patch, require the acpi_ipmi kernel module to be loaded, and you
  may need to manually modprobe it.

. Without [2], you'll need to include '--idle' in your ndctl invocation as
  ndctl will consider type Ns to be disabled and will otherwise omit them.

. Without [3], "alarm_temperature" and "alarm_spares" will be inaccurate.


Changes in v3:
  - Dropped the smart_dimm_op() macro and open coded
    ndctl_dimm_cmd_new_smart() and ndctl_dimm_cmd_new_smart_threshold().
    (suggested by Dan)
  - Do not fail add_dimm() if the read of nfit/commands fails, just set
    dimm->dsm_family to -1 instead (suggested by Dan).  As part of this,
    I also moved the read to after the has_nfit() check.
  - Added missing bibliography to the commit message for the last patch.
  - I realized that v2 wouldn't compile if HAS_SMART was not 1, so I
    changed intel_smart_ops and hpe1_smart_ops into pointers and set them
    to NULL if HAS_SMART != 1.  They are const because that seems to
    avoid compiler warnings about unused variables in cases where
    libndctl-private.h is included by something other than libndctl.c.

Changes in v2:
  New approach: taught libndctl how to translate between the HPE1 DSM
  family and the existing ndctl_cmd_smart*() libndctl interfaces
  (as suggested by Dan).

Brian Boylston (4):
  libndctl: introduce ndctl_smart_ops
  libndctl: record dsm family in add_dimm()
  libndctl: enable ND_CMD_CALL
  libndctl: add support for the HPE1 family of DSM SMART functions

 ndctl/            |   1 +
 ndctl/lib/libndctl-hpe1.c    | 303 ++++++++++++++++++++++++++++++++++++++
 ndctl/lib/libndctl-private.h |  25 ++++
 ndctl/lib/libndctl-smart.c   | 116 ++++++++++++---
 ndctl/lib/libndctl.c         |  18 ++-
 ndctl/lib/ndctl-hpe1.h       | 335 +++++++++++++++++++++++++++++++++++++++++++
 ndctl/          |   1 +
 7 files changed, 775 insertions(+), 24 deletions(-)
 create mode 100644 ndctl/lib/libndctl-hpe1.c
 create mode 100644 ndctl/lib/ndctl-hpe1.h


Linux-nvdimm mailing list

Reply via email to