[Cc: -Brett, -Nicholas (550 #5.1.0 Address rejected.)]

On 04.12.24 at 13:34, Paul Menzel wrote:
Dear Konrad,


Thank you for your patch. It’d be great if you made the commit message summary/title a statement by adding a verb (in imperative mood). Maybe:

ice: Support fw and port health status


On 04.12.24 at 13:27, Konrad Knitter wrote:
Firmware generates health events that are either global or port-specific.

The driver shall subscribe to health status events from firmware
supporting AQ API version >= 1.7.6.

Please add a blank line between paragraphs, or do not break the line just because a new sentence starts.

The driver shall expose them through dedicated health reporters; two new
reporters are introduced:
- The FW health reporter shall represent global events (problems with the
image, recovery mode);
- The port health reporter shall represent port-specific events (module
failure).

Firmware only reports problems when they are detected; it does not store
an active fault list.
The driver will hold only the last global and the last port-specific event.
The driver will report all events via devlink health report,
so in case of multiple events from the same source they can be reviewed
using the devlink auto-dump feature.

$ devlink health

pci/0000:b1:00.3:
   reporter fw
     state healthy error 0 recover 0 auto_dump true
   reporter port
     state error error 1 recover 0 last_dump_date 2024-03-17
    last_dump_time 09:29:29 auto_dump true

$ devlink health diagnose pci/0000:b1:00.3 reporter port

   Syndrome: 262
   Description: Module is not present.
   Possible Solution: Check that the module is inserted correctly.
   Port Number: 0

Tested on Intel Corporation Ethernet Controller E810-C for SFP

Thank you for adding the above information.

Co-developed-by: Sharon Haroni <[email protected]>
Signed-off-by: Sharon Haroni <[email protected]>
Co-developed-by: Nicholas Nunley <[email protected]>
Signed-off-by: Nicholas Nunley <[email protected]>
Co-developed-by: Brett Creeley <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
Signed-off-by: Konrad Knitter <[email protected]>
---
v2:
- Removal of __VA_OPS__ usage. Style fixes.
Depends-on: https://lore.kernel.org/netdev/20240930133724.610512-1- [email protected]/T/
---
  .../net/ethernet/intel/ice/devlink/health.c   | 253 +++++++++++++++++-
  .../net/ethernet/intel/ice/devlink/health.h   |  14 +-
  .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  87 ++++++
  drivers/net/ethernet/intel/ice/ice_common.c   |  38 +++
  drivers/net/ethernet/intel/ice/ice_common.h   |   2 +
  drivers/net/ethernet/intel/ice/ice_main.c     |   3 +
  drivers/net/ethernet/intel/ice/ice_type.h     |   5 +
  7 files changed, 400 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c b/drivers/net/ethernet/intel/ice/devlink/health.c
index c7a8b8c9e1ca..c5a16879c916 100644
--- a/drivers/net/ethernet/intel/ice/devlink/health.c
+++ b/drivers/net/ethernet/intel/ice/devlink/health.c
@@ -1,13 +1,251 @@
  // SPDX-License-Identifier: GPL-2.0
  /* Copyright (c) 2024, Intel Corporation. */
-#include "health.h"
  #include "ice.h"
+#include "ice_adminq_cmd.h" /* for enum ice_aqc_health_status_elem */
+#include "health.h"
  #include "ice_ethtool_common.h"
  #define ICE_DEVLINK_FMSG_PUT_FIELD(fmsg, obj, name) \
      devlink_fmsg_put(fmsg, #name, (obj)->name)
+#define ICE_HEALTH_STATUS_DATA_SIZE 2
+
+struct ice_health_status {
+    enum ice_aqc_health_status code;
+    const char *description;
+    const char *solution;
+    const char *data_label[ICE_HEALTH_STATUS_DATA_SIZE];
+};
+
+/*
+ * In addition to the health status codes provided below, the firmware might
+ * generate Health Status Codes that are not pertinent to the end-user.
+ * For instance, Health Code 0x1002 is triggered when the command fails.
+ * Such codes should be disregarded by the end-user.
+ * The lookup table below must be sorted by code.
+ */
+
+static const char *const ice_common_port_solutions =
+    "Check your cable connection. Change or replace the module or cable. Manually set speed and duplex.";
+static const char *const ice_port_number_label = "Port Number";
+static const char *const ice_update_nvm_solution = "Update to the latest NVM image.";
+
+static const struct ice_health_status ice_health_status_lookup[] = {
+    {ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_STRICT, "An unsupported module was detected",
+        ice_common_port_solutions, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_MOD_TYPE, "Module type is not supported.",
+        "Change or replace the module or cable.", {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_MOD_QUAL, "Module is not qualified.",
+        ice_common_port_solutions, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_MOD_COMM,
+        "Device cannot communicate with the module.",
+        "Check your cable connection. Change or replace the module or cable. Manually set speed and duplex.",
+        {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_MOD_CONFLICT, "Unresolved module conflict.",
+        "Manually set speed/duplex or change the port option. If the problem persists, use a cable/module that is found in the supported modules and cables list for this device.",
+        {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_MOD_NOT_PRESENT, "Module is not present.",
+        "Check that the module is inserted correctly. If the problem persists, use a cable/module that is found in the supported modules and cables list for this device.",
+        {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_INFO_MOD_UNDERUTILIZED, "Underutilized module.",
+        "Change or replace the module or cable. Change the port option",
+        {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_LENIENT, "An unsupported module was detected",
+        ice_common_port_solutions, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_INVALID_LINK_CFG, "Invalid link configuration.",
+        NULL, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_PORT_ACCESS, "Port hardware access error.",

Sometimes there are dots/periods at the end, and sometimes there are none. It’d be great if it were consistent.
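The missing periods could be found mechanically by running a check like the following userspace sketch over the table. It works on a trimmed, hypothetical copy of a few entries from the patch. struct health_msg, ends_with_period() and count_missing_periods() are made-up names and are not part of the ice driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed, hypothetical copy of a few lookup entries from the patch. */
struct health_msg {
	unsigned int code;
	const char *description;
};

static const struct health_msg msgs[] = {
	{ 0x101, "An unsupported module was detected" },
	{ 0x102, "Module type is not supported." },
	{ 0x503, "Option ROM authentication failed" },
	{ 0x504, "DDP package authentication failed." },
};

/* Return 1 if s is non-empty and ends with '.', else 0. */
static int ends_with_period(const char *s)
{
	size_t len = strlen(s);

	return len > 0 && s[len - 1] == '.';
}

/* Count the descriptions that lack a trailing period. */
static size_t count_missing_periods(const struct health_msg *tbl, size_t n)
{
	size_t missing = 0;

	for (size_t i = 0; i < n; i++)
		if (!ends_with_period(tbl[i].description))
			missing++;
	return missing;
}
```

The same loop could also be run over the solution strings.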

+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_PORT_UNREACHABLE, "A port is unreachable.",
+        "Change the port option. Update to the latest NVM image."},
+    {ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_MOD_LIMITED, "Port speed is limited due to module.",
+        "Change the module or configure the port option to match the current module speed. Change the port option.",
+        {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_PARALLEL_FAULT,
+        "All configured link modes were attempted but failed to establish link. The device will restart the process to establish link.",
+        "Check link partner connection and configuration.",
+        {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_PHY_LIMITED,
+        "Port speed is limited by PHY capabilities.",
+        "Change the module to align to port option.", {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_NETLIST_TOPO, "LOM topology netlist is corrupted.",
+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_NETLIST, "Unrecoverable netlist error.",
+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_TOPO_CONFLICT, "Port topology conflict.",
+        "Change the port option. Update to the latest NVM image."},
+    {ICE_AQC_HEALTH_STATUS_ERR_LINK_HW_ACCESS, "Unrecoverable hardware access error.",
+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_LINK_RUNTIME, "Unrecoverable runtime error.",
+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_DNL_INIT, "Link management engine failed to initialize.",
+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_ERR_PHY_FW_LOAD,
+        "Failed to load the firmware image in the external PHY.",
+        ice_update_nvm_solution, {ice_port_number_label}},
+    {ICE_AQC_HEALTH_STATUS_INFO_RECOVERY, "The device is in firmware recovery mode.",
+        ice_update_nvm_solution, {"Extended Error"}},
+    {ICE_AQC_HEALTH_STATUS_ERR_FLASH_ACCESS, "The flash chip cannot be accessed.",
+        "If issue persists, call customer support.", {"Access Type"}},
+    {ICE_AQC_HEALTH_STATUS_ERR_NVM_AUTH, "NVM authentication failed.",
+        ice_update_nvm_solution},
+    {ICE_AQC_HEALTH_STATUS_ERR_OROM_AUTH, "Option ROM authentication failed",
+        ice_update_nvm_solution},
+    {ICE_AQC_HEALTH_STATUS_ERR_DDP_AUTH, "DDP package authentication failed.",
+        "Update to latest base driver and DDP package."},
+    {ICE_AQC_HEALTH_STATUS_ERR_NVM_COMPAT, "NVM image is incompatible.",
+        ice_update_nvm_solution},
+    {ICE_AQC_HEALTH_STATUS_ERR_OROM_COMPAT, "Option ROM is incompatible.",
+        ice_update_nvm_solution, {"Expected PCI Device ID", "Expected Module ID"}},
+    {ICE_AQC_HEALTH_STATUS_ERR_DCB_MIB,
+        "Supplied MIB file is invalid. DCB reverted to default configuration.",
+        "Disable FW-LLDP and check DCBx system configuration.",
+        {ice_port_number_label, "MIB ID"}},
+};
+
+static int ice_health_status_lookup_compare(const void *a, const void *b)
+{
+    return ((struct ice_health_status *)a)->code - ((struct ice_health_status *)b)->code;
+}
+
+static const struct ice_health_status *ice_get_health_status(u16 code)
+{
+    struct ice_health_status key = { .code = code };
+
+    return bsearch(&key, ice_health_status_lookup, ARRAY_SIZE(ice_health_status_lookup),
+               sizeof(struct ice_health_status), ice_health_status_lookup_compare);
+}
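bsearch() may fail to find entries if the array is not sorted. The "sorted by code" requirement from the comment could therefore be verified, for example in a selftest. Below is a userspace sketch using the C library bsearch(). struct entry, table_is_sorted() and lookup() are hypothetical stand-ins for the driver's structures. The comparator uses explicit comparisons instead of subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	unsigned int code;
	const char *description;
};

/* Must stay sorted by code for bsearch() to work. */
static const struct entry table[] = {
	{ 0x101, "An unsupported module was detected." },
	{ 0x106, "Module is not present." },
	{ 0x500, "The device is in firmware recovery mode." },
	{ 0x509, "Supplied MIB file is invalid. DCB reverted to default configuration." },
};

#define TABLE_SIZE (sizeof(table) / sizeof(table[0]))

/* Three-way compare without subtraction, so no overflow concerns. */
static int entry_cmp(const void *a, const void *b)
{
	unsigned int ca = ((const struct entry *)a)->code;
	unsigned int cb = ((const struct entry *)b)->code;

	return (ca > cb) - (ca < cb);
}

/* Check the precondition bsearch() relies on: strictly ascending codes. */
static bool table_is_sorted(const struct entry *t, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (t[i - 1].code >= t[i].code)
			return false;
	return true;
}

/* Return the entry for code, or NULL for codes not in the table. */
static const struct entry *lookup(unsigned int code)
{
	struct entry key = { .code = code };

	return bsearch(&key, table, TABLE_SIZE, sizeof(table[0]), entry_cmp);
}
```

With the values used in the patch, the subtraction cannot overflow. The explicit form simply avoids relying on that.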
+
+static void ice_describe_status_code(struct devlink_fmsg *fmsg,
+                     struct ice_aqc_health_status_elem *hse)
+{
+    static const char *const aux_label[] = { "Aux Data 1", "Aux Data 2" };
+    const struct ice_health_status *health_code;
+    u32 internal_data[2];
+    u16 status_code;
+
+    status_code = le16_to_cpu(hse->health_status_code);
+
+    devlink_fmsg_put(fmsg, "Syndrome", status_code);
+    if (status_code) {
+        internal_data[0] = le32_to_cpu(hse->internal_data1);
+        internal_data[1] = le32_to_cpu(hse->internal_data2);
+
+        health_code = ice_get_health_status(status_code);
+        if (!health_code)
+            return;
+
+        devlink_fmsg_string_pair_put(fmsg, "Description", health_code->description);
+        if (health_code->solution)
+            devlink_fmsg_string_pair_put(fmsg, "Possible Solution",
+                             health_code->solution);
+
+        for (int i = 0; i < ICE_HEALTH_STATUS_DATA_SIZE; i++) {

Use size_t?
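With size_t and the bound tied to the label array, the loop might look roughly like this userspace sketch. collect_aux() and struct label_value are hypothetical names. ARRAY_SIZE() is defined locally to mimic the kernel macro, and 0xDEADBEEF is ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define UNDEFINED_DATA 0xDEADBEEFu /* ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA */

struct label_value {
	const char *label;
	uint32_t value;
};

/*
 * Collect the defined aux data words, preferring the per-code label and
 * falling back to a generic one. Returns the number of entries written.
 */
static size_t collect_aux(const char *const data_label[2],
			  const uint32_t internal_data[2],
			  struct label_value out[2])
{
	static const char *const aux_label[] = { "Aux Data 1", "Aux Data 2" };
	size_t n = 0;

	for (size_t i = 0; i < ARRAY_SIZE(aux_label); i++) {
		if (internal_data[i] == UNDEFINED_DATA)
			continue;
		out[n].label = data_label[i] ? data_label[i] : aux_label[i];
		out[n].value = internal_data[i];
		n++;
	}
	return n;
}
```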

+            if (internal_data[i] != ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA)
+                devlink_fmsg_u32_pair_put(fmsg,
+                              health_code->data_label[i] ?
+                              health_code->data_label[i] :
+                              aux_label[i],
+                              internal_data[i]);
+        }
+    }
+}
+
+static int
+ice_port_reporter_dump(struct devlink_health_reporter *reporter, struct devlink_fmsg *fmsg,
+               void *priv_ctx, struct netlink_ext_ack __always_unused *extack)
+{
+    struct ice_pf *pf = devlink_health_reporter_priv(reporter);
+
+    ice_describe_status_code(fmsg, &pf->health_reporters.port_status);
+    return 0;
+}
+
+static int
+ice_fw_reporter_dump(struct devlink_health_reporter *reporter, struct devlink_fmsg *fmsg,
+             void *priv_ctx, struct netlink_ext_ack *extack)
+{
+    struct ice_pf *pf = devlink_health_reporter_priv(reporter);
+
+    ice_describe_status_code(fmsg, &pf->health_reporters.fw_status);
+    return 0;
+}
+
+static void ice_config_health_events(struct ice_pf *pf, bool enable)
+{
+    u8 enable_bits = 0;
+    int ret;
+
+    if (enable)
+        enable_bits = ICE_AQC_HEALTH_STATUS_SET_PF_SPECIFIC_MASK |
+                  ICE_AQC_HEALTH_STATUS_SET_GLOBAL_MASK;
+
+    ret = ice_aq_set_health_status_cfg(&pf->hw, enable_bits);
+    if (ret)
+        dev_err(ice_pf_to_dev(pf), "Failed to %s firmware health events, err %d aq_err %s\n",
+            str_enable_disable(enable), ret,
+            ice_aq_str(pf->hw.adminq.sq_last_status));
+}
+
+/**
+ * ice_process_health_status_event - Process the health status event from FW
+ * @pf: pointer to the PF structure
+ * @event: event structure containing the Health Status Event opcode
+ *
+ * Decode the Health Status Events and print the associated messages
+ */
+void ice_process_health_status_event(struct ice_pf *pf, struct ice_rq_event_info *event)
+{
+    const struct ice_aqc_health_status_elem *health_info;
+    u16 count;

Why use a fixed-width type here instead of a native one like unsigned int?

+
+    health_info = (struct ice_aqc_health_status_elem *)event->msg_buf;
+    count = le16_to_cpu(event->desc.params.get_health_status.health_status_count);
+
+    if (count > (event->buf_len / sizeof(*health_info))) {
+        dev_err(ice_pf_to_dev(pf), "Received a health status event with invalid element count\n");
+        return;
+    }
+
+    for (int i = 0; i < count; i++) {
+        const struct ice_health_status *health_code;
+        u16 status_code;
+
+        status_code = le16_to_cpu(health_info->health_status_code);
+        health_code = ice_get_health_status(status_code);
+
+        if (health_code) {
+            switch (le16_to_cpu(health_info->event_source)) {
+            case ICE_AQC_HEALTH_STATUS_GLOBAL:
+                pf->health_reporters.fw_status = *health_info;
+                devlink_health_report(pf->health_reporters.fw,
+                              "FW syndrome reported", NULL);
+                break;
+            case ICE_AQC_HEALTH_STATUS_PF:
+            case ICE_AQC_HEALTH_STATUS_PORT:
+                pf->health_reporters.port_status = *health_info;
+                devlink_health_report(pf->health_reporters.port,
+                              "Port syndrome reported", NULL);
+                break;
+            default:
+                dev_err(ice_pf_to_dev(pf), "Health code with unknown source\n");
+            }
+        } else {
+            u32 data1, data2;
+            u16 source;
+
+            source = le16_to_cpu(health_info->event_source);
+            data1 = le32_to_cpu(health_info->internal_data1);
+            data2 = le32_to_cpu(health_info->internal_data2);
+            dev_dbg(ice_pf_to_dev(pf),
+                "Received internal health status code 0x%08x, source: 0x%08x, data1: 0x%08x, data2: 0x%08x\n",
+                status_code, source, data1, data2);
+        }
+        health_info++;
+    }
+}
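For reference, here is a userspace sketch of the parsing and of the "keep only the last event per source" behavior described in the commit message. It makes these assumptions:

- It uses the 12-byte little-endian element layout of struct ice_aqc_health_status_elem.
- It converts event_source to host order before switching on it, as le16_to_cpu() would.
- It skips the lookup-table check that the driver performs.

All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scope values from enum ice_aqc_health_status_scope in the patch. */
enum { SRC_PF = 0x1, SRC_PORT = 0x2, SRC_GLOBAL = 0x3 };

/* Host-order copy of one 12-byte wire element. */
struct health_elem {
	uint16_t code;
	uint16_t source;
	uint32_t data1;
	uint32_t data2;
};

#define ELEM_WIRE_SIZE 12

struct health_state {
	struct health_elem fw_status;   /* last global event */
	struct health_elem port_status; /* last PF/port event */
	unsigned int fw_reports, port_reports;
};

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns -1 if count elements do not fit into buf_len bytes, else 0. */
static int process_health_buf(struct health_state *st, const uint8_t *buf,
			      size_t buf_len, unsigned int count)
{
	if (count > buf_len / ELEM_WIRE_SIZE)
		return -1;

	for (unsigned int i = 0; i < count; i++) {
		const uint8_t *p = buf + (size_t)i * ELEM_WIRE_SIZE;
		struct health_elem e = {
			.code = get_le16(p),
			.source = get_le16(p + 2), /* host order before switch */
			.data1 = get_le32(p + 4),
			.data2 = get_le32(p + 8),
		};

		switch (e.source) {
		case SRC_GLOBAL:
			st->fw_status = e; /* only the last one is kept */
			st->fw_reports++;
			break;
		case SRC_PF:
		case SRC_PORT:
			st->port_status = e;
			st->port_reports++;
			break;
		default:
			break; /* unknown source: ignored in this sketch */
		}
	}
	return 0;
}
```

Every element still increments a report counter, which mirrors calling devlink_health_report() once per event. Only the stored copy is overwritten.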
+
  /**
   * ice_devlink_health_report - boilerplate to call given @reporter
   *
@@ -244,6 +482,8 @@ ice_init_devlink_rep(struct ice_pf *pf,
  ICE_DEFINE_HEALTH_REPORTER_OPS(mdd);
  ICE_DEFINE_HEALTH_REPORTER_OPS(tx_hang);
+ICE_DEFINE_HEALTH_REPORTER_OPS(fw);
+ICE_DEFINE_HEALTH_REPORTER_OPS(port);
  /**
   * ice_health_init - allocate and init all ice devlink health reporters and
@@ -257,6 +497,12 @@ void ice_health_init(struct ice_pf *pf)
      reps->mdd = ice_init_devlink_rep(pf, &ice_mdd_reporter_ops);
      reps->tx_hang = ice_init_devlink_rep(pf, &ice_tx_hang_reporter_ops);
+
+    if (ice_is_fw_health_report_supported(&pf->hw)) {
+        reps->fw = ice_init_devlink_rep(pf, &ice_fw_reporter_ops);
+        reps->port = ice_init_devlink_rep(pf, &ice_port_reporter_ops);
+        ice_config_health_events(pf, true);
+    }
  }
  /**
@@ -279,6 +525,11 @@ void ice_health_deinit(struct ice_pf *pf)
  {
      ice_deinit_devl_reporter(pf->health_reporters.mdd);
      ice_deinit_devl_reporter(pf->health_reporters.tx_hang);
+    if (ice_is_fw_health_report_supported(&pf->hw)) {
+        ice_deinit_devl_reporter(pf->health_reporters.fw);
+        ice_deinit_devl_reporter(pf->health_reporters.port);
+        ice_config_health_events(pf, false);
+    }
  }
  static
diff --git a/drivers/net/ethernet/intel/ice/devlink/health.h b/drivers/net/ethernet/intel/ice/devlink/health.h
index a08c7bd174cf..280c429feec8 100644
--- a/drivers/net/ethernet/intel/ice/devlink/health.h
+++ b/drivers/net/ethernet/intel/ice/devlink/health.h
@@ -13,8 +13,10 @@
   * devlink health mechanism for ice driver.
   */
+struct ice_aqc_health_status_elem;
  struct ice_pf;
  struct ice_tx_ring;
+struct ice_rq_event_info;
  enum ice_mdd_src {
      ICE_MDD_SRC_TX_PQM,
@@ -25,17 +27,23 @@ enum ice_mdd_src {
  /**
   * struct ice_health - stores ice devlink health reporters and accompanied data
- * @tx_hang: devlink health reporter for tx_hang event
+ * @fw: devlink health reporter for FW Health Status events
   * @mdd: devlink health reporter for MDD detection event
+ * @port: devlink health reporter for Port Health Status events
+ * @tx_hang: devlink health reporter for tx_hang event
   * @tx_hang_buf: pre-allocated place to put info for Tx hang reporter from
   *               non-sleeping context
   * @tx_ring: ring that the hang occured on
   * @head: descriptior head
   * @intr: interrupt register value
   * @vsi_num: VSI owning the queue that the hang occured on
+ * @fw_status: buffer for last received FW Status event
+ * @port_status: buffer for last received Port Status event
   */
  struct ice_health {
+    struct devlink_health_reporter *fw;
      struct devlink_health_reporter *mdd;
+    struct devlink_health_reporter *port;
      struct devlink_health_reporter *tx_hang;
      struct_group_tagged(ice_health_tx_hang_buf, tx_hang_buf,
          struct ice_tx_ring *tx_ring;
@@ -43,8 +51,12 @@ struct ice_health {
          u32 intr;
          u16 vsi_num;
      );
+    struct ice_aqc_health_status_elem fw_status;
+    struct ice_aqc_health_status_elem port_status;
  };
+void ice_process_health_status_event(struct ice_pf *pf,
+                     struct ice_rq_event_info *event);
  void ice_health_init(struct ice_pf *pf);
  void ice_health_deinit(struct ice_pf *pf);
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index ce590991de38..232a1facf397 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -2511,6 +2511,87 @@ enum ice_aqc_fw_logging_mod {
      ICE_AQC_FW_LOG_ID_MAX,
  };
+enum ice_aqc_health_status_mask {
+    ICE_AQC_HEALTH_STATUS_SET_PF_SPECIFIC_MASK = BIT(0),
+    ICE_AQC_HEALTH_STATUS_SET_ALL_PF_MASK      = BIT(1),
+    ICE_AQC_HEALTH_STATUS_SET_GLOBAL_MASK      = BIT(2),
+};
+
+/* Set Health Status (direct 0xFF20) */
+struct ice_aqc_set_health_status_cfg {
+    u8 event_source;
+    u8 reserved[15];
+};
+
+enum ice_aqc_health_status {
+    ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_STRICT        = 0x101,
+    ICE_AQC_HEALTH_STATUS_ERR_MOD_TYPE            = 0x102,
+    ICE_AQC_HEALTH_STATUS_ERR_MOD_QUAL            = 0x103,
+    ICE_AQC_HEALTH_STATUS_ERR_MOD_COMM            = 0x104,
+    ICE_AQC_HEALTH_STATUS_ERR_MOD_CONFLICT            = 0x105,
+    ICE_AQC_HEALTH_STATUS_ERR_MOD_NOT_PRESENT        = 0x106,
+    ICE_AQC_HEALTH_STATUS_INFO_MOD_UNDERUTILIZED        = 0x107,
+    ICE_AQC_HEALTH_STATUS_ERR_UNKNOWN_MOD_LENIENT        = 0x108,
+    ICE_AQC_HEALTH_STATUS_ERR_MOD_DIAGNOSTIC_FEATURE    = 0x109,
+    ICE_AQC_HEALTH_STATUS_ERR_INVALID_LINK_CFG        = 0x10B,
+    ICE_AQC_HEALTH_STATUS_ERR_PORT_ACCESS            = 0x10C,
+    ICE_AQC_HEALTH_STATUS_ERR_PORT_UNREACHABLE        = 0x10D,
+    ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_MOD_LIMITED    = 0x10F,
+    ICE_AQC_HEALTH_STATUS_ERR_PARALLEL_FAULT        = 0x110,
+    ICE_AQC_HEALTH_STATUS_INFO_PORT_SPEED_PHY_LIMITED    = 0x111,
+    ICE_AQC_HEALTH_STATUS_ERR_NETLIST_TOPO            = 0x112,
+    ICE_AQC_HEALTH_STATUS_ERR_NETLIST            = 0x113,
+    ICE_AQC_HEALTH_STATUS_ERR_TOPO_CONFLICT            = 0x114,
+    ICE_AQC_HEALTH_STATUS_ERR_LINK_HW_ACCESS        = 0x115,
+    ICE_AQC_HEALTH_STATUS_ERR_LINK_RUNTIME            = 0x116,
+    ICE_AQC_HEALTH_STATUS_ERR_DNL_INIT            = 0x117,
+    ICE_AQC_HEALTH_STATUS_ERR_PHY_NVM_PROG            = 0x120,
+    ICE_AQC_HEALTH_STATUS_ERR_PHY_FW_LOAD            = 0x121,
+    ICE_AQC_HEALTH_STATUS_INFO_RECOVERY            = 0x500,
+    ICE_AQC_HEALTH_STATUS_ERR_FLASH_ACCESS            = 0x501,
+    ICE_AQC_HEALTH_STATUS_ERR_NVM_AUTH            = 0x502,
+    ICE_AQC_HEALTH_STATUS_ERR_OROM_AUTH            = 0x503,
+    ICE_AQC_HEALTH_STATUS_ERR_DDP_AUTH            = 0x504,
+    ICE_AQC_HEALTH_STATUS_ERR_NVM_COMPAT            = 0x505,
+    ICE_AQC_HEALTH_STATUS_ERR_OROM_COMPAT            = 0x506,
+    ICE_AQC_HEALTH_STATUS_ERR_NVM_SEC_VIOLATION        = 0x507,
+    ICE_AQC_HEALTH_STATUS_ERR_OROM_SEC_VIOLATION        = 0x508,
+    ICE_AQC_HEALTH_STATUS_ERR_DCB_MIB            = 0x509,
+    ICE_AQC_HEALTH_STATUS_ERR_MNG_TIMEOUT            = 0x50A,
+    ICE_AQC_HEALTH_STATUS_ERR_BMC_RESET            = 0x50B,
+    ICE_AQC_HEALTH_STATUS_ERR_LAST_MNG_FAIL            = 0x50C,
+    ICE_AQC_HEALTH_STATUS_ERR_RESOURCE_ALLOC_FAIL        = 0x50D,
+    ICE_AQC_HEALTH_STATUS_ERR_FW_LOOP            = 0x1000,
+    ICE_AQC_HEALTH_STATUS_ERR_FW_PFR_FAIL            = 0x1001,
+    ICE_AQC_HEALTH_STATUS_ERR_LAST_FAIL_AQ            = 0x1002,
+};
+
+/* Get Health Status (indirect 0xFF22) */
+struct ice_aqc_get_health_status {
+    __le16 health_status_count;
+    u8 reserved[6];
+    __le32 addr_high;
+    __le32 addr_low;
+};
+
+enum ice_aqc_health_status_scope {
+    ICE_AQC_HEALTH_STATUS_PF    = 0x1,
+    ICE_AQC_HEALTH_STATUS_PORT    = 0x2,
+    ICE_AQC_HEALTH_STATUS_GLOBAL    = 0x3,
+};
+
+#define ICE_AQC_HEALTH_STATUS_UNDEFINED_DATA    0xDEADBEEF
+
+/* Get Health Status event buffer entry (0xFF22),
+ * repeated per reported health status.
+ */
+struct ice_aqc_health_status_elem {
+    __le16 health_status_code;
+    __le16 event_source;
+    __le32 internal_data1;
+    __le32 internal_data2;
+};
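The structures above have fixed wire sizes: 16 bytes for the command parameters and 12 bytes per health status element. A compile-time check could guard that layout. Below is a userspace sketch using C11 static_assert. The mirror structs use uint16_t/uint32_t in place of __le16/__le32, and max_elems() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirrors; uintN_t stand in for __leN (same size, byte order aside). */
struct set_health_status_cfg {	/* direct 0xFF20 parameters */
	uint8_t event_source;
	uint8_t reserved[15];
};

struct get_health_status {	/* indirect 0xFF22 parameters */
	uint16_t health_status_count;
	uint8_t reserved[6];
	uint32_t addr_high;
	uint32_t addr_low;
};

struct health_status_elem {	/* one entry in the 0xFF22 event buffer */
	uint16_t health_status_code;
	uint16_t event_source;
	uint32_t internal_data1;
	uint32_t internal_data2;
};

static_assert(sizeof(struct set_health_status_cfg) == 16, "params must be 16 bytes");
static_assert(sizeof(struct get_health_status) == 16, "params must be 16 bytes");
static_assert(sizeof(struct health_status_elem) == 12, "element must be 12 bytes");
static_assert(offsetof(struct health_status_elem, internal_data1) == 4,
	      "unexpected padding");

/* Number of whole elements that fit into a buffer of buf_len bytes. */
static size_t max_elems(size_t buf_len)
{
	return buf_len / sizeof(struct health_status_elem);
}
```

max_elems() is the same bound that the event handler compares health_status_count against.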
+
  /* Set FW Logging configuration (indirect 0xFF30)
   * Register for FW Logging (indirect 0xFF31)
   * Query FW Logging (indirect 0xFF32)
@@ -2651,6 +2732,8 @@ struct ice_aq_desc {
          struct ice_aqc_get_link_status get_link_status;
          struct ice_aqc_event_lan_overflow lan_overflow;
          struct ice_aqc_get_link_topo get_link_topo;
+        struct ice_aqc_set_health_status_cfg set_health_status_cfg;
+        struct ice_aqc_get_health_status get_health_status;
          struct ice_aqc_dnl_call_command dnl_call;
          struct ice_aqc_i2c read_write_i2c;
          struct ice_aqc_read_i2c_resp read_i2c_resp;
@@ -2853,6 +2936,10 @@ enum ice_adminq_opc {
      /* Standalone Commands/Events */
      ice_aqc_opc_event_lan_overflow            = 0x1001,
+    /* SystemDiagnostic commands */

Add a space before Diagnostic?

+    ice_aqc_opc_set_health_status_cfg        = 0xFF20,
+    ice_aqc_opc_get_health_status            = 0xFF22,
+
      /* FW Logging Commands */
      ice_aqc_opc_fw_logs_config            = 0xFF30,
      ice_aqc_opc_fw_logs_register            = 0xFF31,
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index faba09b9d880..9c61318d3027 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -6047,6 +6047,44 @@ bool ice_is_phy_caps_an_enabled(struct ice_aqc_get_phy_caps_data *caps)
      return false;
  }
+/**
+ * ice_is_fw_health_report_supported - check if FW supports health reports
+ * @hw: pointer to the hardware structure
+ *
+ * Return: true if firmware supports health status reports,
+ * false otherwise
+ */
+bool ice_is_fw_health_report_supported(struct ice_hw *hw)
+{
+    return ice_is_fw_api_min_ver(hw, ICE_FW_API_HEALTH_REPORT_MAJ,
+                     ICE_FW_API_HEALTH_REPORT_MIN,
+                     ICE_FW_API_HEALTH_REPORT_PATCH);
+}
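The minimum AQ API version 1.7.6 from ice_type.h amounts to a lexicographic comparison of (major, minor, patch). Below is a userspace sketch. api_min_ver() and health_report_supported() are hypothetical stand-ins, not the driver's ice_is_fw_api_min_ver() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the patch's ICE_FW_API_HEALTH_REPORT_{MAJ,MIN,PATCH}. */
#define HEALTH_REPORT_MAJ   1
#define HEALTH_REPORT_MIN   7
#define HEALTH_REPORT_PATCH 6

/* True if api_maj.api_min.api_patch >= maj.min.patch. */
static bool api_min_ver(unsigned int api_maj, unsigned int api_min,
			unsigned int api_patch, unsigned int maj,
			unsigned int min, unsigned int patch)
{
	if (api_maj != maj)
		return api_maj > maj;
	if (api_min != min)
		return api_min > min;
	return api_patch >= patch;
}

static bool health_report_supported(unsigned int maj, unsigned int min,
				    unsigned int patch)
{
	return api_min_ver(maj, min, patch, HEALTH_REPORT_MAJ,
			   HEALTH_REPORT_MIN, HEALTH_REPORT_PATCH);
}
```

A higher minor version wins regardless of the patch number. For example, 1.8.0 qualifies while 1.7.5 does not.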
+
+/**
+ * ice_aq_set_health_status_cfg - Configure FW health events
+ * @hw: pointer to the HW struct
+ * @event_source: type of diagnostic events to enable
+ *
+ * Configure the health status event types that the firmware will send to this
+ * PF. The supported event types are: PF-specific, all PFs, and global.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int ice_aq_set_health_status_cfg(struct ice_hw *hw, u8 event_source)
+{
+    struct ice_aqc_set_health_status_cfg *cmd;
+    struct ice_aq_desc desc;
+
+    cmd = &desc.params.set_health_status_cfg;
+
+    ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_health_status_cfg);
+
+    cmd->event_source = event_source;
+
+    return ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+}
+
  /**
   * ice_aq_set_lldp_mib - Set the LLDP MIB
   * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 52a1b72cce26..e132851dc0f0 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -141,6 +141,8 @@ int
  ice_get_link_default_override(struct ice_link_default_override_tlv *ldo,
                    struct ice_port_info *pi);
  bool ice_is_phy_caps_an_enabled(struct ice_aqc_get_phy_caps_data *caps);
+bool ice_is_fw_health_report_supported(struct ice_hw *hw);
+int ice_aq_set_health_status_cfg(struct ice_hw *hw, u8 event_source);
  int ice_aq_get_phy_equalization(struct ice_hw *hw, u16 data_in, u16 op_code,
                  u8 serdes_num, int *output);
  int
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 7b9be612cf33..36cfbe771d1b 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1567,6 +1567,9 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
          case ice_aqc_opc_lldp_set_mib_change:
              ice_dcb_process_lldp_set_mib_change(pf, &event);
              break;
+        case ice_aqc_opc_get_health_status:
+            ice_process_health_status_event(pf, &event);
+            break;
          default:
              dev_dbg(dev, "%s Receive Queue unknown event 0x%04x ignored\n",
                  qtype, opcode);
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index e2e6b2119889..42ac5a9f1cf4 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -1207,4 +1207,9 @@ struct ice_aq_get_set_rss_lut_params {
  #define ICE_FW_API_REPORT_DFLT_CFG_MIN        7
  #define ICE_FW_API_REPORT_DFLT_CFG_PATCH    3
+/* AQ API version for Health Status support */
+#define ICE_FW_API_HEALTH_REPORT_MAJ        1
+#define ICE_FW_API_HEALTH_REPORT_MIN        7
+#define ICE_FW_API_HEALTH_REPORT_PATCH        6
+
  #endif /* _ICE_TYPE_H_ */


Kind regards,

Paul
