Thu, Sep 27, 2018 at 04:02:48PM CEST, era...@mellanox.com wrote:
>
>
>On 9/27/2018 3:47 PM, Jiri Pirko wrote:
>> Wed, Sep 26, 2018 at 01:52:58PM CEST, era...@mellanox.com wrote:
>> > The exception spec is targeted for Real Time Alerting, in order to know
>> > when something bad had happened to a PCI device
>> > - Provide alert debug information
>> > - Self healing
>> > - If problem needs vendor support, provide a way to gather all needed
>> > debugging information.
>> >
>> > The exception mechanism contains condition checkers which sense for
>> > malfunction. Upon a condition hit,
>> > actions such as logs and correction can be taken.
>> >
>> > The condition checkers are divided into the following groups
>> > - Hardware - a checker which is triggered by the device due to
>> > malfunction.
>> > - Software - a checker which is triggered by the software due to
>> > malfunction.
>>
>> What do you mean by a "software malfunction", a "FW malfunction"?
>> Also, I don't see this 2 groups in the man.
>
>Software malfunction can be a Transmit error (caused by bad send request).
Sorry, but I still don't understand what "software malfunction" you are talking about. Could you be more specific, please?

>FW/HW malfunction can be any catastrophic error report (the ones that should
>be exposed to driver).
>The comment here was to highlight that we can support different kinds of
>condition groups.
>If for a specific condition, we will need to highlight it is SW/HW, we can
>concatenate it to its name.
>
>Eran
>
>>
>>
>> > Both groups of condition checkers can be triggered due to error event or
>> > due to a periodic check.
>> >
>> > Actions are the way to handle those events. Action can be in one of the
>> > following groups:
>> > - Dump - SW trace, SW dump, HW trace, HW dump
>> > - Reset - Surgical correction (e.g. modify Q, flush Q, reset of device,
>> > etc)
>> > Actions can be performed by SW or HW.
>> >
>> > User is allowed to enable or disable condition checkers and its action
>> > mapping.
>> >
>> > This RFC man page patch describes the suggested API of devlink-exception
>> > in order to control conditions and actions.
>> >
>> > V2:
>> > * Renaming terms:
>> >   health -> exception
>> >   sensor -> condition
>> > * Remove reinit command and merge with action command.
>> > * Cosmetics in grammar.
>> >
>> > Eran Ben Elisha (1):
>> >   man: Add devlink exception man page
>> >
>> >  man/man8/devlink-exception.8 | 158 +++++++++++++++++++++++++++++++++++++++++++
>> >  1 file changed, 158 insertions(+)
>> >  create mode 100644 man/man8/devlink-exception.8
>> >
>> > --
>> > 1.8.3.1
>> >
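For reference, the condition/action model described in the cover letter can be sketched as follows. Condition checkers belong to a hardware or software group. Each is hit either by an error event or by a periodic check, and each is mapped to actions such as dump or reset that the user can enable or disable. This is a minimal Python sketch under stated assumptions. Every name in it (`Condition`, `ExceptionRegistry`, `tx_error`, the `dump`/`reset` handlers) is hypothetical. It is not the devlink API and not the interface proposed in the RFC man page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Group(Enum):
    HARDWARE = "hw"  # checker triggered by the device
    SOFTWARE = "sw"  # checker triggered by the driver/software


class Trigger(Enum):
    EVENT = "event"        # hit raised by an error event
    PERIODIC = "periodic"  # hit found by a periodic check


@dataclass
class Condition:
    name: str
    group: Group
    enabled: bool = True                               # user may disable the checker
    actions: List[str] = field(default_factory=list)  # mapped action names, in order


class ExceptionRegistry:
    """Hypothetical model of the condition -> action mapping (not a real API)."""

    def __init__(self) -> None:
        self.conditions: Dict[str, Condition] = {}
        self.actions: Dict[str, Callable[[str], str]] = {}

    def add_action(self, name: str, handler: Callable[[str], str]) -> None:
        self.actions[name] = handler

    def add_condition(self, cond: Condition) -> None:
        self.conditions[cond.name] = cond

    def set_enabled(self, cond_name: str, enabled: bool) -> None:
        self.conditions[cond_name].enabled = enabled

    def map_action(self, cond_name: str, action_name: str) -> None:
        # Only allow mapping to actions that have been registered.
        if action_name not in self.actions:
            raise KeyError(action_name)
        self.conditions[cond_name].actions.append(action_name)

    def hit(self, cond_name: str, trigger: Trigger) -> List[str]:
        """Run every mapped action for a condition hit; a disabled checker does nothing."""
        cond = self.conditions[cond_name]
        if not cond.enabled:
            return []
        context = f"{cond.group.value}:{cond.name}:{trigger.value}"
        return [self.actions[a](context) for a in cond.actions]


if __name__ == "__main__":
    reg = ExceptionRegistry()
    reg.add_action("dump", lambda ctx: "dump " + ctx)
    reg.add_action("reset", lambda ctx: "reset " + ctx)
    reg.add_condition(Condition("tx_error", Group.SOFTWARE))
    reg.map_action("tx_error", "dump")
    reg.map_action("tx_error", "reset")
    print(reg.hit("tx_error", Trigger.EVENT))
```

Keeping the group as a property of the condition, rather than encoding it in the condition name, is one way to model the SW/HW split the thread is asking about. The mapping itself does not depend on that choice.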