Dear All,

I developed the "NodeHealthMonitor" (NHM), which is hosted at GENIVI.
David Yates recently informed me about the ongoing discussion, so I
joined the mailing list. Overall, I can see a lot of similarities
between the component proposed by Abhishek and the NHM we currently
have at GENIVI. The NHM sources are available at:

http://git.projects.genivi.org/?p=lifecycle/node-health-monitor.git

Here is a short summary on what the NHM does (copied from nhm-main.c):

"The Node Health Monitor will usually be started by systemd and will interact
with application plug-ins to inform it that a component has failed in the
system. He will be responsible for:

  - providing an interface with which plug-ins can register failures:
    - name of the failing service, used to identify and track failures
    - tracking failure statistics over multiple LCs for system and components
  - NHM will maintain a count of the number of failures in the current
    life cycle as well as statistics on number of failures in last X life
    cycles (i.e. 3 failures in last 32 life cycles)
  - observe the life cycle accordingly to catch unexpected system restarts
  - provide an interface for plug-ins to read system/component error counts
  - provide an interface for plug-ins to request a node restart

Additionally the Node Health Monitor will test a number of product defined
criteria with the aim to ensure that userland is stable and functional.
It will be able to validate that:
  - defined file is present
  - defined processes are still running
  - a user defined process can be executed with an expected result
  - communication on defined dbus (bus address) is possible"
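
To make the failure statistics above more concrete, here is a minimal
sketch in C of a per-life-cycle failure counter kept in a ring buffer.
It is not taken from the NHM sources: the type, the function names and
the history length of 32 (borrowed from the example in the quote) are
assumptions for illustration only, and persistence across restarts is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: number of life cycles (LCs) kept in the history. */
#define NHM_SKETCH_HISTORY 32

/* Failure counts per life cycle, stored as a ring buffer. */
typedef struct {
    unsigned int failures[NHM_SKETCH_HISTORY];
    size_t current; /* slot of the life cycle that is running now */
} failure_history;

/* Start with an empty history; slot 0 is the current life cycle. */
void history_init(failure_history *h)
{
    assert(h != NULL);
    memset(h, 0, sizeof *h);
}

/* Called once per boot: move to the next slot and forget the oldest LC. */
void history_new_lifecycle(failure_history *h)
{
    assert(h != NULL);
    h->current = (h->current + 1) % NHM_SKETCH_HISTORY;
    h->failures[h->current] = 0;
}

/* Called when a plug-in reports a failure in the current life cycle. */
void history_register_failure(failure_history *h)
{
    assert(h != NULL);
    h->failures[h->current]++;
}

/* Number of failures in the current life cycle. */
unsigned int history_current_failures(const failure_history *h)
{
    assert(h != NULL);
    return h->failures[h->current];
}

/* Number of failures over all remembered life cycles. */
unsigned int history_total_failures(const failure_history *h)
{
    unsigned int total = 0;
    assert(h != NULL);
    for (size_t i = 0; i < NHM_SKETCH_HISTORY; i++)
        total += h->failures[i];
    return total;
}
```

In a real component one such history would be kept per service name and
for the node as a whole, and it would be written to persistent storage
so that it survives a restart.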

Given this, the NHM should be a good basis for the component proposed
by Abhishek, although I see the following differences:

1. NHM only cares about the systemd property "ActiveState".
    Additional tracking of the other properties would need to be
    implemented.
2. NHM stores data using GENIVI PCL. A configure switch
    might need to be introduced to use the POSIX API.
3. NHM offers "userland checks". They are not harmful, since they can
    be deactivated, so this is not an issue.
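
As a rough illustration of point 2, a compile-time switch could select
the storage backend. The sketch below is an assumption about how this
might look, not existing NHM code: the macro NHM_SKETCH_USE_PCL and the
nhm_sketch_* functions are hypothetical names, and only the POSIX path
is spelled out. The PCL path is left as a placeholder.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(NHM_SKETCH_USE_PCL)
/* Hypothetical switch: the PCL-backed implementation would go here. */
#error "PCL backend is not part of this sketch"
#else

/* Write len bytes from buf to path, replacing any old content.
 * Returns 0 on success and -1 on error. */
int nhm_sketch_store(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, buf, len);
    int rc = (written == (ssize_t)len) ? 0 : -1;

    /* Flush to storage so the data survives an unexpected restart. */
    if (fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Read exactly len bytes from path into buf.
 * Returns 0 on success and -1 on error or short read. */
int nhm_sketch_load(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t got = read(fd, buf, len);
    int rc = (got == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}

#endif
```

The rest of the component would then only call the two functions and
would not need to know which backend was chosen at configure time.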

In conclusion, I would recommend collecting the use cases, as suggested by
Jeremiah and Igor, and I invite you all to also check the NHM documentation
and code to look for differences and for ways to resolve them.

Best regards,
Jean-Pierre