Dear All,

I developed the "NodeHealthMonitor" (NHM), which is hosted at GENIVI. David Yates recently informed me about the ongoing discussion, so I joined the mailing list.

Overall, I see a lot of similarities between the component proposed by Abhishek and the NHM we currently have at GENIVI. The NHM sources are available at:
http://git.projects.genivi.org/?p=lifecycle/node-health-monitor.git

Here is a short summary of what the NHM does (copied from nhm-main.c):

"The Node Health Monitor will usually be started by systemd and will interact with application plug-ins to inform it that a component has failed in the system. He will be responsible for:

- providing an interface with which plug-ins can register failures:
  - name of the failing service, used to identify and track failures
- tracking failure statistics over multiple LCs for system and components
  - NHM will maintain a count of the number of failures in the current life cycle as well as statistics on number of failures in last X life cycles (i.e. 3 failures in last 32 life cycles)
- observe the life cycle accordingly to catch unexpected system restarts
- provide an interface for plug-ins to read system/component error counts
- provide an interface for plug-ins to request a node restart

Additionally the Node Health Monitor will test a number of product defined criteria with the aim to ensure that userland is stable and functional. It will be able to validate that:

- defined file is present
- defined processes are still running
- a user defined process can be executed with an expected result
- communication on defined dbus (bus address) is possible"

Based on this, the NHM should be a good basis for the component proposed by Abhishek, although I see the following differences:

1. NHM only cares about the systemd property "ActiveState". Additional tracking of the other properties would need to be implemented.
2. NHM stores data using the GENIVI PCL. A configure switch might need to be introduced to use the POSIX API instead.
3. NHM offers "userland checks". They are not harmful, since they can be deactivated, so this is not an issue at all.
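To make the "3 failures in last 32 life cycles" statistic from the summary more concrete, here is a minimal sketch in C. It is not taken from the NHM sources: the type `failure_stats` and the three functions are hypothetical names chosen for illustration, persistence is left out entirely, and as a simplification it counts life cycles that saw at least one failure rather than the total number of failures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type, not part of the NHM: one bit per life cycle,
 * newest life cycle in bit 0. A set bit means "at least one failure". */
typedef struct {
    uint32_t history;        /* sliding window over the last 32 life cycles */
    unsigned current_count;  /* failures seen in the current life cycle */
} failure_stats;

/* Call once at start-up: shift the window and open a new, clean life cycle.
 * The oldest life cycle drops out of the 32-bit window automatically. */
void stats_begin_life_cycle(failure_stats *s)
{
    assert(s != NULL);
    s->history <<= 1;
    s->current_count = 0;
}

/* Call whenever a plug-in reports a failure for the tracked service. */
void stats_register_failure(failure_stats *s)
{
    assert(s != NULL);
    s->current_count++;
    s->history |= UINT32_C(1);
}

/* Number of life cycles with a failure among the last `window` (1..32). */
unsigned stats_failed_life_cycles(const failure_stats *s, unsigned window)
{
    assert(s != NULL);
    assert(window >= 1 && window <= 32);

    /* Shifting a 32-bit value by 32 is undefined, so handle it separately. */
    uint32_t mask = (window == 32) ? UINT32_MAX
                                   : ((UINT32_C(1) << window) - 1u);
    uint32_t bits = s->history & mask;

    unsigned n = 0;
    while (bits != 0) {      /* clear the lowest set bit until none are left */
        bits &= bits - 1u;
        n++;
    }
    return n;
}
```

A caller would invoke `stats_begin_life_cycle()` once per boot, `stats_register_failure()` for each reported failure, and `stats_failed_life_cycles(&s, 32)` to read the statistic. In a real component the struct would have to be stored persistently between life cycles, for example via the GENIVI PCL or the POSIX API as discussed in point 2.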

In conclusion, I would recommend collecting the use cases as suggested by Jeremiah and Igor, and I invite you all to also check the NHM documentation and code to look for differences and possible ways to resolve them.

Best regards
Jean-Pierre
