Dirk Roessler wrote:
> Does someone knows an easy to install and easy to use solution for
> monitoring and sending email notifications of down nodes and health
> state on a Linux HPC cluster?

You could use Nagios together with the Ganglia Python client. Basically, you use the Ganglia Python client to read a metric value and then, depending on that value, send an alert.
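To make the idea concrete, here is a rough sketch. It does not use the Ganglia Python client itself. Instead it reads the XML that gmond serves on its TCP accept channel (port 8649 by default) using only the standard library, and it reports the result with the usual Nagios plugin exit codes so that Nagios can take care of the email notification. The node name, metric name, and thresholds are made-up examples, and it assumes the metric is numeric.

```python
import socket
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump that gmond sends on connect (returned as bytes)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def metric_value(xml_bytes, node, metric):
    """Return the metric as a float, or None if the node or metric is absent."""
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        if host.get("NAME") != node:
            continue
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                return float(m.get("VAL"))
    return None


def check(value, warn, crit):
    """Map a metric value to a (Nagios exit code, status message) pair."""
    if value is None:
        return UNKNOWN, "UNKNOWN - metric not reported"
    if value >= crit:
        return CRITICAL, "CRITICAL - value is %.2f" % value
    if value >= warn:
        return WARNING, "WARNING - value is %.2f" % value
    return OK, "OK - value is %.2f" % value


def main():
    # Hypothetical node name, metric, and thresholds; adjust for your cluster.
    try:
        xml_bytes = fetch_gmond_xml("localhost")
        value = metric_value(xml_bytes, "node01", "load_one")
    except (OSError, ET.ParseError) as exc:
        print("UNKNOWN - could not query gmond: %s" % exc)
        return UNKNOWN
    code, message = check(value, warn=4.0, crit=8.0)
    print(message)
    return code
```

If you save this as a script and end it with `sys.exit(main())` (after `import sys`), Nagios can run it as a check command and will send the email through its normal contact and notification settings when the state changes.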
Setting up Nagios may not be easy, but it is definitely worth it in the long term.

Vladimir

