Dirk Roessler wrote:
> Does anyone know of an easy to install and easy to use solution for
> monitoring and sending email notifications of down nodes and health
> state on a Linux HPC cluster?
You could use Nagios together with the Ganglia Python client. Basically,
you use the Ganglia Python client to fetch a metric value, and then send
an alert depending on that value.
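
A minimal sketch of that idea, not a finished plugin. It does not use
any particular Ganglia client library; instead it assumes gmond is
listening on its default TCP port 8649 and dumps its XML state to any
client that connects. The function names, the node name, and the
thresholds are hypothetical and only for illustration. The status codes
follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN), so Nagios itself can take care of the email notification.

```python
import socket
import xml.etree.ElementTree as ET

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends when a TCP client connects.

    Assumption: gmond runs on `host` with the default tcp_accept_channel
    port (8649). Returns raw bytes so the XML parser can honor the
    encoding declared in the document.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def check_metric(xml_data, node, metric, warn, crit):
    """Return (status, message) for one metric on one node.

    Higher values are treated as worse, e.g. for load_one.
    """
    root = ET.fromstring(xml_data)
    for host in root.iter("HOST"):
        if host.get("NAME") != node:
            continue
        for m in host.iter("METRIC"):
            if m.get("NAME") != metric:
                continue
            try:
                value = float(m.get("VAL"))
            except (TypeError, ValueError):
                return UNKNOWN, f"{metric} on {node} is not numeric"
            if value >= crit:
                return CRITICAL, f"{metric}={value} on {node}"
            if value >= warn:
                return WARNING, f"{metric}={value} on {node}"
            return OK, f"{metric}={value} on {node}"
        return UNKNOWN, f"{metric} not reported by {node}"
    return UNKNOWN, f"{node} not found in gmond output"
```

To use it as a Nagios check, a small wrapper script would call
check_metric(fetch_gmond_xml(), "node01", "load_one", 4.0, 8.0), print
the message, and exit with the returned status.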

Setting up Nagios may not be easy, but it is definitely worth it in the
long term.

Vladimir
