Hi,
I run Ganglia on a 180-node cluster and it works well. I monitor the CPU
temperature by running a script on each node that regularly updates the
TempCPU metric. On the server, another script runs "ganglia TempCPU", takes
the returned value, checks whether it exceeds a limit and, if it does, shuts
the node down with a home-made controller module. This works great.
The problem is that when I want to restart the node, its old TempCPU value is
still held in Ganglia's memory. The server script therefore shuts the node
down again, because that stale TempCPU value still exceeds the limit.
Is there a way for the server script to overwrite the TempCPU value of a node
that has been shut down with a tag value (for example, -1)?
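For illustration, here is a minimal sketch of the server-side check described
above, assuming the tag value -1 could be stored for a node that was shut
down. It assumes the "ganglia TempCPU" output has been captured as text with
one "<hostname> <value>" pair per line; that format, and the names
parse_readings, nodes_to_shut_down and SENTINEL, are hypothetical and not part
of Ganglia itself.

```python
from typing import Dict, List

# Hypothetical tag value meaning "node was shut down, no current reading".
SENTINEL = -1.0


def parse_readings(output: str) -> Dict[str, float]:
    """Parse text assumed to contain lines of the form '<hostname> <TempCPU>'."""
    readings: Dict[str, float] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        host, value = parts
        try:
            readings[host] = float(value)
        except ValueError:
            continue  # skip values that are not numbers
    return readings


def nodes_to_shut_down(readings: Dict[str, float], limit: float) -> List[str]:
    """Return hosts whose temperature exceeds the limit, ignoring tagged nodes."""
    return sorted(
        host
        for host, temp in readings.items()
        if temp != SENTINEL and temp > limit
    )


if __name__ == "__main__":
    sample = "node001 45.0\nnode002 82.5\nnode003 -1\n"
    print(nodes_to_shut_down(parse_readings(sample), limit=70.0))
```

With any realistic positive limit, -1 already falls below the threshold; the
explicit SENTINEL comparison just makes the intent visible and keeps tagged
nodes out of the list even if the limit were set very low.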
Thanks
Karl