Yep, I'd recommend having your event handler that fires on an overheat condition correlate _several_ sources before shutting down large numbers of systems. If you look hard, you'll surely find a number of good sources for temp correlation (netbotz, switch/router SNMP, management processors, chiller, cooling towers, lm_sensors, etc).
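As a rough sketch of that kind of correlation (not a tested Nagios event handler): the snippet below only recommends a mass shutdown when several independent sources agree. The `Reading` class, the `overheat_confirmed` function, the source names, and the thresholds are all hypothetical. How you actually collect each reading (SNMP, lm_sensors, management processor) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    source: str               # hypothetical label, e.g. "netbotz" or "lm_sensors"
    temp_c: Optional[float]   # None means the source did not answer
    limit_c: float            # per-source overheat threshold (assumed value)


def overheat_confirmed(readings: List[Reading],
                       min_agree: int = 3,
                       min_fraction: float = 0.5) -> bool:
    """Return True only if enough independent sources report overheat."""
    # Ignore sources that failed to report; a dead sensor is not evidence of heat.
    valid = [r for r in readings if r.temp_c is not None]
    hot = [r for r in valid if r.temp_c >= r.limit_c]
    # Require an absolute number of agreeing sources...
    if len(hot) < min_agree:
        return False
    # ...and that they are at least the given share of the sources that answered.
    return len(hot) / len(valid) >= min_fraction


if __name__ == "__main__":
    sample = [
        Reading("netbotz", 41.0, 35.0),
        Reading("switch-snmp", 58.0, 55.0),
        Reading("chiller", None, 20.0),      # no answer from this source
        Reading("lm_sensors", 82.0, 80.0),
    ]
    print(overheat_confirmed(sample))
```

A single misbehaving sensor cannot trigger anything here, which is the point. Whether a True result powers things off automatically or just pages someone is a separate decision.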
Having a per-host shutdown based on local lm_sensors/management info is usually fine (just beware of bugs in your temp reference...), e.g. if you're checking CPU temps, check both temp and fan status...

Large-scale cluster power-offs are tender though. You may even want to avoid having that handled automatically, and just have an easily accessible method of doing a room/datacenter shutdown manually from remote, even if you do correlate everything.

The action of turning something off is the easy part; it's determining that you _really_ want to that's pointy.

IMO

/eli

On 6/8/06 11:34 AM, "Johnston Michael J Contr AFRL/DES" <[EMAIL PROTECTED]> wrote:

> Does anyone use anything that will go out and shutdown computers in
> instances where a room is over heating or too many errors start occurring?
> We've recently had a problem with heat in a server room. I got messages
> that the room was overheating, but by the time I got there the room was
> really hot and all the machines were running. I'm looking for something
> that takes steps to save machines if a threshold is ever met or exceeded.
>
> Thanks for the help!
>
> _______________________________________________
> Nagios-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
