Hi list,
I've written an SNMPv2 probe that polls the Service Processor (SP) on a
v20z server (via the Management port) to get environmental
information such as:
- Ambient air temperature;
- CPU temperatures;
- Fan speeds;
- RAM bank temperatures;
- Gigabit Ethernet card temperature, and so on.
You can find further information (OIDs, technical references, etc.) in the
probe text.
Feedback and enhancements are very much appreciated, ciao!
<!--
SNMP - Ifom-ieo-Campus.it (it.ifom-ieo-campus.snmp.sun-v20z)
Custom Probe for InterMapper 4.5 (http://www.intermapper.com)
Initial release: 17/11/2006
Author: Alessandro Dellavedova
e-mail: alessandro.dellavedovaAT_NOSPAMMERSgmail.com
I take NO RESPONSIBILITY if your server explodes, catches fire or is
abducted by aliens.
Base OID for SUN SP Processor: 1.3.6.1.4.1.9237
Useful documents:
"Sun Fire V20z and Sun Fire V40z Servers - Server Management Guide"
"Sun Fire V20z and Sun Fire V40z Servers - Troubleshooting Techniques
and Diagnostics Guide"
-->
<header>
"type" = "custom-snmp"
"flags" = "NOLINKS,SNMPV2C"
"package" = "it.ifom-ieo-campus"
"probe_name" = "snmp.sun-v20z"
"human_name" = "SNMP - SUN v20z Environmental probe"
"version" = "2.0"
"address_type" = "IP"
"port_number" = "161"
</header>
<description>
\GB\SNMP - SUN v20z Environmental probe\P\
This probe monitors the temperature and fan sensors of a SUN v20z server via
the SP processor that sits on the Management interface.
The default values have been set both by reading SUN documentation and by
polling the sensors, e.g.:
node9 $ sensor get -i ambienttemp -cwWC
Identifier    Crit Low   Warn Low   Warn High   Crit High
ambienttemp   NA         NA         35.00       40.00
The default values work for me, but your mileage may vary.
References:
"Sun Fire V20z and Sun Fire V40z Servers - Server Management Guide"
"Sun Fire V20z and Sun Fire V40z Servers - Troubleshooting Techniques and
Diagnostics Guide"
\i\Ambient temp. - Critical\p\ is the \bU1\CRITICAL\0P\ threshold in \xBA C for
the ambient temp. reading.
\i\Ambient temp. - Alarm\p\ is the \bU6\ALARM\0P\ threshold in \xBA C for the
ambient temp. reading.
\i\Ambient temp. - Warning\p\ is the \bU7\WARNING\0P\ threshold in \xBA C for
the ambient temp. reading.
</description>
<parameters>
"Ambient temp. - Critical" = "40"
"Ambient temp. - Alarm" = "35"
"Ambient temp. - Warning" = "30"
</parameters>
<snmp-device-variables>
SPSwname1, 1.3.6.1.4.1.9237.2.1.1.1.3.1.2.1, Default, ""
SPSwname2, 1.3.6.1.4.1.9237.2.1.1.1.3.1.2.2, Default, ""
SPSwname3, 1.3.6.1.4.1.9237.2.1.1.1.3.1.2.3, Default, ""
SPSwrev1, 1.3.6.1.4.1.9237.2.1.1.1.3.1.3.1, Default, ""
SPSwrev2, 1.3.6.1.4.1.9237.2.1.1.1.3.1.3.2, Default, ""
SPSwrev3, 1.3.6.1.4.1.9237.2.1.1.1.3.1.3.3, Default, ""
SPAmbienttemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.1, Chartable, ""
SPcpu0dietemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.9, Chartable, ""
SPcpu1dietemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.21, Chartable, ""
SPcpu0memtemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.15, Chartable, ""
SPcpu1memtemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.27, Chartable, ""
SPgigaethtemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.41, Chartable, ""
SPHDDbptemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.41, Chartable, ""
SPSPtemp, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.59, Chartable, ""
SPPSfanfail, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.55, Default, ""
SPfan1tach, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.32, Chartable, ""
SPfan2tach, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.33, Chartable, ""
SPfan3tach, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.34, Chartable, ""
SPfan4tach, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.35, Chartable, ""
SPfan5tach, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.36, Chartable, ""
SPfan6tach, 1.3.6.1.4.1.9237.2.1.1.4.1.1.2.37, Chartable, ""
-- Some temperatures are reported in tenths of a degree Celsius: divide by 10
-- to obtain degrees C, or multiply by 0.18 and add 32 to obtain degrees Fahrenheit.
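-- Worked example (hypothetical raw reading, not taken from a real server):
--   raw value 253  ->  253 / 10 = 25.3 C
--   raw value 253  ->  (0.18 * 253) + 32 = 77.54 F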
Ambienttemp, $SPAmbienttemp / 10, CALCULATION, "$SPAmbienttemp / 10"
AmbienttempF, (0.18 * $SPAmbienttemp) + 32, CALCULATION, ""
cpu0dietemp, $SPcpu0dietemp / 10, CALCULATION, "$SPcpu0dietemp / 10"
cpu0dietempF, (0.18 * $SPcpu0dietemp) + 32, CALCULATION, ""
cpu1dietemp, $SPcpu1dietemp / 10, CALCULATION, "$SPcpu1dietemp / 10"
cpu1dietempF, (0.18 * $SPcpu1dietemp) + 32, CALCULATION, ""
cpu0memtemp, $SPcpu0memtemp / 10, CALCULATION, "$SPcpu0memtemp / 10"
cpu0memtempF, (0.18 * $SPcpu0memtemp) + 32, CALCULATION, ""
cpu1memtemp, $SPcpu1memtemp / 10, CALCULATION, "$SPcpu1memtemp / 10"
cpu1memtempF, (0.18 * $SPcpu1memtemp) + 32, CALCULATION, ""
gigaethtemp, $SPgigaethtemp / 10, CALCULATION, "$SPgigaethtemp / 10"
gigaethtempF, (0.18 * $SPgigaethtemp) + 32, CALCULATION, ""
HDDbptemp, $SPHDDbptemp / 10, CALCULATION, "$SPHDDbptemp / 10"
HDDbptempF, (0.18 * $SPHDDbptemp) + 32, CALCULATION, ""
SPtemp, $SPSPtemp / 10, CALCULATION, "$SPSPtemp / 10"
SPtempF, (0.18 * $SPSPtemp) + 32, CALCULATION, ""
</snmp-device-variables>
<snmp-device-thresholds>
critical: ${Ambienttemp} >= ${Ambient temp. - Critical} "Ambient temp. is over ${Ambient temp. - Critical} \xBAC - Critical temperature EXCEEDED !!"
alarm: ${Ambienttemp} > ${Ambient temp. - Alarm} "Ambient temp. is over ${Ambient temp. - Alarm} \xBAC !"
warning: ${Ambienttemp} > ${Ambient temp. - Warning} "Ambient temp. is over ${Ambient temp. - Warning} \xBAC"
critical: ${cpu0dietemp} >= 73 "CPU 0 temp. is over 73\xBAC - CPU is melting down !!"
alarm: ${cpu0dietemp} > 70 "CPU 0 temp. is over 70\xBAC !"
warning: ${cpu0dietemp} > 67 "CPU 0 temp. is over 67\xBAC"
critical: ${cpu1dietemp} >= 73 "CPU 1 temp. is over 73\xBAC - CPU is melting down !!"
alarm: ${cpu1dietemp} > 70 "CPU 1 temp. is over 70\xBAC !"
warning: ${cpu1dietemp} > 67 "CPU 1 temp. is over 67\xBAC"
critical: ${cpu0memtemp} >= 60 "RAM bank - CPU 0 temp. is over 60\xBAC - RAM banks are melting down !!"
alarm: ${cpu0memtemp} > 55 "RAM bank - CPU 0 temp. is over 55\xBAC !"
warning: ${cpu0memtemp} > 50 "RAM bank - CPU 0 temp. is over 50\xBAC"
critical: ${cpu1memtemp} >= 60 "RAM bank - CPU 1 temp. is over 60\xBAC - RAM banks are melting down !!"
alarm: ${cpu1memtemp} > 55 "RAM bank - CPU 1 temp. is over 55\xBAC !"
warning: ${cpu1memtemp} > 50 "RAM bank - CPU 1 temp. is over 50\xBAC"
critical: ${gigaethtemp} >= 60 "Gigabit Ethernet temp. is over 60\xBAC - Packets are frying !!"
alarm: ${gigaethtemp} > 55 "Gigabit Ethernet temp. is over 55\xBAC !"
warning: ${gigaethtemp} > 50 "Gigabit Ethernet temp. is over 50\xBAC"
critical: ${HDDbptemp} >= 60 "HDD backplane temp. is over 60\xBAC - Hard Disk backplane is melting down !!"
alarm: ${HDDbptemp} > 55 "HDD backplane temp. is over 55\xBAC !"
warning: ${HDDbptemp} > 50 "HDD backplane temp. is over 50\xBAC"
critical: ${SPtemp} >= 60 "Service Processor temp. is over 60\xBAC - Service Processor is melting down !!"
alarm: ${SPtemp} > 55 "Service Processor temp. is over 55\xBAC !"
warning: ${SPtemp} > 50 "Service Processor temp. is over 50\xBAC"
critical: ${SPfan1tach} <= 2000 "Fan 1 is running TOO SLOW !"
critical: ${SPfan2tach} <= 2000 "Fan 2 is running TOO SLOW !"
critical: ${SPfan3tach} <= 2000 "Fan 3 is running TOO SLOW !"
critical: ${SPfan4tach} <= 2000 "Fan 4 is running TOO SLOW !"
critical: ${SPfan5tach} <= 2000 "Fan 5 is running TOO SLOW !"
critical: ${SPfan6tach} <= 2000 "Fan 6 is running TOO SLOW !"
alarm: ${SPfan1tach} >= 14500 "Fan 1 is running TOO FAST !"
alarm: ${SPfan2tach} >= 14500 "Fan 2 is running TOO FAST !"
alarm: ${SPfan3tach} >= 14500 "Fan 3 is running TOO FAST !"
alarm: ${SPfan4tach} >= 14500 "Fan 4 is running TOO FAST !"
alarm: ${SPfan5tach} >= 14500 "Fan 5 is running TOO FAST !"
alarm: ${SPfan6tach} >= 14500 "Fan 6 is running TOO FAST !"
critical: ${SPPSfanfail} = 1 "Power Supply fan FAILURE !!"
critical: ${SPPStemp} = 1 "Power Supply is TOO HOT !!"
</snmp-device-thresholds>
<snmp-device-display>
\B5\SUN v20z BIOS/SP Information\0P\
\4\$SPSwname1 $SPSwrev1\0\
\4\$SPSwname2 $SPSwrev2\0\
\4\$SPSwname3 $SPSwrev3\0\
\B5\SUN v20z Sensors status\0P\
\4\Ambient air temperature\0\: ${chartable:##.#:$Ambienttemp}\xBAC ($AmbienttempF\xBAF)
\4\CPU 0\0\: ${chartable:##.#:$cpu0dietemp}\xBAC ($cpu0dietempF\xBAF)
\4\CPU 1\0\: ${chartable:##.#:$cpu1dietemp}\xBAC ($cpu1dietempF\xBAF)
\4\Fan 1 speed\0\: ${chartable:$SPfan1tach}
\4\Fan 2 speed\0\: ${chartable:$SPfan2tach}
\4\Fan 3 speed\0\: ${chartable:$SPfan3tach}
\4\Fan 4 speed\0\: ${chartable:$SPfan4tach}
\4\Fan 5 speed\0\: ${chartable:$SPfan5tach}
\4\Fan 6 speed\0\: ${chartable:$SPfan6tach}
\4\RAM bank - CPU 0\0\: ${chartable:##.#:$cpu0memtemp}\xBAC ($cpu0memtempF\xBAF)
\4\RAM bank - CPU 1\0\: ${chartable:##.#:$cpu1memtemp}\xBAC ($cpu1memtempF\xBAF)
\4\Gigabit Ethernet\0\: ${chartable:##.#:$gigaethtemp}\xBAC ($gigaethtempF\xBAF)
\4\Hard Disk Backplane\0\: ${chartable:##.#:$HDDbptemp}\xBAC ($HDDbptempF\xBAF)
\4\Service Processor\0\: ${chartable:##.#:$SPtemp}\xBAC ($SPtempF\xBAF)
</snmp-device-display>
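
If you want to sanity-check the conversions and the ambient thresholds outside InterMapper, here is a minimal Python sketch. It is not part of the probe: the function names are made up for illustration, the raw reading is hypothetical, and it assumes that the sensor reports tenths of a degree Celsius and that the most severe matching rule wins.

```python
# Illustrative only: mirrors the probe's CALCULATION lines and the
# ambient-temperature thresholds. Names here are hypothetical helpers,
# not InterMapper or Sun APIs.

def tenths_to_celsius(raw):
    """Raw SP reading (tenths of a degree C) -> degrees Celsius."""
    return raw / 10.0


def tenths_to_fahrenheit(raw):
    """Raw SP reading (tenths of a degree C) -> degrees Fahrenheit."""
    return 0.18 * raw + 32.0


def severity(celsius, warning=30, alarm=35, critical=40):
    """Classify a temperature with the probe's comparisons.

    critical uses >=, alarm and warning use >, as in the threshold section.
    Assumption: the most severe matching rule is the one reported.
    """
    if celsius >= critical:
        return "critical"
    if celsius > alarm:
        return "alarm"
    if celsius > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    raw = 253  # hypothetical raw reading, not taken from a real server
    c = tenths_to_celsius(raw)
    f = tenths_to_fahrenheit(raw)
    print("%.1f C / %.2f F -> %s" % (c, f, severity(c)))
```

Note that with the default parameters a reading of exactly 35 \xBAC only raises a warning, because the alarm rule uses a strict ">" comparison.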
Alessandro Dellavedova
IT Manager
European Institute of Oncology - Department of Experimental Oncology
Via Ripamonti, 435 - 20141 Milano (Italy)
[EMAIL PROTECTED]