Hi list,

I've written an SNMPv2c probe for InterMapper that polls the Service Processor (SP) of a v20z server (via the Management port) to get some environmental information, such as:

- Ambient air temperature;
- CPU temperatures;
- Fan speeds;
- RAM bank temperatures;
- Gigabit Ethernet card temperature, and so on.

You can find further information (OIDs, technical references, etc.) in the probe text.

Feedback and enhancements are much appreciated. Ciao!

<!--
        SNMP - Ifom-ieo-Campus.it (it.ifom-ieo-campus.snmp.sun-v20z)
        Custom Probe for InterMapper 4.5 (http://www.intermapper.com)
        Initial release: 17/11/2006
        Author: Alessandro Dellavedova
        e-mail: alessandro.dellavedovaAT_NOSPAMMERSgmail.com
        I take NO RESPONSIBILITY if your server explodes, catches fire or is abducted by aliens.
        Base OID for SUN SP Processor: 1.3.6.1.4.1.9237
        Useful documents:
        "Sun Fire V20z and Sun Fire V40z Servers - Server Management Guide"
        "Sun Fire V20z and Sun Fire V40z Servers - Troubleshooting Techniques and Diagnostics Guide"
-->

<header>
        "type"                  =       "custom-snmp"
        "flags"                 =       "NOLINKS,SNMPV2C"
        "package"               =       "it.ifom-ieo-campus"
        "probe_name"    =       "snmp.sun-v20z"
        "human_name"    =       "SNMP - SUN v20z Environmental probe"
        "version"               =       "2.0"
        "address_type"  =       "IP"
        "port_number"   =       "161"
</header>

<description>
\GB\SNMP - SUN v20z Environmental probe\P\

This probe monitors the temperature and fan sensors of a SUN v20z server via the Service Processor (SP) that sits on the Management interface.
The default values have been set both by reading the SUN documentation and by polling the sensors, e.g.:

node9 $ sensor get -i ambienttemp -cwWC
Identifier   Crit Low  Warn Low  Warn High  Crit High
ambienttemp  NA        NA        35.00      40.00

The default values work for me, but your mileage may vary.

References:

"Sun Fire V20z and Sun Fire V40z Servers - Server Management Guide"
"Sun Fire V20z and Sun Fire V40z Servers - Troubleshooting Techniques and Diagnostics Guide"

\i\Ambient temp. - Critical\p\ is the \bU1\CRITICAL\0P\ threshold in \xBA C for the ambient temp. reading.
\i\Ambient temp. - Alarm\p\ is the \bU6\ALARM\0P\ threshold in \xBA C for the ambient temp. reading.
\i\Ambient temp. - Warning\p\ is the \bU7\WARNING\0P\ threshold in \xBA C for the ambient temp. reading.

</description>

<parameters>
        "Ambient temp. - Critical"                      =       "40"
        "Ambient temp. - Alarm"                         =       "35"
        "Ambient temp. - Warning"                       =       "30"
</parameters>

<snmp-device-variables>
        SPSwname1,      1.3.6.1.4.1.9237.2.1.1.1.3.1.2.1,       Default,        ""
        SPSwname2,      1.3.6.1.4.1.9237.2.1.1.1.3.1.2.2,       Default,        ""
        SPSwname3,      1.3.6.1.4.1.9237.2.1.1.1.3.1.2.3,       Default,        ""
        SPSwrev1,       1.3.6.1.4.1.9237.2.1.1.1.3.1.3.1,       Default,        ""
        SPSwrev2,       1.3.6.1.4.1.9237.2.1.1.1.3.1.3.2,       Default,        ""
        SPSwrev3,       1.3.6.1.4.1.9237.2.1.1.1.3.1.3.3,       Default,        ""
        SPAmbienttemp,  1.3.6.1.4.1.9237.2.1.1.4.1.1.2.1,       Chartable,      ""
        SPcpu0dietemp,  1.3.6.1.4.1.9237.2.1.1.4.1.1.2.9,       Chartable,      ""
        SPcpu1dietemp,  1.3.6.1.4.1.9237.2.1.1.4.1.1.2.21,      Chartable,      ""
        SPcpu0memtemp,  1.3.6.1.4.1.9237.2.1.1.4.1.1.2.15,      Chartable,      ""
        SPcpu1memtemp,  1.3.6.1.4.1.9237.2.1.1.4.1.1.2.27,      Chartable,      ""
        SPgigaethtemp,  1.3.6.1.4.1.9237.2.1.1.4.1.1.2.41,      Chartable,      ""
        -- NOTE: SPHDDbptemp polls the same OID (...2.41) as SPgigaethtemp; verify this index on your SP.
        SPHDDbptemp,    1.3.6.1.4.1.9237.2.1.1.4.1.1.2.41,      Chartable,      ""
        SPSPtemp,       1.3.6.1.4.1.9237.2.1.1.4.1.1.2.59,      Chartable,      ""
        SPPSfanfail,    1.3.6.1.4.1.9237.2.1.1.4.1.1.2.55,      Default,        ""
        SPfan1tach,     1.3.6.1.4.1.9237.2.1.1.4.1.1.2.32,      Chartable,      ""
        SPfan2tach,     1.3.6.1.4.1.9237.2.1.1.4.1.1.2.33,      Chartable,      ""
        SPfan3tach,     1.3.6.1.4.1.9237.2.1.1.4.1.1.2.34,      Chartable,      ""
        SPfan4tach,     1.3.6.1.4.1.9237.2.1.1.4.1.1.2.35,      Chartable,      ""
        SPfan5tach,     1.3.6.1.4.1.9237.2.1.1.4.1.1.2.36,      Chartable,      ""
        SPfan6tach,     1.3.6.1.4.1.9237.2.1.1.4.1.1.2.37,      Chartable,      ""

        -- The temperature readings are in tenths of a degree (C): divide by 10 to obtain degrees C.
        -- To obtain Fahrenheit, multiply the raw reading by 0.18 and add 32.
        
        Ambienttemp,    $SPAmbienttemp / 10,            CALCULATION,    "$SPAmbienttemp / 10"
        AmbienttempF,   (0.18 * $SPAmbienttemp) + 32,   CALCULATION,    ""

        cpu0dietemp,    $SPcpu0dietemp / 10,            CALCULATION,    "$SPcpu0dietemp / 10"
        cpu0dietempF,   (0.18 * $SPcpu0dietemp) + 32,   CALCULATION,    ""
        cpu1dietemp,    $SPcpu1dietemp / 10,            CALCULATION,    "$SPcpu1dietemp / 10"
        cpu1dietempF,   (0.18 * $SPcpu1dietemp) + 32,   CALCULATION,    ""

        cpu0memtemp,    $SPcpu0memtemp / 10,            CALCULATION,    "$SPcpu0memtemp / 10"
        cpu0memtempF,   (0.18 * $SPcpu0memtemp) + 32,   CALCULATION,    ""
        cpu1memtemp,    $SPcpu1memtemp / 10,            CALCULATION,    "$SPcpu1memtemp / 10"
        cpu1memtempF,   (0.18 * $SPcpu1memtemp) + 32,   CALCULATION,    ""

        gigaethtemp,    $SPgigaethtemp / 10,            CALCULATION,    "$SPgigaethtemp / 10"
        gigaethtempF,   (0.18 * $SPgigaethtemp) + 32,   CALCULATION,    ""

        HDDbptemp,      $SPHDDbptemp / 10,              CALCULATION,    "$SPHDDbptemp / 10"
        HDDbptempF,     (0.18 * $SPHDDbptemp) + 32,     CALCULATION,    ""

        SPtemp,         $SPSPtemp / 10,                 CALCULATION,    "$SPSPtemp / 10"
        SPtempF,        (0.18 * $SPSPtemp) + 32,        CALCULATION,    ""
        
        
</snmp-device-variables>

<snmp-device-thresholds>
        critical:       ${Ambienttemp} >= ${Ambient temp. - Critical}   "Ambient temp. is over ${Ambient temp. - Critical} \xBAC - Critical temperature EXCEEDED !!"
        alarm:          ${Ambienttemp} > ${Ambient temp. - Alarm}       "Ambient temp. is over ${Ambient temp. - Alarm} \xBAC !"
        warning:        ${Ambienttemp} > ${Ambient temp. - Warning}     "Ambient temp. is over ${Ambient temp. - Warning} \xBAC"
        critical:       ${cpu0dietemp} >= 73    "CPU 0 temp. is over 73\xBAC - CPU is melting down !!"
        alarm:          ${cpu0dietemp} > 70     "CPU 0 temp. is over 70\xBAC !"
        warning:        ${cpu0dietemp} > 67     "CPU 0 temp. is over 67\xBAC"
        critical:       ${cpu1dietemp} >= 73    "CPU 1 temp. is over 73\xBAC - CPU is melting down !!"
        alarm:          ${cpu1dietemp} > 70     "CPU 1 temp. is over 70\xBAC !"
        warning:        ${cpu1dietemp} > 67     "CPU 1 temp. is over 67\xBAC"
        critical:       ${cpu0memtemp} >= 60    "RAM bank - CPU 0 temp. is over 60\xBAC - RAM banks are melting down !!"
        alarm:          ${cpu0memtemp} > 55     "RAM bank - CPU 0 temp. is over 55\xBAC !"
        warning:        ${cpu0memtemp} > 50     "RAM bank - CPU 0 temp. is over 50\xBAC"
        critical:       ${cpu1memtemp} >= 60    "RAM bank - CPU 1 temp. is over 60\xBAC - RAM banks are melting down !!"
        alarm:          ${cpu1memtemp} > 55     "RAM bank - CPU 1 temp. is over 55\xBAC !"
        warning:        ${cpu1memtemp} > 50     "RAM bank - CPU 1 temp. is over 50\xBAC"
        critical:       ${gigaethtemp} >= 60    "Gigabit Ethernet temp. is over 60\xBAC - Packets are frying !!"
        alarm:          ${gigaethtemp} > 55     "Gigabit Ethernet temp. is over 55\xBAC !"
        warning:        ${gigaethtemp} > 50     "Gigabit Ethernet temp. is over 50\xBAC"
        critical:       ${HDDbptemp} >= 60      "HDD backplane temp. is over 60\xBAC - Hard Disk backplane is melting down !!"
        alarm:          ${HDDbptemp} > 55       "HDD backplane temp. is over 55\xBAC !"
        warning:        ${HDDbptemp} > 50       "HDD backplane temp. is over 50\xBAC"
        critical:       ${SPtemp} >= 60         "Service Processor temp. is over 60\xBAC - Service Processor is melting down !!"
        alarm:          ${SPtemp} > 55          "Service Processor temp. is over 55\xBAC !"
        warning:        ${SPtemp} > 50          "Service Processor temp. is over 50\xBAC"
        critical:       ${SPfan1tach}   <= 2000         "Fan 1 is running TOO SLOW !"
        critical:       ${SPfan2tach}   <= 2000         "Fan 2 is running TOO SLOW !"
        critical:       ${SPfan3tach}   <= 2000         "Fan 3 is running TOO SLOW !"
        critical:       ${SPfan4tach}   <= 2000         "Fan 4 is running TOO SLOW !"
        critical:       ${SPfan5tach}   <= 2000         "Fan 5 is running TOO SLOW !"
        critical:       ${SPfan6tach}   <= 2000         "Fan 6 is running TOO SLOW !"
        alarm:          ${SPfan1tach}   >= 14500        "Fan 1 is running TOO FAST !"
        alarm:          ${SPfan2tach}   >= 14500        "Fan 2 is running TOO FAST !"
        alarm:          ${SPfan3tach}   >= 14500        "Fan 3 is running TOO FAST !"
        alarm:          ${SPfan4tach}   >= 14500        "Fan 4 is running TOO FAST !"
        alarm:          ${SPfan5tach}   >= 14500        "Fan 5 is running TOO FAST !"
        alarm:          ${SPfan6tach}   >= 14500        "Fan 6 is running TOO FAST !"
        critical:       ${SPPSfanfail}  = 1 "Power Supply fan FAILURE !!"
        critical:       ${SPPStemp}             = 1 "Power Supply is TOO HOT !!"
</snmp-device-thresholds>

<snmp-device-display>

\B5\SUN v20z BIOS/SP Information\0P\

\4\$SPSwname1 $SPSwrev1\0\
\4\$SPSwname2 $SPSwrev2\0\
\4\$SPSwname3 $SPSwrev3\0\

\B5\SUN v20z Sensors status\0P\

\4\Ambient air temperature\0\:  ${chartable:##.#:$Ambienttemp}\xBAC ($AmbienttempF\xBAF)

\4\CPU 0\0\:            ${chartable:##.#:$cpu0dietemp}\xBAC ($cpu0dietempF\xBAF)
\4\CPU 1\0\:            ${chartable:##.#:$cpu1dietemp}\xBAC ($cpu1dietempF\xBAF)

\4\Fan 1 speed\0\:      ${chartable:$SPfan1tach}
\4\Fan 2 speed\0\:      ${chartable:$SPfan2tach}
\4\Fan 3 speed\0\:      ${chartable:$SPfan3tach}
\4\Fan 4 speed\0\:      ${chartable:$SPfan4tach}
\4\Fan 5 speed\0\:      ${chartable:$SPfan5tach}
\4\Fan 6 speed\0\:      ${chartable:$SPfan6tach}

\4\RAM bank - CPU 0\0\: ${chartable:##.#:$cpu0memtemp}\xBAC ($cpu0memtempF\xBAF)
\4\RAM bank - CPU 1\0\: ${chartable:##.#:$cpu1memtemp}\xBAC ($cpu1memtempF\xBAF)

\4\Gigabit Ethernet\0\:         ${chartable:##.#:$gigaethtemp}\xBAC ($gigaethtempF\xBAF)
\4\Hard Disk Backplane\0\:      ${chartable:##.#:$HDDbptemp}\xBAC ($HDDbptempF\xBAF)
\4\Service Processor\0\:        ${chartable:##.#:$SPtemp}\xBAC ($SPtempF\xBAF)
</snmp-device-display>
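
If you want to sanity-check the temperature arithmetic and the ambient thresholds outside InterMapper, here is a minimal Python sketch of what the probe computes. It is not part of the probe: the function names are made up for illustration, and it assumes the SP reports temperatures in tenths of a degree Celsius, as noted in the probe comments. The default thresholds are the ones from the <parameters> section.

```python
# Illustrative sketch only; mirrors the probe's arithmetic, not InterMapper itself.
# Function names are hypothetical and do not exist in the probe.


def tenths_to_celsius(raw):
    """Raw SP reading (tenths of a degree C) -> degrees C, as in '$SPAmbienttemp / 10'."""
    return raw / 10


def tenths_to_fahrenheit(raw):
    """Same formula as the probe's '(0.18 * $SPAmbienttemp) + 32'."""
    return 0.18 * raw + 32


def ambient_status(raw, warning=30, alarm=35, critical=40):
    """Apply the ambient thresholds with the probe's operators (>= critical, > alarm, > warning)."""
    celsius = tenths_to_celsius(raw)
    if celsius >= critical:
        return "critical"
    if celsius > alarm:
        return "alarm"
    if celsius > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # A few made-up raw readings, in tenths of a degree C.
    for raw in (250, 305, 351, 400):
        print(raw, tenths_to_celsius(raw),
              round(tenths_to_fahrenheit(raw), 1), ambient_status(raw))
```

Note that a reading of exactly 35.0 \xBAC only raises a warning, because the alarm comparison is a strict ">", while critical uses ">=".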

Alessandro Dellavedova

IT Manager
European Institute of Oncology  - Department of  Experimental Oncology
Via Ripamonti, 435 - 20141 Milano (Italy)
[EMAIL PROTECTED]

