On Jun 4, 2010, at 7:04 AM, Dejan Muhamedagic wrote:

> Hi,
> 
> On Thu, Jun 03, 2010 at 06:52:09PM -0400, Vadym Chepkov wrote:
>> Hi
>> 
>> There is a bug in stonith/plugins/external/rackpdu in cluster-glue-1.0.5
>> 
>> It doesn't check if snmpset was successful or not :
>> 
>> SendCommand() {
>> 
>>    local host=$1
>>    local command=$2
>> 
>>    GetOutletNumber $host
>>    local outlet=$?
>> 
>>    if [ $outlet -gt 0 ]; then
>>        local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
>>        local check_result=`echo "$set_result" | grep "Timeout"`            
>> 
>>        if [ ! -z "$check_result" ]; then
>>            ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
>>        fi
>> 
>>        return 0
>>    else
>>        return 1
>>    fi
>> }
>> 
>> Here is what happens:
>> 
>> + '[' 1 -gt 0 ']'
>> ++ snmpset -v1 -c private 10.10.10.10  .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 i 2
>> + local 'set_result=Error in packet.
>> Reason: (genError) A general failure occured'
>> ++ echo 'Error in packet.
>> Reason: (genError) A general failure occured'
>> ++ grep Timeout
>> + local check_result=
>> + '[' '!' -z '' ']'
>> + return 0
>> + exit 0
>> 
>> so stonith agent says it was successful when it was not :(
>> 
>> instead of grepping for "Timeout" (why?)
> 
> Don't know. Yes, that's strange. I left that check in anyway. Can
> you simulate a timeout and see what snmpset returns (exit
> code)?
> 
>> it should check if exit status was 0, then it was successful
>> 2 - failed and not recoverable 
> 
> Yes, fixed now. Also snmpwalk for gethosts.
> 
>> 1 - you can possibly retry. 
>> 
>> The last one, unfortunately, usually happens when somebody is
>> already logged into the PDU (via http or telnet)
> 
> Well, we could retry, but that's probably going to be in vain.
> That needs to be documented.
> 
> Can you please test the changes. You can pull the new version
> from the repository for testing.
> 
> Many thanks for the report.
> 
> Dejan

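On the retry question, here is a minimal sketch of what "retry only on exit
status 1" could look like. The helper name retry_on_status_1 is made up for
illustration and is not part of rackpdu, and the meaning of the exit codes
(0 success, 1 possibly transient, 2 not recoverable) is only what was observed
in this thread, not something taken from the net-snmp documentation.

```shell
#!/bin/bash
# Hypothetical helper, not part of the rackpdu plugin.
# Usage: retry_on_status_1 <max_tries> <command> [args...]
# Runs the command up to max_tries times, retrying only while it exits
# with status 1. Any other status (0, 2, ...) is returned immediately.
retry_on_status_1() {
    local tries=$1
    shift
    local rc=1
    while [ "$tries" -gt 0 ]; do
        "$@"
        rc=$?
        # Stop on success or on any failure other than status 1.
        if [ "$rc" -ne 1 ]; then
            return "$rc"
        fi
        tries=$((tries - 1))
    done
    # All attempts ended with status 1.
    return "$rc"
}

# Example call (assumes the plugin's variables are set):
# retry_on_status_1 3 snmpset -v1 -c "$community" "$pduip" "$oid.$outlet" i "$command"
```

If somebody stays logged into the PDU, every attempt will still fail, so the
caller has to treat a final status of 1 as a failure as well.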

I actually submitted a patch to the linux-ha-dev list, as described on the
clusterlabs site, but I guess it never got there.
I'm attaching it now. I assume the original author didn't realize that

local result=`command` 

always returns 0, no matter what the outcome of the command was, because $?
then holds the exit status of "local" itself. A timeout does generate
exit code 1.
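A small self-contained sketch of the difference, for anyone who wants to see it
without a PDU at hand. The function fail_cmd is a made-up stand-in for a failing
snmpset call; combined and separate are likewise only illustrative names.

```shell
#!/bin/bash
# Hypothetical stand-in for a failing snmpset: prints an error, exits 2.
fail_cmd() {
    echo "Error in packet."
    return 2
}

# Declaration and assignment on one line: $? is the status of the
# "local" builtin, so the failure of fail_cmd is lost.
combined() {
    local result=`fail_cmd 2>&1`
    echo $?
}

# Declaration first, plain assignment second: $? is the status of the
# command substitution, so the failure is visible.
separate() {
    local result
    result=`fail_cmd 2>&1`
    echo $?
}

combined   # prints 0
separate   # prints 2
```

This is why the patch splits each "local var=`...`" into two lines before
testing $?.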


--- Begin Message ---
# HG changeset patch
# User Vadym Chepkov <[email protected]>
# Date 1275609966 14400
# Node ID 955b957b9e64c83cff9a0e793922143f573cc712
# Parent  5385c0d6c83668cd970161b2862282570b3cf92a
Check exit codes of snmp utils

diff -r 5385c0d6c836 -r 955b957b9e64 lib/plugins/stonith/external/rackpdu
--- a/lib/plugins/stonith/external/rackpdu      Tue May 25 15:35:38 2010 +0200
+++ b/lib/plugins/stonith/external/rackpdu      Thu Jun 03 20:06:06 2010 -0400
@@ -68,7 +68,12 @@
        # Get outlet number from device
     
        local outlet_num=1
-       local snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
+       local snmp_result
+       snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
+        if [ $? -ne 0 ]; then
+           ha_log.sh err "Outlet number not found for node $nodename. Result: $snmp_result"
+           return 0
+       fi
 
        local names=`echo "$snmp_result" | cut -f2 -d'"' | tr ' ' '_' | tr '\012' ' '`
 
@@ -95,11 +100,11 @@
     local outlet=$?
 
     if [ $outlet -gt 0 ]; then
-        local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
-        local check_result=`echo "$set_result" | grep "Timeout"`           
-
-        if [ ! -z "$check_result" ]; then
+        local set_result
+        set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
+        if [ $? -ne 0 ]; then
            ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
+            return 1
        fi
            
        return 0
@@ -116,9 +121,7 @@
 gethosts)
        if [ "$hostlist" = "AUTO" ]; then
            snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
-           snmp_check=`echo "$snmp_result" | grep "Timeout"`
-
-           if [ ! -z "$snmp_check" ]; then
+           if [ $? -ne 0 ]; then
                ha_log.sh err "Cannot read list of nodes from device. Result: $snmp_result"
                exit 1
            else

--- End Message ---


Vadym


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
