On Jun 4, 2010, at 7:04 AM, Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Jun 03, 2010 at 06:52:09PM -0400, Vadym Chepkov wrote:
>> Hi
>>
>> There is a bug in stonith/plugins/external/rackpdu in cluster-glue-1.0.5
>>
>> It doesn't check if snmpset was successful or not :
>>
>> SendCommand() {
>>
>> local host=$1
>> local command=$2
>>
>> GetOutletNumber $host
>> local outlet=$?
>>
>> if [ $outlet -gt 0 ]; then
>> local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
>> local check_result=`echo "$set_result" | grep "Timeout"`
>>
>> if [ ! -z "$check_result" ]; then
>> ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
>> fi
>>
>> return 0
>> else
>> return 1
>> fi
>> }
>>
>> Here is what happens:
>>
>> + '[' 1 -gt 0 ']'
>> ++ snmpset -v1 -c private 10.10.10.10 .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 i 2
>> + local 'set_result=Error in packet.
>> Reason: (genError) A general failure occured'
>> ++ echo 'Error in packet.
>> Reason: (genError) A general failure occured'
>> ++ grep Timeout
>> + local check_result=
>> + '[' '!' -z '' ']'
>> + return 0
>> + exit 0
>>
>> so stonith agent says it was successful when it was not :(
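The failure in the trace above can be reproduced without a PDU. The sketch below is illustrative only: `fake_snmpset`, `check_by_grep` and `check_by_status` are hypothetical names, and the stub's exit code 2 is an assumption standing in for whatever snmpset actually returned. It shows that grepping the output for "Timeout" lets the genError through as a success, while checking the exit status catches it.

```shell
#!/bin/bash
# Hypothetical stand-in for snmpset: prints the genError text from the
# trace and fails (exit code 2 is assumed, not taken from net-snmp docs).
fake_snmpset() {
    echo "Error in packet."
    echo "Reason: (genError) A general failure occured"
    return 2
}

# Old approach: only output containing "Timeout" counts as a failure.
check_by_grep() {
    local out
    out=$(fake_snmpset 2>&1)
    if echo "$out" | grep -q "Timeout"; then
        return 1
    fi
    return 0
}

# Alternative: rely on the command's exit status instead of its text.
# Declaring `out` first keeps the substitution's status in $?.
check_by_status() {
    local out
    out=$(fake_snmpset 2>&1)
    if [ $? -ne 0 ]; then
        echo "snmpset failed: $out" >&2
        return 1
    fi
    return 0
}

check_by_grep && echo "grep check: reported success (wrong)"
check_by_status || echo "status check: reported failure (correct)"
```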
>>
>> instead of grepping for "Timeout" (why?)
>
> Don't know. Yes, that's strange. I left that check in anyway. Can
> you simulate a timeout and see what snmpset returns (exit
> code)?
>
>> it should check the exit status: 0 - successful,
>> 2 - failed and not recoverable
>
> Yes, fixed now, and the same for snmpwalk in gethosts.
>
>> 1 - you can possibly retry.
>>
>> Unfortunately, the last one usually happens when somebody is
>> already logged in to the PDU (via HTTP or telnet).
>
> Well, we could retry, but that's probably going to be in vain.
> That needs to be documented.
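A minimal sketch of the retry policy discussed above, assuming the exit-code meanings reported in this thread (1 = possibly transient, 2 = not recoverable); those meanings were not checked against net-snmp documentation here. `run_with_retry`, `MAX_TRIES` and `DELAY` are hypothetical names and are not part of the rackpdu plugin.

```shell
#!/bin/bash
# Hypothetical helper, not part of the rackpdu plugin.
# Assumption (from this thread, not verified): exit code 1 may be transient
# (e.g. someone else is logged in to the PDU); any other non-zero code is fatal.
MAX_TRIES=${MAX_TRIES:-3}   # illustrative default
DELAY=${DELAY:-1}           # seconds between attempts, illustrative

run_with_retry() {
    local try=1 rc
    while :; do
        "$@"
        rc=$?
        # Success: stop immediately.
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        # Fatal code, or out of attempts: pass the last exit code back.
        if [ "$rc" -ne 1 ] || [ "$try" -ge "$MAX_TRIES" ]; then
            return "$rc"
        fi
        try=$((try + 1))
        sleep "$DELAY"
    done
}
```

In the plugin this might wrap the call as `run_with_retry snmpset -v1 -c "$community" "$pduip" "$oid.$outlet" i "$command"` (output is not captured in this sketch). As Dejan notes, retrying while another session holds the PDU is likely to be in vain, so a small MAX_TRIES keeps fencing from stalling.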
>
> Can you please test the changes. You can pull the new version
> from the repository for testing.
>
> Many thanks for the report.
>
> Dejan
I actually submitted a patch to the linux-ha-dev list, as described on the
ClusterLabs site, but I guess it never got there.
I attach it now. I assume the original author didn't realize that
local result=`command`
always returns 0 (the exit status of the local builtin itself), no matter what
the outcome of the command was. As for your question: a timeout does generate
exit code 1.
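A short bash demonstration of the point above; the function names are made up for illustration. ShellCheck flags the first pattern as SC2155 ("Declare and assign separately to avoid masking return values").

```shell
#!/bin/bash
# `local var=$(cmd)`: $? afterwards is the status of `local` (0),
# so the failure of cmd is hidden.
masked() {
    local out=$(false)
    echo $?
}

# Declare first, assign separately: $? is the status of the
# command substitution, i.e. of cmd itself.
unmasked() {
    local out
    out=$(false)
    echo $?
}

echo "masked: $(masked)"       # masked: 0
echo "unmasked: $(unmasked)"   # unmasked: 1
```

This is the same split the patch applies to snmp_result and set_result.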
--- Begin Message ---
# HG changeset patch
# User Vadym Chepkov <[email protected]>
# Date 1275609966 14400
# Node ID 955b957b9e64c83cff9a0e793922143f573cc712
# Parent 5385c0d6c83668cd970161b2862282570b3cf92a
Check exit codes of snmp utils
diff -r 5385c0d6c836 -r 955b957b9e64 lib/plugins/stonith/external/rackpdu
--- a/lib/plugins/stonith/external/rackpdu Tue May 25 15:35:38 2010 +0200
+++ b/lib/plugins/stonith/external/rackpdu Thu Jun 03 20:06:06 2010 -0400
@@ -68,7 +68,12 @@
# Get outlet number from device
local outlet_num=1
- local snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
+ local snmp_result
+ snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
+ if [ $? -ne 0 ]; then
+ ha_log.sh err "Outlet number not found for node $nodename. Result: $snmp_result"
+ return 0
+ fi
local names=`echo "$snmp_result" | cut -f2 -d'"' | tr ' ' '_' | tr '\012' ' '`
@@ -95,11 +100,11 @@
local outlet=$?
if [ $outlet -gt 0 ]; then
- local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
- local check_result=`echo "$set_result" | grep "Timeout"`
-
- if [ ! -z "$check_result" ]; then
+ local set_result
+ set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
+ if [ $? -ne 0 ]; then
ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
+ return 1
fi
return 0
@@ -116,9 +121,7 @@
gethosts)
if [ "$hostlist" = "AUTO" ]; then
snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
- snmp_check=`echo "$snmp_result" | grep "Timeout"`
-
- if [ ! -z "$snmp_check" ]; then
+ if [ $? -ne 0 ]; then
ha_log.sh err "Cannot read list of nodes from device. Result: $snmp_result"
exit 1
else
--- End Message ---
Vadym
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems