On Fri, Jun 04, 2010 at 07:14:12AM -0400, Vadym Chepkov wrote:
> I actually submitted a patch to linux-ha-dev as described 

Either I missed it, or it never got there. Perhaps you're not
subscribed to the list? At any rate, the patch is already in the
repository.

Cheers,

Dejan

> On Jun 4, 2010, at 7:04 AM, Dejan Muhamedagic wrote:
> 
> > Hi,
> > 
> > On Thu, Jun 03, 2010 at 06:52:09PM -0400, Vadym Chepkov wrote:
> >> Hi
> >> 
> >> There is a bug in stonith/plugins/external/rackpdu in cluster-glue-1.0.5
> >> 
> >> It doesn't check if snmpset was successful or not :
> >> 
> >> SendCommand() {
> >> 
> >>    local host=$1
> >>    local command=$2
> >> 
> >>    GetOutletNumber $host
> >>    local outlet=$?
> >> 
> >>    if [ $outlet -gt 0 ]; then
> >>        local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
> >>        local check_result=`echo "$set_result" | grep "Timeout"`            
> >> 
> >>        if [ ! -z "$check_result" ]; then
> >>            ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
> >>        fi
> >> 
> >>        return 0
> >>    else
> >>        return 1
> >>    fi
> >> }
> >> 
> >> Here is what happens:
> >> 
> >> + '[' 1 -gt 0 ']'
> >> ++ snmpset -v1 -c private 10.10.10.10  .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 i 2
> >> + local 'set_result=Error in packet.
> >> Reason: (genError) A general failure occured'
> >> ++ echo 'Error in packet.
> >> Reason: (genError) A general failure occured'
> >> ++ grep Timeout
> >> + local check_result=
> >> + '[' '!' -z '' ']'
> >> + return 0
> >> + exit 0
> >> 
> >> so stonith agent says it was successful when it was not :(
> >> 
> >> instead of grepping for "Timeout" (why?)
> > 
> > Don't know. Yes, that's strange. I left that check in anyway. Can
> > you simulate a timeout and see what snmpset returns (exit
> > code)?
> > 
> >> it should check if exit status was 0, then it was successful
> >> 2 - failed and not recoverable 
> > 
> > Yes, fixed now. Also snmpwalk for gethosts.
> > 
> >> 1 - you can possibly retry. 
> >> 
> >> The last one, unfortunately, usually happens when somebody is
> >> already logged in to the PDU (via http or telnet)
> > 
> > Well, we could retry, but that's probably going to be in vain.
> > That needs to be documented.
> > 
> > Can you please test the changes? You can pull the new version
> > from the repository for testing.
> > 
> > Many thanks for the report.
> > 
> > Dejan
> 
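
The exit-status handling discussed above (0 means success, 1 may be worth a retry, 2 is not recoverable) could be sketched roughly as follows. This is only an illustration under stated assumptions: flaky_set is a hypothetical stand-in for snmpset that fails twice with status 1 and then succeeds, send_with_retry is an invented name, and the meaning of the exit codes is taken from this thread rather than verified against the net-snmp sources. Whether retrying actually helps with a busy PDU remains an open question, as noted in the reply.

```shell
#!/bin/sh
# Hypothetical stand-in for snmpset: exits 1 on the first two calls, 0 afterwards.
# The call counter lives in a file because command substitution runs in a subshell.
attempts_file=`mktemp`
echo 0 > "$attempts_file"

flaky_set() {
    n=`cat "$attempts_file"`
    n=$((n + 1))
    echo "$n" > "$attempts_file"
    [ "$n" -ge 3 ] && return 0
    echo "Timeout: No Response"
    return 1
}

# Retry only on exit status 1; treat any other non-zero status as fatal.
send_with_retry() {
    local max_tries=$1
    local try=1
    local out rc
    while [ "$try" -le "$max_tries" ]; do
        out=`flaky_set 2>&1`
        rc=$?
        case $rc in
            0) return 0 ;;
            1) try=$((try + 1)) ;;                     # possibly transient
            *) echo "giving up: $out" >&2; return 1 ;;  # not recoverable
        esac
    done
    echo "still failing after $max_tries tries: $out" >&2
    return 1
}

send_with_retry 5
final_rc=$?
tries_used=`cat "$attempts_file"`
rm -f "$attempts_file"
echo "rc=$final_rc after $tries_used attempts"
```

A real agent would probably pause between attempts and keep the retry count small, since a fencing operation should not hang for long.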
> 
> I actually submitted a patch to the linux-ha-dev list as described on the
> clusterlabs site; I guess it never got there.
> I attach it now. I assume the original author didn't realize that
> 
> local result=`command` 
> 
> always returns 0, no matter what the outcome of the command was, because $?
> then reflects the status of `local` itself. A timeout does generate
> exit code 1.
> 
> 
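
The point about `local result=...` hiding the exit status can be checked with a small script. This is a minimal sketch, not the rackpdu code: fake_snmpset is a hypothetical stand-in that prints an error and exits with status 2, and the function names combined and separate are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for a failing snmpset call: prints an error, exits 2.
fake_snmpset() {
    echo "Error in packet."
    return 2
}

combined() {
    # Declaration and assignment in one statement:
    # $? is the exit status of `local`, not of fake_snmpset.
    local result=`fake_snmpset 2>&1`
    echo $?
}

separate() {
    # Declare first, assign second:
    # $? is now the exit status of the command substitution.
    local result
    result=`fake_snmpset 2>&1`
    echo $?
}

status_combined=`combined`
status_separate=`separate`
echo "combined: $status_combined, separate: $status_separate"
```

Run under bash or dash, the combined form reports 0 and the separate form reports 2, which is why the patch splits the declaration from the assignment before testing $?.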

Subject: [PATCH] Check exit codes of snmp utils
Date: Fri, 04 Jun 2010 07:09:12 -0400
From: Vadym Chepkov <[email protected]>
To: [email protected]
> 
> # HG changeset patch
> # User Vadym Chepkov <[email protected]>
> # Date 1275609966 14400
> # Node ID 955b957b9e64c83cff9a0e793922143f573cc712
> # Parent  5385c0d6c83668cd970161b2862282570b3cf92a
> Check exit codes of snmp utils
> 
> diff -r 5385c0d6c836 -r 955b957b9e64 lib/plugins/stonith/external/rackpdu
> --- a/lib/plugins/stonith/external/rackpdu    Tue May 25 15:35:38 2010 +0200
> +++ b/lib/plugins/stonith/external/rackpdu    Thu Jun 03 20:06:06 2010 -0400
> @@ -68,7 +68,12 @@
>       # Get outlet number from device
>      
>       local outlet_num=1
> -     local snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
> +     local snmp_result
> +     snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
> +        if [ $? -ne 0 ]; then
> +         ha_log.sh err "Outlet number not found for node $nodename. Result: $snmp_result"
> +         return 0
> +     fi
>  
>       local names=`echo "$snmp_result" | cut -f2 -d'"' | tr ' ' '_' | tr '\012' ' '`
>  
> @@ -95,11 +100,11 @@
>      local outlet=$?
>  
>      if [ $outlet -gt 0 ]; then
> -        local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
> -        local check_result=`echo "$set_result" | grep "Timeout"`         
> -
> -        if [ ! -z "$check_result" ]; then
> +        local set_result
> +        set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
> +        if [ $? -ne 0 ]; then
>           ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
> +            return 1
>       fi
>           
>       return 0
> @@ -116,9 +121,7 @@
>  gethosts)
>       if [ "$hostlist" = "AUTO" ]; then
>           snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
> -         snmp_check=`echo "$snmp_result" | grep "Timeout"`
> -
> -         if [ ! -z "$snmp_check" ]; then
> +         if [ $? -ne 0 ]; then
>               ha_log.sh err "Cannot read list of nodes from device. Result: $snmp_result"
>               exit 1
>           else

> 
> 
> 
> Vadym
> 
> 

> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
