Re: [Linux-HA] The active trap of the SNMP is delayed.

Lars Ellenberg Thu, 21 Jul 2011 17:27:16 -0700

On Tue, Jul 19, 2011 at 11:04:51AM +0900, [email protected] wrote:
> Hi All,
> 
> We are struggling with this problem.
> Please advise.
> 
> * I moved this report to this mailing list because it seems to be an HA 
> problem.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> --- On Fri, 2011/6/17, [email protected] 
> <[email protected]> wrote:
> 
> > Hi All,
> > 
> > I registered this problem in Bugzilla.
> > 
> >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2604
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > --- On Wed, 2011/6/15, [email protected] 
> > <[email protected]> wrote:
> > 
> > > Hi All,
> > > 
> > > I found a problem with an SNMP trap (sent by hbagent).
> > >
> > > The node's active trap can apparently be delayed.
> > > 
> > > The problem is intermittent; it does not occur every time.
> > > 
> > > 
> > > I confirmed it with the following procedure.
> > > 
> > > Step1) Start a node.
> > > 
> > > ============
> > > Last updated: Wed Jun 15 19:23:39 2011
> > > Stack: Heartbeat
> > > Current DC: srv02 (afe72fff-b7b4-4663-b845-872df29c635d) - partition 
> > > WITHOUT quorum
> > > Version: 1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > > ============
> > > 
> > > Online: [ srv01 srv02 ]
> > > 
> > >  Resource Group: group-1
> > >      prmDummy1  (ocf::heartbeat:Dummy): Started srv01
> > > 
> > > Migration summary:
> > > * Node srv02: 
> > > * Node srv01: 
> > > 
> > > 
> > > Step2) Block one interface used for Heartbeat communication.
> > > 
> > > # iptables -A INPUT -i eth1 -s ! 192.168.10.110 -j DROP
> > > # iptables -A INPUT -i eth1 -s ! 192.168.10.120 -j DROP
> > > 
> > > 
> > > Step3) The following traps are received by the SNMP manager.
> > > 
> > > (snip)
> > > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:30 
> > > <UNKNOWN> [UDP: [192.168.40.120]:59010]: 
> > > DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23014) 0:03:50.14       
> > > SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate        
> > > LINUX-HA-MIB::LHANodeName = STRING: srv01       LINUX-HA-MIB::LHAIFName = 
> > > STRING: eth1       LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2) 
> > >    ----> No problem.
> > > Jun 15 19:24:32 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:32 
> > > <UNKNOWN> [UDP: [192.168.40.110]:44001]: 
> > > DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23597) 0:03:55.97       
> > > SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHANodeStatusUpdate      
> > > LINUX-HA-MIB::LHANodeName = STRING: srv02       
> > > LINUX-HA-MIB::LHANodeStatus = INTEGER: active(3)
> > >    ----> The active trap is improper at this timing.


Why?

> > > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:34 
> > > <UNKNOWN> [UDP: [192.168.40.110]:44001]: 
> > > DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23803) 0:03:58.03       
> > > SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate        
> > > LINUX-HA-MIB::LHANodeName = STRING: srv02       LINUX-HA-MIB::LHAIFName = 
> > > STRING: eth1       LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2) 
> > >    ----> No problem.
> > > (snip)
> > > 
> > > It is strange that the node's active trap arrives between the two 
> > > interface-down traps.
> > > 
> > > I also think the active trap should have been sent earlier.
> > > 
> > > 
> > > This problem seems to happen in Heartbeat 2.1.4.
> > > 
> > > I looked at some of the source, and I suspect that Heartbeat's 
> > > client_lib has a problem of some kind.
> > > The transmitted F_STATUS message seems to be handled late.

hbagent is no longer in the heartbeat code.
According to mercurial, it was removed three years ago.
I doubt it is/was used by many.
So I fear you won't get much help for this.


Still, I don't see "the problem".
You have two communication channels configured.
You block one.
You get a *link* down trap, immediately, probably because sending fails
locally if you do iptables -j DROP.

> > > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate 
> > > LHANodeName srv01 LHAIFName eth1 LHAIFStatus down(2)


You get a *node* active.
Why do you think this is wrong?
Which timing would have been "proper", and why?

> > > Jun 15 19:24:32 snmp-manager snmptrapd[4771]: 
> > > LHANodeStatusUpdate LHANodeName srv02 LHANodeStatus active(3)


And after the timeout, you get the *link* down to the other node.

> > > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate 
> > > LHANodeName srv02 LHAIFName eth1 LHAIFStatus down(2) 
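
To make the ordering concrete, here is a small sketch that lines up the
three traps by the arrival time snmptrapd logged for them. It is not part
of hbagent or Heartbeat; the helper name `offsets` and the tuple layout are
made up for illustration, and the data is copied from the log quoted above.

```python
from datetime import datetime

# Arrival time, trap name, node and detail, copied from the quoted
# snmptrapd log. The tuple layout is an assumption for this sketch.
TRAPS = [
    ("19:24:30", "LHAIFStatusUpdate", "srv01", "eth1 down(2)"),
    ("19:24:32", "LHANodeStatusUpdate", "srv02", "active(3)"),
    ("19:24:34", "LHAIFStatusUpdate", "srv02", "eth1 down(2)"),
]


def offsets(traps):
    """Return (seconds since the first trap, name, node, detail) tuples."""
    first = datetime.strptime(traps[0][0], "%H:%M:%S")
    result = []
    for stamp, name, node, detail in traps:
        delta = datetime.strptime(stamp, "%H:%M:%S") - first
        result.append((int(delta.total_seconds()), name, node, detail))
    return result


if __name__ == "__main__":
    for secs, name, node, detail in offsets(TRAPS):
        print(f"+{secs}s  {name:<20} {node}  {detail}")
```

The traps arrive two seconds apart: link down for srv01, node active for
srv02, then link down for srv02. Note that this only compares arrival times
at the manager; the sysUpTimeInstance values in the full log are per sending
agent, so they cannot be compared across 192.168.40.110 and 192.168.40.120.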


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
