On Tue, Jul 19, 2011 at 11:04:51AM +0900, [email protected] wrote:
> Hi All,
>
> We are troubled in the face of this problem.
> Please give advice.
>
> * This problem changed the destination of the mailing list to seem to be a
> problem of the HA.
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> --- On Fri, 2011/6/17, [email protected]
> <[email protected]> wrote:
>
> > Hi All,
> >
> > I registered this problem in Bugzilla.
> >
> > * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2604
> >
> > Best Regards,
> > Hideo Yamauch.
> >
> > --- On Wed, 2011/6/15, [email protected]
> > <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I found a problem with a trap of the SNMP.(from hbagent.)
> > >
> > > A trap of active of the node seems to have possibilities to be delayed.
> > >
> > > In addition, this problem sometimes occurs and does not always occur.
> > >
> > >
> > > I confirmed it in the next procedure.
> > >
> > > Step1) Start a node.
> > >
> > > ============
> > > Last updated: Wed Jun 15 19:23:39 2011
> > > Stack: Heartbeat
> > > Current DC: srv02 (afe72fff-b7b4-4663-b845-872df29c635d) - partition
> > > WITHOUT quorum
> > > Version: 1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > > ============
> > >
> > > Online: [ srv01 srv02 ]
> > >
> > > Resource Group: group-1
> > > prmDummy1 (ocf::heartbeat:Dummy): Started srv01
> > >
> > > Migration summary:
> > > * Node srv02:
> > > * Node srv01:
> > >
> > >
> > > Step2) Intercept one interface of the Heartbeat communication.
> > >
> > > # iptables -A INPUT -i eth1 -s ! 192.168.10.110 -j DROP
> > > # iptables -A INPUT -i eth1 -s ! 192.168.10.120 -j DROP
> > >
> > >
> > > Step3) The next trap is received in SNMP managers.
> > >
> > > (snip)
> > > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:30
> > > <UNKNOWN> [UDP: [192.168.40.120]:59010]:
> > > DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23014) 0:03:50.14
> > > SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate
> > > LINUX-HA-MIB::LHANodeName = STRING: srv01 LINUX-HA-MIB::LHAIFName =
> > > STRING: eth1 LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2)
> > > ----> No problem.
> > > Jun 15 19:24:32 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:32
> > > <UNKNOWN> [UDP: [192.168.40.110]:44001]:
> > > DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23597) 0:03:55.97
> > > SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHANodeStatusUpdate
> > > LINUX-HA-MIB::LHANodeName = STRING: srv02
> > > LINUX-HA-MIB::LHANodeStatus = INTEGER: active(3)
> > > ----> The trap of active is improper in this timing.
> > > Why?
> > > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:34
> > > <UNKNOWN> [UDP: [192.168.40.110]:44001]:
> > > DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23803) 0:03:58.03
> > > SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate
> > > LINUX-HA-MIB::LHANodeName = STRING: srv02 LINUX-HA-MIB::LHAIFName =
> > > STRING: eth1 LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2)
> > > ----> No problem.
> > > (snip)
> > >
> > > Between the traps which interface intercepted, it is strange that the
> > > active trap of the node comes.
> > >
> > > And I think that it is necessary for the active trap to be sent in an
> > > earlier timing.
> > >
> > >
> > > This problem seems to happen in Heartbeat2.1.4.
> > >
> > > I watched some sources, but think that client_lib of Heartbeat has a
> > > problem somehow or other.
> > > Transmitted F_STATUS message is late and seems to be handled.

hbagent is no longer in the heartbeat code.
According to mercurial, it was removed three years ago.
I doubt it is/was used by many.
So I fear you won't get much help for this.

Still, I don't see "the problem".

You have two communication channels configured.
You block one.

You get a *link* down trap, immediately, probably because sending fails
locally if you do iptables -j DROP.

> > > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate
> > > LHANodeName srv01 LHAIFName eth1 LHAIFStatus down(2)

You get a *node* active. Why do you think this is wrong?
Which timing would have been "proper", and why?

> > > Jun 15 19:24:32 snmp-manager snmptrapd[4771]:
> > > LHANodeStatusUpdate LHANodeName srv02 LHANodeStatus active(3)

And after timeout, you get the *link* down to the other node.

> > > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate
> > > LHANodeName srv02 LHAIFName eth1 LHAIFStatus down(2)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
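
For anyone who wants to check the ordering of the three traps discussed in this message, here is a minimal sketch in Python. It is not part of heartbeat or hbagent. The names LINE_RE, parse_traps and timeline are made up for illustration, and the regular expression only assumes the abbreviated line layout used in the reply, not the full snmptrapd output format.

```python
import re
from datetime import datetime

# The three trap lines, abbreviated the same way as in the reply.
LOG = """\
Jun 15 19:24:30 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate LHANodeName srv01 LHAIFName eth1 LHAIFStatus down(2)
Jun 15 19:24:32 snmp-manager snmptrapd[4771]: LHANodeStatusUpdate LHANodeName srv02 LHANodeStatus active(3)
Jun 15 19:24:34 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate LHANodeName srv02 LHAIFName eth1 LHAIFStatus down(2)
"""

# Hypothetical pattern for the abbreviated layout above (an assumption,
# not a parser for real snmptrapd logs).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ snmptrapd\[\d+\]: "
    r"(?P<trap>LHA\w+StatusUpdate) LHANodeName (?P<node>\S+)"
    r".*?Status (?P<status>\w+)\(\d+\)$"
)


def parse_traps(log, year=2011):
    """Return (time, kind, node, status) tuples sorted by time."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not an abbreviated trap line
        # syslog timestamps carry no year, so it is supplied explicitly
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        kind = "link" if m["trap"] == "LHAIFStatusUpdate" else "node"
        events.append((when, kind, m["node"], m["status"]))
    return sorted(events)


def timeline(events):
    """Replace absolute times with seconds since the first trap."""
    start = events[0][0]
    return [
        (int((when - start).total_seconds()), kind, node, status)
        for when, kind, node, status in events
    ]


if __name__ == "__main__":
    for offset, kind, node, status in timeline(parse_traps(LOG)):
        print(f"+{offset}s {kind:4} {node} {status}")
```

With the three lines from this thread, the offsets come out as 0, 2 and 4 seconds: link down (LHANodeName srv01), node active (srv02), link down (srv02).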
