On Tue, Jul 26, 2011 at 01:43:00PM +0900, [email protected] wrote:
> Hi Lars,
> Hi All,
> 
> The cause of the delay has become clear.
> 
> The problem is a matter of timing.
> 
> If hbagent receives an F_STATUS message while it is waiting for the reply
> to an API call, the F_STATUS message is queued instead of being handled.
> 
> The queued message is only handled the next time hbagent catches an event
> from Heartbeat, for example when one of the inter-connect links goes down.
> 
> As a result, the active trap for the node is not sent until the
> inter-connect goes down.
> 
> /*
>  * Read an API message.  All other messages are enqueued to be read later.
>  */
> static struct ha_msg *
> read_api_msg(llc_private_t* pi)
> {
> 
>       for (;;) {
>               struct ha_msg*  msg;
>               const char *    type;
>               
>               pi->chan->ops->waitin(pi->chan);
>               if (pi->chan->ch_status  == IPC_DISCONNECT){
>                       break;
>               }
>               if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
>                       ha_api_perror("read_api_msg: "
>                                     "Cannot read reply from IPC channel");
>                       continue;
>               }
>               if ((type=ha_msg_value(msg, F_TYPE)) != NULL
>               &&      strcmp(type, T_APIRESP) == 0) {
>                       return(msg);
>               }
>               /* Got an unexpected non-api message */
>               /* Queue it up for reading later */
>               enqueue_msg(pi, msg);
>       }
>       /*NOTREACHED*/
>       return(NULL);
> }
> 
> 
> 
> I think that the following correction is necessary.
> snmp_subagent/hbagent.c
> (snip)
>                         } else {
> 
>                                 /* snmp request */
>                                 snmp_read(&fdset);
> 
>                                 ret = handle_heartbeat_msg(); ----> read the queued msg!!

I suggest placing this before the select instead,
or immediately after each call that involves read_api_msg or
enqueue_msg.

Probably easier to just place it before the select, or any other call
that may sleep or block for some time.
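
To illustrate the ordering, here is a minimal sketch of one loop iteration
that drains the queue first and only then blocks in select(). Everything
named sketch_* is a hypothetical stand-in made up for this example: the
integer queue stands in for whatever enqueue_msg() stores, and
sketch_drain_queue() stands in for the handle_heartbeat_msg() call. It is
not the hbagent code, only the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for the queue that enqueue_msg() fills. */
#define SKETCH_MAX_QUEUED 16
static int    sketch_queue[SKETCH_MAX_QUEUED];
static size_t n_queued  = 0;   /* messages waiting in the queue */
static size_t n_handled = 0;   /* messages processed so far     */

/* Stand-in for enqueue_msg(): park a non-API message for later. */
static int sketch_enqueue(int msg)
{
    if (n_queued == SKETCH_MAX_QUEUED) {
        return -1;              /* queue full */
    }
    sketch_queue[n_queued++] = msg;
    return 0;
}

/* Stand-in for handle_heartbeat_msg(): process everything that is
 * already queued.  Returns the number of messages handled. */
static size_t sketch_drain_queue(void)
{
    size_t handled = n_queued;

    n_handled += n_queued;      /* "handle" each queued message */
    n_queued = 0;
    return handled;
}

/* One iteration of the main loop: drain first, then block.
 * Returns whatever select() returns (0 on timeout). */
static int sketch_loop_once(int fd, long timeout_ms)
{
    fd_set         fdset;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);

    /* Handle anything queued during an earlier API call BEFORE we
     * go to sleep, so it does not wait for the next external event. */
    sketch_drain_queue();

    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(fd + 1, &fdset, NULL, NULL, &tv);
}
```

With this ordering, a message that was queued while waiting for an API
reply is processed on the very next pass through the loop, even if the
descriptor stays silent and select() just times out.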

As hbagent.c was dropped from the heartbeat source tree three years ago,
you will have to carry that patch yourself, I'm afraid.

Unless someone resurrects the hbagent for current heartbeat,
if still applicable, and possibly improves/integrates it
with the pacemaker side of things.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
