Hi Yan,

> However, the F_STATUS message that hbagent queues is one from a very
> early stage of startup.
> I will pinpoint which hb_api call of hbagent is involved.

I confirmed it.

It appears that the F_STATUS message is queued during the get_uuid
processing.

--- For the following log, the FUNCTION macro was added to each call of read_api_msg. ---
--- The first line of the log shows get_uuid as the caller. ---
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: ##### yamuchi enqure_msg (): get_uuid #####
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: MSG: Dumping message with 12 fields
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: MSG[0] : [t=status]
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: MSG[1] : [st=active]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[2] : [dt=6590]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[3] : [protocol=1]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[4] : [src=srv02]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[5] : [(1)srcuuid=0x889db30(36 27)]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[6] : [seq=6]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[7] : [hg=4ddb3649]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[8] : [ts=4e313107]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[9] : [ld=0.16 0.04 0.01 2/89 6264]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[10] : [ttl=3]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[11] : [auth=1 60410427f13e2377858cc0e403a8014c4704ab36]
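
For reference, a caller tag like the one in the first log line can be
produced roughly as follows. This is only a minimal sketch: the names
read_api_msg_tagged, READ_API_MSG and last_caller are hypothetical and are
not part of hb_api.c, and it uses the standard C99 __func__ identifier
rather than whatever the actual debug patch used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real reader: it only records who called it. */
static char last_caller[64];

static int read_api_msg_tagged(const char *caller)
{
        assert(caller != NULL);
        /* In the real code, this is where cl_log() would print the tag. */
        snprintf(last_caller, sizeof(last_caller), "%s", caller);
        return 0;
}

/* Wrapper macro: every call site passes its own function name. */
#define READ_API_MSG() read_api_msg_tagged(__func__)

/* Hypothetical caller, named after the tag seen in the log. */
static int get_uuid(void)
{
        return READ_API_MSG();
}
```

After a call to get_uuid(), last_caller holds the string "get_uuid", which
is what makes it possible to tell the API calls apart in the log.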

In hbagent, I think the queueing happens during one of the following two
calls.

(snip)
int
init_heartbeat(void)
{
(snip)
        /*
         * get uuid for trap message.
         *   see: hbagentv2_update_diff() in hbagentv2.c
         */
        if (hb->llc_ops->get_uuid_by_name(hb, myid, &uuid) == HA_FAIL) {
                cl_log(LOG_ERR, "Cannot get mynodeid");
                cl_log(LOG_ERR, "REASON: %s", hb->llc_ops->errmsg(hb));
                return HA_FAIL;
        }

(snip)
int
walk_nodetable(void)
{
(snip)
#ifdef HAVE_NEW_HB_API
                /* the get_uuid_by_name is not available for STABLE_1_2 branch. */
                if (hb->llc_ops->get_uuid_by_name(hb, name, &uuid) == HA_FAIL) {
                        cl_log(LOG_DEBUG, "Cannot get the uuid for node: %s", name);
                }
#endif /* HAVE_NEW_HB_API */
(snip)
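
To make the timing easier to see, here is a minimal self-contained sketch
of the pattern. It is not Heartbeat code: struct client, wait_api_reply and
drain_queued are hypothetical names, and messages are reduced to their type
string. It only illustrates that a status message arriving ahead of the API
reply is set aside, and stays unhandled until something drains the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXQ 8

/* Hypothetical, simplified stand-in for the client-side state. */
struct client {
        const char *incoming[MAXQ];  /* message types arriving on the channel */
        size_t      in_len, in_pos;
        const char *queued[MAXQ];    /* non-API messages set aside for later */
        size_t      q_len;
        size_t      handled_status;  /* status messages actually processed */
};

/* Wait for an API reply; anything else is enqueued, as read_api_msg() does. */
static const char *wait_api_reply(struct client *c)
{
        assert(c != NULL);
        while (c->in_pos < c->in_len) {
                const char *type = c->incoming[c->in_pos++];
                if (strcmp(type, "apiresp") == 0) {
                        return type;
                }
                if (c->q_len < MAXQ) {
                        c->queued[c->q_len++] = type;
                }
        }
        return NULL;    /* channel ran dry without a reply */
}

/* Drain whatever was set aside; returns the number of messages drained. */
static size_t drain_queued(struct client *c)
{
        size_t n = c->q_len;
        for (size_t i = 0; i < n; i++) {
                if (strcmp(c->queued[i], "status") == 0) {
                        c->handled_status++;
                }
        }
        c->q_len = 0;
        return n;
}
```

With incoming set to "status" followed by "apiresp", wait_api_reply()
returns the reply and leaves one queued message with handled_status still
at 0. The status message is processed only once drain_queued() is called,
which corresponds to the delay described in this thread.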


Best Regards,
Hideo Yamauchi.




--- On Thu, 2011/7/28, [email protected] <[email protected]> 
wrote:

> Hi Yan,
> 
> Thank you for your comment.
> 
> > > Hi Lars,
> > > Hi All,
> > > 
> > > The cause of the delay has become clear.
> > > 
> > > This problem is timing-dependent.
> > > 
> > > When hbagent receives an F_STATUS message while it is waiting for the 
> > > reply to an API call, 
> > Under this circumstance, is there a specific heartbeat op that hbagent
> > is waiting for?
> 
> Yes.
> 
> However, the F_STATUS message that hbagent queues is one from a very
> early stage of startup.
> I will pinpoint which hb_api call of hbagent is involved.
> 
> When I made the following modification, the queueing showed up in the log.
> 
> (snip)
> /*
>  * Read an API message.  All other messages are enqueued to be read later.
>  */
> static struct ha_msg *
> read_api_msg(llc_private_t* pi)
> {
>         for (;;) {
>                 struct ha_msg*  msg;
>                 const char *    type;
>                 pi->chan->ops->waitin(pi->chan);
>                 if (pi->chan->ch_status  == IPC_DISCONNECT){
>                         break;
>                 }
>                 if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
>                         ha_api_perror("read_api_msg: "
>                                       "Cannot read reply from IPC channel");
>                         continue;
>                 }
>                 if ((type=ha_msg_value(msg, F_TYPE)) != NULL
>                 &&      strcmp(type, T_APIRESP) == 0) {
>                         return(msg);
>                 }
>                 /* Got an unexpected non-api message */
>                 /* Queue it up for reading later */
>                 /* yamauchi: debug log for queued status messages */
>                 /* type may be NULL here, so check it before comparing */
>                 if (type != NULL && strcasecmp(type, T_STATUS) == 0) {
>                         cl_log(LOG_INFO, "##### yamuchi enqure_msg ()#####");
>                         cl_log_message(LOG_INFO, msg);
>                 }
>                 enqueue_msg(pi, msg);
>         }
>         /*NOTREACHED*/
>         return(NULL);
> }
> 
> (snip)
> Jul 27 19:13:50 srv01 ccm: [5432]: info: ##### yamuchi enqure_msg ()#####
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG: Dumping message with 12 fields
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[0] : [t=status]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[1] : [st=active]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[2] : [dt=6590]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[3] : [protocol=1]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[4] : [src=srv02]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[5] : [(1)srcuuid=0xa006540(36 27)]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[6] : [seq=6]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: ##### yamuchi enqure_msg ()#####
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: ##### yamuchi enqure_msg ()#####
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[7] : [hg=4ddb3648]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG: Dumping message with 12 fields
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG: Dumping message with 12 fields
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[8] : [ts=4e2fe4dd]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[0] : [t=status]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[0] : [t=status]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[1] : [st=active]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[1] : [st=active]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[10] : [ttl=3]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[2] : [dt=6590]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[2] : [dt=6590]
> Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[3] : [protocol=1]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[3] : [protocol=1]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[4] : [src=srv02]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[4] : [src=srv02]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[5] : [(1)srcuuid=0x84255e0(36 27)]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[5] : [(1)srcuuid=0x83b7bf8(36 27)]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[6] : [seq=6]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[6] : [seq=6]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[7] : [hg=4ddb3648]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[7] : [hg=4ddb3648]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[8] : [ts=4e2fe4dd]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[8] : [ts=4e2fe4dd]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[10] : [ttl=3]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[10] : [ttl=3]
> Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
> Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
> (snip)
> Jul 27 19:13:52 srv01 cib: [5433]: info: ##### yamuchi enqure_msg ()#####
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG: Dumping message with 12 fields
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[0] : [t=status]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[1] : [st=active]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[2] : [dt=6590]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[3] : [protocol=1]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[4] : [src=srv02]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[5] : [(1)srcuuid=0x8fc9060(36 27)]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[6] : [seq=6]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[7] : [hg=4ddb3648]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[8] : [ts=4e2fe4dd]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[10] : [ttl=3]
> Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
> (snip)
> 
> 
> > 
> > > that F_STATUS message is queued.
> > > 
> > > The queued message is handled only when hbagent next catches an event 
> > > from Heartbeat.
> > > Therefore, it is not handled until an event occurs, such as one of the 
> > > interconnects going down.
> > > 
> > > As a result, the node's active trap is not sent until the interconnect 
> > > goes down.
> > > 
> > > /*
> > >  * Read an API message.  All other messages are enqueued to be read later.
> > >  */
> > > static struct ha_msg *
> > > read_api_msg(llc_private_t* pi)
> > > {
> > > 
> > >     for (;;) {
> > >         struct ha_msg*    msg;
> > >         const char *    type;
> > >         
> > >         pi->chan->ops->waitin(pi->chan);
> > >         if (pi->chan->ch_status  == IPC_DISCONNECT){
> > >             break;
> > >         }
> > >         if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
> > >             ha_api_perror("read_api_msg: "
> > >                       "Cannot read reply from IPC channel");
> > >             continue;
> > >         }
> > >         if ((type=ha_msg_value(msg, F_TYPE)) != NULL
> > >         &&    strcmp(type, T_APIRESP) == 0) {
> > >             return(msg);
> > >         }
> > >         /* Got an unexpected non-api message */
> > >         /* Queue it up for reading later */
> > >         enqueue_msg(pi, msg);
> > >     }
> > >     /*NOTREACHED*/
> > >     return(NULL);
> > > }
> > > 
> > > 
> > > 
> > > I think that the following correction is necessary.
> > > snmp_subagent/hbagent.c
> > > (snip)
> > >                         } else {
> > > 
> > >                                 /* snmp request */
> > >                                 snmp_read(&fdset);
> > > 
> > >                                 ret = handle_heartbeat_msg(); ----> reads the queued msg!!
> > >                         }
> > > (snip)
> > I'm still confused about invoking handle_heartbeat_msg() when select()
> > finds that the SNMP socket has input. Is that the appropriate time?
> 
> Sorry.
> 
> This correction is only one example.
> Because I do not know much about how hbagent handles messages, I would 
> appreciate your guidance on the right fix.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> > 
> > Regards,
> >   Yan
> > -- 
> > Gao,Yan <[email protected]>
> > Software Engineer
> > China Server Team, SUSE.
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> > 