Hi,

On Thu, Apr 10, 2008 at 12:05:18PM +0200, Bernd Schubert wrote:
> On Wednesday 09 April 2008 22:20:16 Lars Marowsky-Bree wrote:
> > On 2008-04-09T20:26:02, Bernd Schubert <[EMAIL PROTECTED]> wrote:
> > > I still think there is another bug in heartbeat, though. There is simply
> > > no reason for heartbeat to wait $deadtime on initial startup of the
> > > heartbeat services, when it knows all heartbeat nodes are up.
> > > If I at least could manually force it to online the nodes, I would have
> > > no problem with an initial-deadtime == deadtime.
> >
> > That _should_ work, indeed. If both sides are up, it should proceed
> > immediately. Do you have autojoin enabled? Which version?
> 
> This is 2.1.2, but after quickly grepping through the sources, I think this 
> problem is also in tip. Currently there is simply no way to mark a node 
> online until the initial deadtime is over:
> 
> 
> polled_input_dispatch:
>               check_for_timeouts();
> 
>               check_comm_isup();
> 
> 
> 
> 
> 
> /* See if any nodes or links have timed out */
> static void
> check_for_timeouts(void)
> [...]
>               if (heartbeat_comm_state != COMM_LINKSUP) {
>                       /*
>                        * Compute alternative dead_ticks value for very first
>                        * dead interval.
>                        *
>                        * We do this because for some unknown reason
>                        * sometimes the network is slow to start working.
>                        * Experience indicates that 30 seconds is generally
>                        * enough.  It would be nice to have a better way to
>                        * detect that the network isn't really working, but
>                        * I don't know any easy way.
>                        * Patches are being accepted ;-)
>                        */
>                       dead_ticks
>                       =       msto_longclock(config->initial_deadtime_ms);
> 
> [...]
>               mark_node_dead(hip);
> 
> Then in 
> 
> static void
> check_comm_isup(void)
> {
>       struct node_info *      hip;
>       int     j;
>       int     heardfromcount = 0;
> 
> 
>       if (heartbeat_comm_state == COMM_LINKSUP) {
>               return;
>       }
>       
>       if (config->rtjoinconfig != HB_JOIN_NONE 
>           && !init_deadtime_passed){
>               return;
>       }
> 

Thanks for the analysis. I'll have to check the history of the
code.
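
To make the effect of the second early return in check_comm_isup()
easier to see, here is a small stand-alone model of it. This is only a
sketch, not heartbeat code: the struct, the MODEL_* names and
model_check_comm_isup() are made up for illustration, and the last
step (declaring the links up once every node has been heard from) is
an assumption, since that part of the function is not quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the heartbeat names quoted above. */
enum model_comm_state { MODEL_COMM_STARTING, MODEL_COMM_LINKSUP };
enum model_join { MODEL_JOIN_NONE, MODEL_JOIN_ENABLED };

struct startup_model {
	enum model_comm_state comm_state;   /* like heartbeat_comm_state */
	enum model_join rtjoinconfig;       /* like config->rtjoinconfig */
	bool init_deadtime_passed;          /* set once initial deadtime is over */
	int nodes_total;                    /* configured nodes */
	int nodes_heard;                    /* nodes we have heard from */
};

/* Models only the two early returns quoted from check_comm_isup(). */
void model_check_comm_isup(struct startup_model *m)
{
	assert(m->nodes_total > 0);

	/* Already up: nothing to do. */
	if (m->comm_state == MODEL_COMM_LINKSUP) {
		return;
	}

	/* With autojoin in use, wait for the initial deadtime,
	 * regardless of how many nodes have been heard from. */
	if (m->rtjoinconfig != MODEL_JOIN_NONE && !m->init_deadtime_passed) {
		return;
	}

	/* Assumption: the unquoted remainder marks the links up
	 * once every configured node has been heard from. */
	if (m->nodes_heard >= m->nodes_total) {
		m->comm_state = MODEL_COMM_LINKSUP;
	}
}
```

In this model, a two-node cluster with autojoin enabled and both nodes
heard from stays in MODEL_COMM_STARTING until init_deadtime_passed is
set, while the same cluster with MODEL_JOIN_NONE goes to
MODEL_COMM_LINKSUP on the first call. That matches the behaviour Bernd
describes, under the assumption stated above.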

Cheers,

Dejan

> 
> Thanks,
> Bernd
> 
> -- 
> Bernd Schubert
> Q-Leap Networks GmbH
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems