Re: [Linux-HA] Initial dead time is smaller than deadtime

Bernd Schubert Thu, 10 Apr 2008 03:05:50 -0700

On Wednesday 09 April 2008 22:20:16 Lars Marowsky-Bree wrote:
> On 2008-04-09T20:26:02, Bernd Schubert <[EMAIL PROTECTED]> wrote:
> > I still think there is another bug in heartbeat, though. There is simply
> > no reason for heartbeat to wait $deadtime on initial startup of the
> > heartbeat services, when it knows all heartbeat nodes are up.
> > If I at least could manually force it to online the nodes, I would have
> > no problem with an initial-deadtime == deadtime.
>
> That _should_ work, indeed. If both sides are up, it should proceed
> immediately. Do you have autojoin enabled? Which version?


This is 2.1.2, but after a quick grep through the sources, I think the
problem is also present in tip. There is currently no way to mark a node
online until the initial deadtime has passed:


polled_input_dispatch:
                check_for_timeouts();

                check_comm_isup();





/* See if any nodes or links have timed out */
static void
check_for_timeouts(void)
[...]
                if (heartbeat_comm_state != COMM_LINKSUP) {
                        /*
                         * Compute alternative dead_ticks value for very first
                         * dead interval.
                         *
                         * We do this because for some unknown reason
                         * sometimes the network is slow to start working.
                         * Experience indicates that 30 seconds is generally
                         * enough.  It would be nice to have a better way to
                         * detect that the network isn't really working, but
                         * I don't know any easy way.
                         * Patches are being accepted ;-)
                         */
                        dead_ticks
                        =       msto_longclock(config->initial_deadtime_ms);

[...]
                mark_node_dead(hip);

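Then check_comm_isup() returns early whenever autojoin is enabled and the
initial deadtime has not yet passed: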

static void
check_comm_isup(void)
{
        struct node_info *      hip;
        int     j;
        int     heardfromcount = 0;


        if (heartbeat_comm_state == COMM_LINKSUP) {
                return;
        }
        
        if (config->rtjoinconfig != HB_JOIN_NONE 
            && !init_deadtime_passed){
                return;
        }
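
To make the effect of that second early return concrete, here is a minimal
standalone model of the gate. It is only a sketch, not heartbeat code:
struct cluster_model, model_check_comm_isup() and MODEL_JOIN_ENABLED are
hypothetical names, and the final "all configured nodes heard from" step is
an assumption about what the rest of the function does, since the excerpt
above stops before that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for heartbeat's state; not the real definitions. */
enum comm_state { COMM_STARTING, COMM_LINKSUP };
enum join_mode  { HB_JOIN_NONE, MODEL_JOIN_ENABLED /* hypothetical */ };

struct cluster_model {
        enum comm_state comm_state;      /* models heartbeat_comm_state */
        enum join_mode  rtjoinconfig;    /* models config->rtjoinconfig */
        bool            init_deadtime_passed;
        size_t          nodes_configured;
        size_t          nodes_heard_from;
};

/*
 * Mirrors the two early returns quoted above. The last step (declare the
 * links up once every configured node has been heard from) is assumed.
 */
static void
model_check_comm_isup(struct cluster_model *c)
{
        assert(c != NULL);

        if (c->comm_state == COMM_LINKSUP) {
                return;
        }

        if (c->rtjoinconfig != HB_JOIN_NONE
            && !c->init_deadtime_passed) {
                /* Autojoin on: wait, even if all nodes already answered. */
                return;
        }

        if (c->nodes_heard_from >= c->nodes_configured) {
                c->comm_state = COMM_LINKSUP;
        }
}
```

In this model, a two-node cluster with autojoin enabled and both nodes
already heard from stays in COMM_STARTING until init_deadtime_passed is
set, while the same cluster with HB_JOIN_NONE goes to COMM_LINKSUP on the
first call.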




Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
