On Wednesday 09 April 2008 22:20:16 Lars Marowsky-Bree wrote:
> On 2008-04-09T20:26:02, Bernd Schubert <[EMAIL PROTECTED]> wrote:
> > I still think there is another bug in heartbeat, though. There is simply
> > no reason for heartbeat to wait $deadtime on initial startup of the
> > heartbeat services, when it knows all heartbeat nodes are up.
> > If I at least could manually force it to online the nodes, I would have
> > no problem with an initial-deadtime == deadtime.
>
> That _should_ work, indeed. If both sides are up, it should proceed
> immediately. Do you have autojoin enabled? Which version?
This is 2.1.2, but after a quick grep through the sources, I think the
problem is also in tip. There is currently no way to mark a node
online before the initial deadtime is over:

polled_input_dispatch:
        check_for_timeouts();
        check_comm_isup();

/* See if any nodes or links have timed out */
static void
check_for_timeouts(void)
[...]
        if (heartbeat_comm_state != COMM_LINKSUP) {
                /*
                 * Compute alternative dead_ticks value for very first
                 * dead interval.
                 *
                 * We do this because for some unknown reason
                 * sometimes the network is slow to start working.
                 * Experience indicates that 30 seconds is generally
                 * enough. It would be nice to have a better way to
                 * detect that the network isn't really working, but
                 * I don't know any easy way.
                 * Patches are being accepted ;-)
                 */
                dead_ticks
                =       msto_longclock(config->initial_deadtime_ms);
[...]
                mark_node_dead(hip);

Then in

static void
check_comm_isup(void)
{
        struct node_info *      hip;
        int                     j;
        int                     heardfromcount = 0;

        if (heartbeat_comm_state == COMM_LINKSUP) {
                return;
        }
        if (config->rtjoinconfig != HB_JOIN_NONE
        &&      !init_deadtime_passed){
                return;
        }
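
To illustrate the effect of that second early return, here is a minimal
stand-alone model. It is not heartbeat code: only the two early-return
conditions are taken from the excerpt above. The struct, the names
COMM_MODEL_STARTING and MODEL_JOIN_ENABLED, and the final "heard from all
nodes" check are assumptions made up for this sketch.

```c
#include <stdbool.h>

/* Simplified stand-ins; only COMM_LINKSUP and HB_JOIN_NONE are names
 * that appear in the excerpt, the rest are hypothetical. */
enum comm_model_state { COMM_MODEL_STARTING, COMM_LINKSUP };
enum join_model       { HB_JOIN_NONE, MODEL_JOIN_ENABLED };

struct comm_model {
        enum comm_model_state state;
        enum join_model       rtjoinconfig;
        bool                  init_deadtime_passed;
        int                   nodes_total;   /* configured nodes (assumed) */
        int                   nodes_heard;   /* nodes heard from (assumed) */
};

void
model_check_comm_isup(struct comm_model *m)
{
        /* Already up: nothing to do (as in the excerpt). */
        if (m->state == COMM_LINKSUP) {
                return;
        }
        /* With joining enabled, refuse to proceed until the initial
         * deadtime has passed (as in the excerpt). */
        if (m->rtjoinconfig != HB_JOIN_NONE
        &&      !m->init_deadtime_passed) {
                return;
        }
        /* Assumed remainder: declare communication up once every
         * configured node has been heard from. */
        if (m->nodes_heard >= m->nodes_total) {
                m->state = COMM_LINKSUP;
        }
}
```

In this model, with both of two nodes heard from and joining enabled, the
state stays at COMM_MODEL_STARTING until init_deadtime_passed becomes true.
With HB_JOIN_NONE it goes to COMM_LINKSUP on the first call.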
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH