Hello,
I have been asked to prove that our cluster can handle a times()
wraparound after 247 days, so I downloaded a source RPM for the version
we are using (2.0.4) and also one for 2.0.5. I engineered a white-box
wrap by modifying longclock and observed some problems. When the primary
server wrapped, it reported "No local heartbeat. Forcing restart" but
failed to restart. Instead it stopped heartbeat without releasing the
virtual address. As a result the secondary kicked in and also grabbed
the virtual address, so both servers held it. When I performed the wrap
test on the secondary server, it behaved similarly but succeeded in
restarting without affecting cluster operation. Our operating system is
Red Hat 3.4.3-9.EL4, and the 2.0.5 RPM was the latest I could get to
build without errors.
Could you tell me if there is a times() wrap problem or whether there is
a flaw in my white box test?
Thanks,
Michael
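As a rough sanity check on the wrap interval, the sketch below computes
how long a tick counter of a given width takes to wrap. It assumes a
tick rate of 100 per second and a 31- or 32-bit counter. Neither figure
is confirmed above, and wrap_days is a name made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: days until a counter with `bits` usable bits
 * wraps when it advances `ticks_per_sec` times per second. */
double wrap_days(unsigned bits, double ticks_per_sec)
{
    assert(bits < 64 && ticks_per_sec > 0.0);

    double ticks   = (double) (1ULL << bits);   /* counter range */
    double seconds = ticks / ticks_per_sec;     /* time to cover it */
    return seconds / 86400.0;                   /* 86400 seconds per day */
}
```

Under those assumptions wrap_days(31, 100.0) comes to roughly 248.6 days
and wrap_days(32, 100.0) to roughly 497.1 days, so a figure near 247 days
is in line with a signed 32-bit value at 100 ticks per second.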
I added the following code to longclock. Setting lasttimes to MINJUMP
was necessary to trigger the "normal wrap" code path rather than the
"jump back in error" path. I have triggered the wrap at various times
after startup:
    timesval = (unsigned long) times(&longclock_dummy_tms_struct);

    /* Test hook: fake a wrap on every wrapmod-th call. */
    if ( (callcount % wrapmod) == 0 )
    {
        wrapcount++;
        cl_log(LOG_INFO, "MJ2 WRAPPING %lu!!!!!!!!!!!!!!!!!!",
               wrapmod);
        timesval = 0;
        lasttimes = MINJUMP;    /* forces the "normal wrap" path */
        wrapmod = 8000;
    }

    /* Progress trace on every 100th call. */
    if (callcount % 100 == 0)
    {
        if (wrapcount)
        {
            cl_log(LOG_INFO, "MJ WRAPPED %lu (%lu)",
                   timesval, wrapcount);
        }
        else
        {
            /* Casts keep the format specifiers matched to the arguments. */
            cl_log(LOG_INFO, "MJ2 %s %lu (%lu)",
                   __FUNCTION__, (unsigned long) sizeof(longclock_t),
                   timesval);
        }
    }
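
For comparison, here is a minimal standalone sketch of the distinction
described above between a "normal wrap" and a "jump back in error". It
is not the heartbeat implementation: tick_state, extend_ticks and
SKETCH_MINJUMP (with its value) are hypothetical names chosen for
illustration. The raw sample is passed in as an argument, so a wrap can
be simulated by feeding values directly rather than by patching the
clock code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: a backwards step smaller than this is treated
 * as clock noise, a larger one as a genuine counter wrap. */
#define SKETCH_MINJUMP 1000000UL

struct tick_state {
    int      called_before;  /* 0 until the first sample is seen */
    uint32_t last_raw;       /* previous raw 32-bit sample */
    uint64_t wraps;          /* number of wraps detected so far */
};

/* Extend a raw 32-bit tick sample to a 64-bit non-decreasing value. */
uint64_t extend_ticks(struct tick_state *s, uint32_t raw)
{
    assert(s != NULL);

    if (s->called_before && raw < s->last_raw) {
        uint32_t jump_back = s->last_raw - raw;

        if (jump_back < SKETCH_MINJUMP) {
            /* Small step backwards: hold the previous value. */
            raw = s->last_raw;
        } else {
            /* Large step backwards: count it as a wrap. */
            s->wraps++;
        }
    }
    s->called_before = 1;
    s->last_raw = raw;

    /* The wrap count supplies the high 32 bits of the result. */
    return (s->wraps << 32) | raw;
}
```

Starting from a zeroed tick_state, feeding a value near the top of the
32-bit range followed by a small value exercises the wrap branch, while
feeding a slightly smaller value exercises the hold branch.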