Re: [Linux-HA] Initial dead time is smaller than deadtime

Andrew Beekhof Wed, 09 Apr 2008 08:06:47 -0700


On Apr 8, 2008, at 8:18 PM, Bernd Schubert wrote:

On Tuesday 08 April 2008 19:32:58 Bernd Schubert wrote:
Hello,
I need to set a rather huge dead time of 1200s, but the initial dead time
should be 120s or less. However, heartbeat tries to be
schoolmasterly and doesn't want to accept my settings:

deadtime 1200 # time to declare a node dead
initdead 120  # time to declare a node dead on heartbeat startup
keepalive 120 # how often to send keepalive packets
heartbeat[6523]: 2008/04/08_19:23:16 ERROR: Initial dead time [120000] is
smaller than deadtime [1200000]
heartbeat[6523]: 2008/04/08_19:23:16 ERROR: Configuration error, heartbeat
not started.
Well, heartbeat is not started automatically here, and the nodes are not
even powered on automatically after a hard reset. So when I start heartbeat
I'm actively monitoring everything, and there is absolutely no need to make
me wait at least 20min on startup. I'm not even convinced a deadtime of
20min is sufficient, since this is for a Lustre cluster and Lustre
sometimes manages to create such a high load that nothing other than the
Lustre and related kernel threads gets to run on the system...
So pretty please, is there a setting that allows overriding this ridiculous
initdead time check?
Doesn't look like the error can be overridden:

       /* Check deadtime parameters */
       if (config->initial_deadtime_ms < config->deadtime_ms) {
               ha_log(LOG_ERR
               ,       "Initial dead time [%ld] is smaller than"
               " deadtime [%ld]"
               ,       config->initial_deadtime_ms, config->deadtime_ms);
               ++errcount;
       }else if (config->initial_deadtime_ms < 10000) {

Have you tried compiling a version with the "++errcount;" part commented out? Seems like a strange thing to be fatal, unless the internal algorithms make crappy assumptions.
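
To make the suggestion concrete, here is a minimal standalone sketch of what such a patched check amounts to. It is not heartbeat's code: struct hb_timing, check_deadtimes() and the strict flag are hypothetical names made up for illustration, and only the comparison and the message text follow the snippet quoted above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two fields used by the quoted check;
 * this is NOT heartbeat's real config struct. */
struct hb_timing {
    long deadtime_ms;          /* "deadtime" from ha.cf, in milliseconds */
    long initial_deadtime_ms;  /* "initdead" from ha.cf, in milliseconds */
};

/* Returns the number of fatal configuration errors.
 * With strict != 0 the initdead < deadtime case is fatal, as in the
 * quoted snippet. With strict == 0 it is only reported, which mimics
 * the effect of commenting out "++errcount;". */
static int check_deadtimes(const struct hb_timing *t, int strict)
{
    int errcount = 0;

    if (t->initial_deadtime_ms < t->deadtime_ms) {
        fprintf(stderr,
                "%s: Initial dead time [%ld] is smaller than deadtime [%ld]\n",
                strict ? "ERROR" : "WARN",
                t->initial_deadtime_ms, t->deadtime_ms);
        if (strict) {
            ++errcount;
        }
    }
    return errcount;
}
```

With the values from the report (deadtime 1200000 ms, initdead 120000 ms), the strict call counts one error and the relaxed call counts none. The sketch says nothing about whether the rest of heartbeat copes with initdead being smaller than deadtime; that would still have to be tested.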

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
