So, I've been following the instructions in the FAQ for setting a
suitable deadtime and have noticed one thing that seems a little odd to
me - it seems that whenever a node reboots and rejoins the cluster,
heartbeat on the other node sees the new generation number and I then
often see a late heartbeat message on BOTH nodes where the time is
between 3 and 4s.

I am running 2.1.0 with keepalive=1s and warntime=3s and have attached
some log extracts from the relevant times.

Any ideas why restarting hb should result in a 3+s delay like this? Am I
being too picky to expect this not to happen?

Thanks,
Simon

On the node that does not reboot, I see output like this in the log file
when the other node restarts:

Jun 23 04:50:32 lisa heartbeat: [7887]: info: Heartbeat restart on node
bart.sn.stratus.com
Jun 23 04:50:32 lisa heartbeat: [7887]: info: Link
bart.sn.stratus.com:eth0 up.
Jun 23 04:50:32 lisa heartbeat: [7887]: info: Status update for node
bart.sn.stratus.com: status init
Jun 23 04:50:32 lisa heartbeat: [7887]: info: Link
bart.sn.stratus.com:eth1 up.
Jun 23 04:50:32 lisa heartbeat: [7887]: info: Status update for node
bart.sn.stratus.com: status up
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: get_delnodelist:
delnodelist= 
Jun 23 04:50:32 lisa heartbeat: [7887]: info: all clients are now paused
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: hist->ackseq =8523
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: hist->lowseq =8522,
hist->hiseq=8624
Jun 23 04:50:32 lisa heartbeat: [7887]: debug:  
Jun 23 04:50:32 lisa crmd: [7962]: notice: crmd_ha_status_callback:
Status update: Node bart.sn.stratus.com now has status [init]
Jun 23 04:50:32 lisa crmd: [7962]: info: crmd_ha_status_callback: Ping
node bart.sn.stratus.com is init
Jun 23 04:50:32 lisa crmd: [7962]: notice: crmd_ha_status_callback:
Status update: Node bart.sn.stratus.com now has status [up]
Jun 23 04:50:32 lisa crmd: [7962]: info: crmd_ha_status_callback: Ping
node bart.sn.stratus.com is up
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: hist->ackseq =8523
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: hist->lowseq =8522,
hist->hiseq=8625
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: expecting from
bart.sn.stratus.com
Jun 23 04:50:32 lisa heartbeat: [7887]: debug: it's ackseq=0
Jun 23 04:50:32 lisa heartbeat: [7887]: debug:  
Jun 23 04:50:32 lisa heartbeat: [7887]: info: all clients are now
resumed
Jun 23 04:50:35 lisa heartbeat: [7887]: WARN: Late heartbeat: Node
bart.sn.stratus.com: interval 3020 ms
Jun 23 04:50:35 lisa heartbeat: [7887]: info: Status update for node
bart.sn.stratus.com: status active
Jun 23 04:50:35 lisa crmd: [7962]: notice: crmd_ha_status_callback:
Status update: Node bart.sn.stratus.com now has status [active]
Jun 23 04:50:35 lisa cib: [7958]: info: cib_diff_notify: Local-only
Change (client:7962, call: 51): 0.2.68 (ok)
Jun 23 04:50:35 lisa tengine: [10395]: info: te_update_diff: Processing
diff (cib_update): 0.2.68 -> 0.2.68
Jun 23 04:50:35 lisa cib: [12691]: info: write_cib_contents: Wrote
version 0.2.68 of the CIB to disk (digest:
9c9b0cce1f2884c4e6b1f98163dd8c2e)
Jun 23 04:50:36 lisa cib: [7958]: info: cib_client_status_callback:
Status update: Client bart.sn.stratus.com/cib now has status [join]
Jun 23 04:50:37 lisa heartbeat: [7887]: WARN: 1 lost packet(s) for
[bart.sn.stratus.com] [14:16]
Jun 23 04:50:37 lisa heartbeat: [7887]: info: No pkts missing from
bart.sn.stratus.com!
Jun 23 04:50:37 lisa crmd: [7962]: notice: crmd_client_status_callback:
Status update: Client bart.sn.stratus.com/crmd now has status [online]
Jun 23 04:50:37 lisa crmd: [7962]: info: crmd_client_status_callback:
Uncaching UUID for bart.sn.stratus.com

Meanwhile, on the node that just rebooted, I see:

Jun 23 04:50:30 bart heartbeat: [7977]: info: Enabling logging daemon 
Jun 23 04:50:30 bart heartbeat: [7977]: info: logfile and debug file are
those specified in logd config file (default /etc/logd.cf)
Jun 23 04:50:30 bart heartbeat: [7977]: info: Version 2 support: true
Jun 23 04:50:31 bart heartbeat: [7977]: info: **************************
Jun 23 04:50:31 bart heartbeat: [7977]: info: Configuration validated.
Starting heartbeat 2.0.9
Jun 23 04:50:31 bart heartbeat: [7982]: info: heartbeat: version 2.0.9
Jun 23 04:50:31 bart heartbeat: [7982]: info: Heartbeat generation:
1182588631
Jun 23 04:50:31 bart heartbeat: [7982]: info: G_main_add_TriggerHandler:
Added signal manual handler
Jun 23 04:50:31 bart heartbeat: [7982]: info: G_main_add_TriggerHandler:
Added signal manual handler
Jun 23 04:50:31 bart heartbeat: [7982]: info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
Jun 23 04:50:31 bart heartbeat: [7982]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
Jun 23 04:50:31 bart heartbeat: [7982]: info: glib: ucast: bound send
socket to device: eth0
Jun 23 04:50:31 bart heartbeat: [7982]: info: glib: ucast: bound receive
socket to device: eth0
Jun 23 04:50:31 bart heartbeat: [7982]: info: glib: ucast: started on
port 694 interface eth0 to 134.111.30.58
Jun 23 04:50:31 bart heartbeat: [7982]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
Jun 23 04:50:31 bart heartbeat: [7982]: info: glib: ucast: bound send
socket to device: eth1
Jun 23 04:50:32 bart heartbeat: [7982]: info: glib: ucast: bound receive
socket to device: eth1
Jun 23 04:50:32 bart heartbeat: [7982]: info: glib: ucast: started on
port 694 interface eth1 to 192.168.1.101
Jun 23 04:50:32 bart heartbeat: [7982]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Jun 23 04:50:33 bart heartbeat: [7982]: info: Local status now set to:
'up'
Jun 23 04:50:33 bart heartbeat: [7982]: info: Link
lisa.sn.stratus.com:eth0 up.
Jun 23 04:50:33 bart heartbeat: [7982]: info: Status update for node
lisa.sn.stratus.com: status active
Jun 23 04:50:34 bart heartbeat: [7982]: info: Link
lisa.sn.stratus.com:eth1 up.
Jun 23 04:50:35 bart heartbeat: [7982]: info: Comm_now_up(): updating
status to active
Jun 23 04:50:36 bart heartbeat: [7982]: WARN: Late heartbeat: Node
bart.sn.stratus.com: interval 3020 ms
Jun 23 04:50:36 bart heartbeat: [7982]: info: Local status now set to:
'active'
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to