Guys, a bit of clarification. In an attempt to avoid the timing
issues, an hour ago I edited /etc/init.d/heartbeat on one box to delay
heartbeat's start by 2 minutes. So the logs showing takeover succeeding
and heartbeat shutting down are partly an artifact of that change;
things never worked like that before. You saw this and noticed that it
was different from before.

I took that out and I am back to the exact situation I was always in
(no one takes over). Logs are at the bottom. What I do know from this
experiment is that resource acquisition itself is unlikely to be to
blame.

What I see now is back to what I saw yesterday and before, and it
makes no sense to me.

pfs-srv3:


Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Core dumps could be lost
if multiple dumps occur.
Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Aug 10 18:04:41 pfs-srv3 logd: [955]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Aug 10 18:04:41 pfs-srv3 logd: [986]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Enabling logging daemon
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: logfile and debug
file are those specified in logd config file (default /etc/logd.cf)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Version 2 support: off
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: AUTH: i=1: key =
0x88e6b30, auth=0xb7200034, authname=md5
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Core dumps could be
lost if multiple dumps occur.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: **************************
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Configuration
validated. Starting heartbeat 3.0.2
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Heartbeat Hg
Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: heartbeat: version 3.0.2
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Heartbeat
generation: 1279723767
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast
heartbeat started on port 12694 (12694) interface eth1
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast
heartbeat closed on port 12694 interface eth1 - Status: 1
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Local status now set to: 'up'
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv3:eth1 up.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Managed
write_hostcachedata process 1222 exited with return code 0.
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv4:eth1 up.
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Managed
write_hostcachedata process 1223 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for
node pfs-srv4: status up
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for
node pfs-srv4: status active
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Comm_now_up():
updating status to active
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Local status now set
to: 'active'
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed
write_hostcachedata process 1264 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 harc[1263]: [1271]: info: Running
/etc/ha.d//rc.d/status status
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status
process 1263 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 harc[1276]: [1282]: info: Running
/etc/ha.d//rc.d/status status
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status
process 1276 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed
write_delcachedata process 1266 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: STATE 1 => 3
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: local resource
transition completed.
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: 1 local resources
from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: Local Resource
acquisition completed.
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: FIFO message [type
resource] written rc=81
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Managed
req_our_resources(ask) process 1441 exited with return code 0.
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: remote resource
transition completed.
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1


pfs-srv4:


Aug 10 18:04:43 pfs-srv4 logd: [899]: info: logd started with /etc/logd.cf.
Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Core dumps could be lost
if multiple dumps occur.
Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Aug 10 18:04:43 pfs-srv4 logd: [899]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Aug 10 18:04:43 pfs-srv4 logd: [909]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Enabling logging daemon
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: logfile and debug
file are those specified in logd config file (default /etc/logd.cf)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Version 2 support: off
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: AUTH: i=1: key =
0x9960ac8, auth=0xb7147034, authname=md5
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Core dumps could be
lost if multiple dumps occur.
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: **************************
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Configuration
validated. Starting heartbeat 3.0.2
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Heartbeat Hg
Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: heartbeat: version 3.0.2
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Heartbeat
generation: 1279723774
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast
heartbeat started on port 12694 (12694) interface eth1
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast
heartbeat closed on port 12694 interface eth1 - Status: 1
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Local status now set to: 'up'
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv4:eth1 up.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Managed
write_hostcachedata process 1191 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv3:eth1 up.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Status update for
node pfs-srv3: status up
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
write_hostcachedata process 1193 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 harc[1192]: [1199]: info: Running
/etc/ha.d//rc.d/status status
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed status
process 1192 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Comm_now_up():
updating status to active
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Local status now set
to: 'active'
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
write_hostcachedata process 1204 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
write_delcachedata process 1205 exited with return code 0.
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Status update for
node pfs-srv3: status active
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info:
AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 1 => 3
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 2
Aug 10 18:04:46 pfs-srv4 harc[1213]: [1219]: info: Running
/etc/ha.d//rc.d/status status
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Managed status
process 1213 exited with return code 0.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource
transition completed.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 2 => 3
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource
transition completed.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 4
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: No local resources
[/usr/share/heartbeat/ResourceManager listkeys pfs-srv4] to acquire.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: FIFO message [type
resource] written rc=81
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Managed
req_our_resources(ask) process 1298 exited with return code 0.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
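
In case it helps anyone compare the two logs, here is a rough sketch of
how the negotiation lines could be pulled out of each node's log. It is
not part of heartbeat; the function names (unwrap, negotiation) and the
choice of which messages count as "interesting" are my own assumptions.
It rejoins the lines that the mailer wrapped, then keeps only the state
transitions, AnnounceTakeover, other_holds_resources, and resource
transition messages. Feed it one node's log at a time.

```python
import re

# A syslog-style entry starts with a timestamp such as "Aug 10 18:04:45 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")

# Assumed set of messages that describe the resource negotiation.
INTERESTING = re.compile(
    r"STATE \d+ => \d+|AnnounceTakeover|other_holds_resources|resource transition"
)


def unwrap(lines):
    """Join mail-wrapped continuation lines back onto their log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if TIMESTAMP.match(line):
            entries.append(line)  # start of a new log entry
        elif entries:
            entries[-1] += " " + line  # continuation of the previous entry
        # text before the first timestamp (e.g. a "pfs-srv3:" header) is dropped
    return entries


def negotiation(lines):
    """Return only the entries that relate to the resource negotiation."""
    return [entry for entry in unwrap(lines) if INTERESTING.search(entry)]
```

Running negotiation() over each node's log and reading the two results
next to each other should make it easier to see which side believes the
other holds the resources at each second.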