David Lang, I found more interesting information in the /var/log/ha-debug files on both nodes, and I am including them below as text. This is encouraging, as they may offer us a straightforward way to diagnose this problem.
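To compare what the two nodes did during resource negotiation, something like the following might help. It is only a rough sketch: it pulls the `STATE x => y`, `AnnounceTakeover` and `other_holds_resources` entries out of ha-debug text and merges both nodes into one timeline. The function names and the regular expressions are my own assumptions based on the entry layout in the logs below; this is not a tool that ships with heartbeat.

```python
import re

# Assumed entry layout, taken from the log excerpts in this message, e.g.
#   "Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: STATE 1 => 3"
# Entries written by other programs (logd, harc) are deliberately skipped.
ENTRY = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"heartbeat: \[(?P<pid>\d+)\]: (?P<level>\w+): (?P<msg>.*)"
)

# Messages that describe resource ownership decisions (hypothetical selection).
INTERESTING = re.compile(r"STATE \d+ => \d+|AnnounceTakeover|other_holds_resources")


def resource_events(text):
    """Return (timestamp, host, message) for resource-related heartbeat entries."""
    events = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m and INTERESTING.search(m.group("msg")):
            events.append((m.group("ts"), m.group("host"), m.group("msg")))
    return events


def merged_timeline(*logs):
    """Merge events from several nodes, ordered by time of day.

    Sorts on the HH:MM:SS field only, so it assumes all entries are from
    the same day. The sort is stable: entries within the same second keep
    the order in which they were read.
    """
    events = [e for log in logs for e in resource_events(log)]
    return sorted(events, key=lambda e: e[0].split()[-1])
```

Usage would be along the lines of `merged_timeline(open("srv3-ha-debug").read(), open("srv4-ha-debug").read())`, where the two file names are placeholders for copies of each node's /var/log/ha-debug.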
On pfs-srv3 (main)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Enabling logging daemon
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(initdead,180)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(bcast,eth1)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(node,pfs-srv3)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(node,pfs-srv4)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(auto_failback,on)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(crm,off)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Version 2 support: off
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: uid=hacluster, gid=<null>
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: uid=hacluster, gid=<null>
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: uid=<null>, gid=haclient
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: uid=root, gid=<null>
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: uid=<null>, gid=haclient
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Beginning authentication parsing
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: 16 max authentication methods
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Keyfile opened
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Keyfile perms OK
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: 16 max authentication methods
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Found authentication method [md5]
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: AUTH: i=1: key = 0x88e6b30, auth=0xb7200034, authname=md5
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Outbound signing method is 1
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Authentication parsing complete [1]
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(cluster,linux-ha)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(hopfudge,1)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(baud,19200)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(hbgenmethod,file)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(realtime,true)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(msgfmt,classic)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(conn_logd_time,60)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(log_badpack,true)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(syslogmsgfmt,true)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(coredumps,true)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Core dumps could be lost if multiple dumps occur.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(autojoin,none)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(uuidfrom,file)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(compression,zlib)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(compression_threshold,2)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(traditional_compression,no)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(max_rexmit_delay,250)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: Setting max_rexmit_delay to 250 ms
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(record_config_changes,on)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(record_pengine_inputs,on)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(enable_config_writes,on)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: add_option(memreserve,6500)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: **************************
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Configuration validated. Starting heartbeat 3.0.2
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: debug: HA configuration OK. Heartbeat starting.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Heartbeat Hg Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: heartbeat: version 3.0.2
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Heartbeat generation: 1279723767
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: uuid is:392887ff-5f23-415f-8158-38fc5a57496c
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: FIFO process pid: 1219
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: opening bcast eth1 (UDP/IP broadcast)
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: glib: SO_BINDTODEVICE(r) set for device eth1
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast heartbeat started on port 12694 (12694) interface eth1
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: write process pid: 1220
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: read child process pid: 1221
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast heartbeat closed on port 12694 interface eth1 - Status: 1
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: make_io_childpair: CREATED childpair wchan socket 9
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: make_io_childpair: CREATED childpair rchan socket 11
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: Limiting CPU: 42 CPU seconds every 60000 milliseconds
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: pid 1180 locked in memory.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1219]: debug: pid 1219 locked in memory.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1221]: debug: pid 1221 locked in memory.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1220]: debug: pid 1220 locked in memory.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: Waiting for child processes to start
Aug 10 18:04:43 pfs-srv3 heartbeat: [1221]: debug: Limiting CPU: 6 CPU seconds every 59999 milliseconds
Aug 10 18:04:43 pfs-srv3 heartbeat: [1220]: debug: Limiting CPU: 24 CPU seconds every 59999 milliseconds
Aug 10 18:04:43 pfs-srv3 heartbeat: [1219]: debug: Limiting CPU: 6 CPU seconds every 59999 milliseconds
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Local status now set to: 'up'
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: All your child process are belong to us
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: Starting local status message @ 1000 ms intervals
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv3:eth1 up.
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: CreateInitialFilter: ask_resources
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: CreateInitialFilter: hb_takeover
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: CreateInitialFilter: status
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: CreateInitialFilter: ip-request-resp
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: CreateInitialFilter: ip-request
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: debug: Forking temp process write_hostcachedata
Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Managed write_hostcachedata process 1222 exited with return code 0.
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv4:eth1 up.
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: debug: Get a reqnodes message from pfs-srv4
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: debug: get_delnodelist: delnodelist=
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: debug: sending reqnodes msg to node pfs-srv4
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: debug: Forking temp process write_hostcachedata
Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Managed write_hostcachedata process 1223 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for node pfs-srv4: status up
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Status seqno: 5 msgtime: 1281481485
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: StartNextRemoteRscReq() - calling hook
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: notify_world: invoking harc: OLD status: up
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Process [status] started pid 1263
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Starting notify process [status]
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for node pfs-srv4: status active
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Status seqno: 6 msgtime: 1281481485
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: StartNextRemoteRscReq(): child count 1
Aug 10 18:04:45 pfs-srv3 heartbeat: [1263]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 10 18:04:45 pfs-srv3 heartbeat: [1263]: debug: notify_world: Running harc status
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Get a repnodes msg from pfs-srv4
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: nodelist received:pfs-srv3 pfs-srv4
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Comm_now_up(): updating status to active
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Local status now set to: 'active'
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Sending local starting msg: resourcestate = 0
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 0
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Forking temp process write_hostcachedata
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Forking temp process write_delcachedata
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed write_hostcachedata process 1264 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 harc[1263]: [1271]: info: Running /etc/ha.d//rc.d/status status
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status process 1263 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: RscMgmtProc 'status' exited code 0
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: StartNextRemoteRscReq() - calling hook
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: notify_world: invoking harc: OLD status: active
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Process [status] started pid 1276
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: Starting notify process [status]
Aug 10 18:04:45 pfs-srv3 heartbeat: [1276]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 10 18:04:45 pfs-srv3 heartbeat: [1276]: debug: notify_world: Running harc status
Aug 10 18:04:45 pfs-srv3 harc[1276]: [1282]: info: Running /etc/ha.d//rc.d/status status
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status process 1276 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: RscMgmtProc 'status' exited code 0
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed write_delcachedata process 1266 exited with return code 0.
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: process_resources(2): other now unstable
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: STATE 1 => 3
Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: debug: Process [req_our_resources(ask)] started pid 1441
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: local resource transition completed.
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: debug: Sending hold resources msg: local, stable=1 # <none>
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Initial resource acquisition complete (T_RESOURCES(us))
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: debug: req_our_resources(/usr/share/heartbeat/ResourceManager listkeys pfs-srv3)
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: debug: req_our_resources(): running [/usr/share/heartbeat/req_resource drbddisk::r0]
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: 1 local resources from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: Local Resource acquisition completed.
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: debug: Sending hold resources msg: local, stable=1 # req_our_resources()
Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: FIFO message [type resource] written rc=81
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Managed req_our_resources(ask) process 1441 exited with return code 0.
Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: debug: RscMgmtProc 'req_our_resources(ask)' exited code 0
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: process_resources(2): other now unstable
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: remote resource transition completed.
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: Sending hold resources msg: local, stable=1 # <none>
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: Calling PerformAutoFailback()
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4

On pfs-srv4 (secondary)
Aug 10 18:04:43 pfs-srv4 logd: [899]: info: logd started with /etc/logd.cf.
Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Core dumps could be lost if multiple dumps occur.
Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Aug 10 18:04:43 pfs-srv4 logd: [899]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Aug 10 18:04:43 pfs-srv4 logd: [909]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Enabling logging daemon
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(initdead,180)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(bcast,eth1)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(node,pfs-srv3)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(node,pfs-srv4)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(auto_failback,on)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(crm,off)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Version 2 support: off
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: uid=hacluster, gid=<null>
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: uid=hacluster, gid=<null>
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: uid=<null>, gid=haclient
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: uid=root, gid=<null>
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: uid=<null>, gid=haclient
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Beginning authentication parsing
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: 16 max authentication methods
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Keyfile opened
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Keyfile perms OK
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: 16 max authentication methods
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Found authentication method [md5]
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: AUTH: i=1: key = 0x9960ac8, auth=0xb7147034, authname=md5
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Outbound signing method is 1
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Authentication parsing complete [1]
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(cluster,linux-ha)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(hopfudge,1)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(baud,19200)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(hbgenmethod,file)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(realtime,true)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(msgfmt,classic)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(conn_logd_time,60)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(log_badpack,true)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(syslogmsgfmt,true)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(coredumps,true)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Core dumps could be lost if multiple dumps occur.
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(autojoin,none)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(uuidfrom,file)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(compression,zlib)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(compression_threshold,2)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(traditional_compression,no)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: add_option(max_rexmit_delay,250)
Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: debug: Setting max_rexmit_delay to 250 ms
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: debug: add_option(record_config_changes,on)
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: debug: add_option(record_pengine_inputs,on)
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: debug: add_option(enable_config_writes,on)
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: debug: add_option(memreserve,6500)
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: **************************
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Configuration validated. Starting heartbeat 3.0.2
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: debug: HA configuration OK. Heartbeat starting.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Heartbeat Hg Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: heartbeat: version 3.0.2
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Heartbeat generation: 1279723774
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: uuid is:2e1a082a-dd05-4f16-8b43-7b6100e92f53
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: FIFO process pid: 1188
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: opening bcast eth1 (UDP/IP broadcast)
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: glib: SO_BINDTODEVICE(r) set for device eth1
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast heartbeat started on port 12694 (12694) interface eth1
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: write process pid: 1189
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: read child process pid: 1190
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast heartbeat closed on port 12694 interface eth1 - Status: 1
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: make_io_childpair: CREATED childpair wchan socket 9
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: make_io_childpair: CREATED childpair rchan socket 11
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: G_main_add_TriggerHandler: Added signal manual handler
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: Limiting CPU: 42 CPU seconds every 60000 milliseconds
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: pid 1162 locked in memory.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1188]: debug: pid 1188 locked in memory.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1189]: debug: pid 1189 locked in memory.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1190]: debug: pid 1190 locked in memory.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: Waiting for child processes to start
Aug 10 18:04:44 pfs-srv4 heartbeat: [1189]: debug: Limiting CPU: 24 CPU seconds every 59999 milliseconds
Aug 10 18:04:44 pfs-srv4 heartbeat: [1190]: debug: Limiting CPU: 6 CPU seconds every 59999 milliseconds
Aug 10 18:04:44 pfs-srv4 heartbeat: [1188]: debug: Limiting CPU: 6 CPU seconds every 59999 milliseconds
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Local status now set to: 'up'
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: All your child process are belong to us
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: Starting local status message @ 1000 ms intervals
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv4:eth1 up.
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: CreateInitialFilter: ip-request-resp
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: CreateInitialFilter: ask_resources
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: CreateInitialFilter: hb_takeover
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: CreateInitialFilter: status
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: CreateInitialFilter: ip-request
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: debug: Forking temp process write_hostcachedata
Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Managed write_hostcachedata process 1191 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv3:eth1 up.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: sending reqnodes msg to node pfs-srv3
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Status update for node pfs-srv3: status up
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Status seqno: 2 msgtime: 1281481483
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: StartNextRemoteRscReq() - calling hook
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: notify_world: invoking harc: OLD status: up
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Process [status] started pid 1192
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Starting notify process [status]
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Forking temp process write_hostcachedata
Aug 10 18:04:45 pfs-srv4 heartbeat: [1192]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 10 18:04:45 pfs-srv4 heartbeat: [1192]: debug: notify_world: Running harc status
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed write_hostcachedata process 1193 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 harc[1192]: [1199]: info: Running /etc/ha.d//rc.d/status status
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed status process 1192 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: RscMgmtProc 'status' exited code 0
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Get a repnodes msg from pfs-srv3
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: nodelist received:pfs-srv3 pfs-srv4
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Comm_now_up(): updating status to active
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Local status now set to: 'active'
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Sending local starting msg: resourcestate = 0
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 0
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Get a reqnodes message from pfs-srv3
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: get_delnodelist: delnodelist=
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Forking temp process write_hostcachedata
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: debug: Forking temp process write_delcachedata
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed write_hostcachedata process 1204 exited with return code 0.
Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed write_delcachedata process 1205 exited with return code 0.
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Status update for node pfs-srv3: status active
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: Status seqno: 7 msgtime: 1281481485
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: StartNextRemoteRscReq() - calling hook
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: notify_world: invoking harc: OLD status: active
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: Process [status] started pid 1213
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: Starting notify process [status]
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: process_resources: other now unstable
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: Sending hold resources msg: none, stable=0 # <none>
Aug 10 18:04:46 pfs-srv4 heartbeat: [1213]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 1 => 3
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 2
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 2
Aug 10 18:04:46 pfs-srv4 heartbeat: [1213]: debug: notify_world: Running harc status
Aug 10 18:04:46 pfs-srv4 harc[1213]: [1219]: info: Running /etc/ha.d//rc.d/status status
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Managed status process 1213 exited with return code 0.
Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: debug: RscMgmtProc 'status' exited code 0
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource transition completed.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: Sending hold resources msg: none, stable=0 # <none>
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 2 => 3
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: Calling PerformAutoFailback()
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource transition completed.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: Process [req_our_resources(ask)] started pid 1298
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: Sending hold resources msg: local, stable=1 # <none>
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Initial resource acquisition complete (T_RESOURCES(us))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: Calling PerformAutoFailback()
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 4
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: debug: req_our_resources(/usr/share/heartbeat/ResourceManager listkeys pfs-srv4)
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys pfs-srv4] to acquire.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: debug: Sending hold resources msg: local, stable=1 # req_our_resources()
Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: FIFO message [type resource] written rc=81
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Managed req_our_resources(ask) process 1298 exited with return code 0.
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: RscMgmtProc 'req_our_resources(ask)' exited code 0
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
