On Tue, 10 Aug 2010, Igor Chudov wrote:

> Guys, I have a bit of clarification. In an attempt to avoid the timing
> issues, an hour ago I tried adding a configuration change to
> /etc/init.d/heartbeat to delay starting it by 2 minutes on one box. So
> logs with takeover succeeding, and heartbeat shutting down are partly
> an artifact of this change, as things never worked like that before.
> You saw this and noticed that it was different from before.
>
> I took that out and I am back to the exact situation I always was in
> (no one takes over). Logs are at the bottom. What I do know from this
> experiment is that resource acquisition itself is unlikely to be to blame.
>
> What I see now is back to what I saw yesterday and prior, and makes no
> sense to me.

Nothing else shows up in the logs? I would expect the boxes to sit like this
for 40 seconds or so (2x deadtime setting IIRC, but it could be 30 sec +
deadtime) and then there would be additional log entries.

As I noted in a prior e-mail, to work around issues where Cisco switches won't
pass any traffic for 30 seconds after the port becomes live (I think they do
spanning tree detection), heartbeat sits extra long when it first boots and
doesn't hear anything, just in case the switch is preventing it from seeing
another system that's up.
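
To make the "40 seconds or so" arithmetic concrete, here is a rough sketch of
the two candidate formulas. I am not sure which one heartbeat actually uses,
and the deadtime values tried here (10 and 20 seconds) are hypothetical; the
real value is whatever is set in ha.cf, which is not shown in this thread. The
function name is made up for illustration.

```python
def candidate_startup_waits(deadtime_s, switch_delay_s=30.0):
    """Return both candidate startup waits, in seconds.

    deadtime_s     -- the deadtime setting (hypothetical here; read it from ha.cf)
    switch_delay_s -- the roughly 30 seconds a switch port may drop traffic
                      after coming up
    """
    return {
        "2 x deadtime": 2 * deadtime_s,
        "switch delay + deadtime": switch_delay_s + deadtime_s,
    }


if __name__ == "__main__":
    # Try two illustrative deadtime values, one per candidate formula.
    for deadtime in (10, 20):
        print(deadtime, candidate_startup_waits(deadtime))
```

If the logs stay silent well past both figures for your actual deadtime, the
startup wait alone probably does not explain it.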

David Lang

> pfs-srv3:
>
>
> Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Core dumps could be lost
> if multiple dumps occur.
> Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Consider setting
> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> maximum supportability
> Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Aug 10 18:04:41 pfs-srv3 logd: [955]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Aug 10 18:04:41 pfs-srv3 logd: [986]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Enabling logging daemon
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: logfile and debug
> file are those specified in logd config file (default /etc/logd.cf)
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Version 2 support: off
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: AUTH: i=1: key =
> 0x88e6b30, auth=0xb7200034, authname=md5
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Core dumps could be
> lost if multiple dumps occur.
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting
> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> maximum supportability
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: **************************
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Configuration
> validated. Starting heartbeat 3.0.2
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Heartbeat Hg
> Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: heartbeat: version 3.0.2
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Heartbeat
> generation: 1279723767
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast
> heartbeat started on port 12694 (12694) interface eth1
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast
> heartbeat closed on port 12694 interface eth1 - Status: 1
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Local status now set to: 
> 'up'
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv3:eth1 up.
> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Managed
> write_hostcachedata process 1222 exited with return code 0.
> Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv4:eth1 up.
> Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Managed
> write_hostcachedata process 1223 exited with return code 0.
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for
> node pfs-srv4: status up
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for
> node pfs-srv4: status active
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Comm_now_up():
> updating status to active
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Local status now set
> to: 'active'
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed
> write_hostcachedata process 1264 exited with return code 0.
> Aug 10 18:04:45 pfs-srv3 harc[1263]: [1271]: info: Running
> /etc/ha.d//rc.d/status status
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status
> process 1263 exited with return code 0.
> Aug 10 18:04:45 pfs-srv3 harc[1276]: [1282]: info: Running
> /etc/ha.d//rc.d/status status
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status
> process 1276 exited with return code 0.
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed
> write_delcachedata process 1266 exited with return code 0.
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: STATE 1 => 3
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: local resource
> transition completed.
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info:
> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Initial resource
> acquisition complete (T_RESOURCES(us))
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: 1 local resources
> from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: Local Resource
> acquisition completed.
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: FIFO message [type
> resource] written rc=81
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info:
> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Managed
> req_our_resources(ask) process 1441 exited with return code 0.
> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: remote resource
> transition completed.
> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info:
> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
>
>
> pfs-srv4:
>
>
> Aug 10 18:04:43 pfs-srv4 logd: [899]: info: logd started with /etc/logd.cf.
> Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Core dumps could be lost
> if multiple dumps occur.
> Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting
> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> maximum supportability
> Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Aug 10 18:04:43 pfs-srv4 logd: [899]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Aug 10 18:04:43 pfs-srv4 logd: [909]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Enabling logging daemon
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: logfile and debug
> file are those specified in logd config file (default /etc/logd.cf)
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Version 2 support: off
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: AUTH: i=1: key =
> 0x9960ac8, auth=0xb7147034, authname=md5
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Core dumps could be
> lost if multiple dumps occur.
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting
> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> maximum supportability
> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: **************************
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Configuration
> validated. Starting heartbeat 3.0.2
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Heartbeat Hg
> Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: heartbeat: version 3.0.2
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Heartbeat
> generation: 1279723774
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast
> heartbeat started on port 12694 (12694) interface eth1
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast
> heartbeat closed on port 12694 interface eth1 - Status: 1
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Local status now set to: 
> 'up'
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv4:eth1 up.
> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Managed
> write_hostcachedata process 1191 exited with return code 0.
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv3:eth1 up.
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Status update for
> node pfs-srv3: status up
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
> write_hostcachedata process 1193 exited with return code 0.
> Aug 10 18:04:45 pfs-srv4 harc[1192]: [1199]: info: Running
> /etc/ha.d//rc.d/status status
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed status
> process 1192 exited with return code 0.
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Comm_now_up():
> updating status to active
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Local status now set
> to: 'active'
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
> write_hostcachedata process 1204 exited with return code 0.
> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
> write_delcachedata process 1205 exited with return code 0.
> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Status update for
> node pfs-srv3: status active
> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info:
> AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 1 => 3
> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 2
> Aug 10 18:04:46 pfs-srv4 harc[1213]: [1219]: info: Running
> /etc/ha.d//rc.d/status status
> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Managed status
> process 1213 exited with return code 0.
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource
> transition completed.
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 2 => 3
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource
> transition completed.
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Initial resource
> acquisition complete (T_RESOURCES(us))
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 4
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: No local resources
> [/usr/share/heartbeat/ResourceManager listkeys pfs-srv4] to acquire.
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: FIFO message [type
> resource] written rc=81
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Managed
> req_our_resources(ask) process 1298 exited with return code 0.
> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
