On Tue, Aug 10, 2010 at 6:41 PM, David Lang
<[email protected]> wrote:
> On Tue, 10 Aug 2010, Igor Chudov wrote:
>
>> Guys, I have a bit of clarification. In an attempt to avoid the timing
>> issues, an hour ago I tried adding a configuration change to
>> /etc/init.d/heartbeat to delay starting it by 2 minutes on one box. So
>> the logs showing takeover succeeding and heartbeat shutting down are
>> partly an artifact of that change; things never worked that way before.
>> You saw this and noticed that it differed from earlier behavior.
>>
>> I took that out and I am back to the exact situation I was always in
>> (no one takes over). Logs are at the bottom. What I do know from this
>> experiment is that resource acquisition itself is unlikely to be to blame.
>>
>> What I see now is back to what I saw yesterday and before, and it makes
>> no sense to me.
>
> Nothing else shows up in the logs? I would expect the boxes to sit like this
> for 40 seconds or so (2x the deadtime setting IIRC, but it could be 30 sec +
> deadtime) and then there would be additional log entries.
>

I just checked: the machines have been up since I sent the previous email
(42 minutes), and nothing new has been added to the log files.

> As I noted in a prior e-mail, to work around issues where Cisco switches won't
> pass any traffic for 30 seconds after a port comes up (I think they do
> spanning-tree detection), heartbeat waits extra long when it first boots and
> doesn't hear anything, just in case the switch is preventing it from seeing
> another system that's up.
>

There is a crossover cable directly between their eth1 interfaces.

Broadcast heartbeats also go over eth1 (per the configs I posted; I hope
I am not misreading them).
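For reference, the timing knobs being discussed live in /etc/ha.d/ha.cf. A
minimal sketch of the relevant directives (the values here are illustrative,
not my actual settings, except that udpport 12694 and bcast eth1 match the
logs below):

```
# /etc/ha.d/ha.cf -- illustrative sketch, not the actual config
keepalive 2        # seconds between heartbeat packets
deadtime 20        # declare the peer dead after this long without heartbeats
warntime 10        # log a "late heartbeat" warning after this long
initdead 60        # longer deadtime used only at first boot, to ride out
                   # switch spanning-tree delays (must be >= 2x deadtime)
udpport 12694      # matches the port shown in the logs below
bcast eth1         # broadcast heartbeats over the crossover link
node pfs-srv3
node pfs-srv4
```

If initdead were set very high here, it could account for the long silent
wait at startup that David describes.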


> David Lang
>
>> pfs-srv3:
>>
>>
>> Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Core dumps could be lost
>> if multiple dumps occur.
>> Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Consider setting
>> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
>> maximum supportability
>> Aug 10 18:04:41 pfs-srv3 logd: [955]: WARN: Consider setting
>> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
>> supportability
>> Aug 10 18:04:41 pfs-srv3 logd: [955]: info: G_main_add_SignalHandler:
>> Added signal handler for signal 15
>> Aug 10 18:04:41 pfs-srv3 logd: [986]: info: G_main_add_SignalHandler:
>> Added signal handler for signal 15
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Enabling logging daemon
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: logfile and debug
>> file are those specified in logd config file (default /etc/logd.cf)
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Version 2 support: off
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: AUTH: i=1: key =
>> 0x88e6b30, auth=0xb7200034, authname=md5
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Core dumps could be
>> lost if multiple dumps occur.
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting
>> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
>> maximum supportability
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: WARN: Consider setting
>> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
>> supportability
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: **************************
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Configuration
>> validated. Starting heartbeat 3.0.2
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1179]: info: Heartbeat Hg
>> Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: heartbeat: version 3.0.2
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Heartbeat
>> generation: 1279723767
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast
>> heartbeat started on port 12694 (12694) interface eth1
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: glib: UDP Broadcast
>> heartbeat closed on port 12694 interface eth1 - Status: 1
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
>> G_main_add_TriggerHandler: Added signal manual handler
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
>> G_main_add_TriggerHandler: Added signal manual handler
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info:
>> G_main_add_SignalHandler: Added signal handler for signal 17
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Local status now set to: 
>> 'up'
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv3:eth1 up.
>> Aug 10 18:04:43 pfs-srv3 heartbeat: [1180]: info: Managed
>> write_hostcachedata process 1222 exited with return code 0.
>> Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Link pfs-srv4:eth1 up.
>> Aug 10 18:04:44 pfs-srv3 heartbeat: [1180]: info: Managed
>> write_hostcachedata process 1223 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for
>> node pfs-srv4: status up
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Status update for
>> node pfs-srv4: status active
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Comm_now_up():
>> updating status to active
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Local status now set
>> to: 'active'
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed
>> write_hostcachedata process 1264 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv3 harc[1263]: [1271]: info: Running
>> /etc/ha.d//rc.d/status status
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status
>> process 1263 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv3 harc[1276]: [1282]: info: Running
>> /etc/ha.d//rc.d/status status
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed status
>> process 1276 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: Managed
>> write_delcachedata process 1266 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
>> Aug 10 18:04:45 pfs-srv3 heartbeat: [1180]: info: STATE 1 => 3
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: local resource
>> transition completed.
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info:
>> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Initial resource
>> acquisition complete (T_RESOURCES(us))
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: 1 local resources
>> from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: Local Resource
>> acquisition completed.
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1441]: info: FIFO message [type
>> resource] written rc=81
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info:
>> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
>> Aug 10 18:04:55 pfs-srv3 heartbeat: [1180]: info: Managed
>> req_our_resources(ask) process 1441 exited with return code 0.
>> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 0
>> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: remote resource
>> transition completed.
>> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info:
>> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
>> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
>> Aug 10 18:04:56 pfs-srv3 heartbeat: [1180]: info: other_holds_resources: 1
>>
>>
>> pfs-srv4:
>>
>>
>> Aug 10 18:04:43 pfs-srv4 logd: [899]: info: logd started with /etc/logd.cf.
>> Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Core dumps could be lost
>> if multiple dumps occur.
>> Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting
>> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
>> maximum supportability
>> Aug 10 18:04:43 pfs-srv4 logd: [899]: WARN: Consider setting
>> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
>> supportability
>> Aug 10 18:04:43 pfs-srv4 logd: [899]: info: G_main_add_SignalHandler:
>> Added signal handler for signal 15
>> Aug 10 18:04:43 pfs-srv4 logd: [909]: info: G_main_add_SignalHandler:
>> Added signal handler for signal 15
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Enabling logging daemon
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: logfile and debug
>> file are those specified in logd config file (default /etc/logd.cf)
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: Version 2 support: off
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: info: AUTH: i=1: key =
>> 0x9960ac8, auth=0xb7147034, authname=md5
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Core dumps could be
>> lost if multiple dumps occur.
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting
>> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
>> maximum supportability
>> Aug 10 18:04:43 pfs-srv4 heartbeat: [1161]: WARN: Consider setting
>> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
>> supportability
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: **************************
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Configuration
>> validated. Starting heartbeat 3.0.2
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1161]: info: Heartbeat Hg
>> Version: node: ed844d11ea2b603f7d01cce1700d6c1fcb404d29
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: heartbeat: version 3.0.2
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Heartbeat
>> generation: 1279723774
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast
>> heartbeat started on port 12694 (12694) interface eth1
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: glib: UDP Broadcast
>> heartbeat closed on port 12694 interface eth1 - Status: 1
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
>> G_main_add_TriggerHandler: Added signal manual handler
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
>> G_main_add_TriggerHandler: Added signal manual handler
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info:
>> G_main_add_SignalHandler: Added signal handler for signal 17
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Local status now set to: 
>> 'up'
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv4:eth1 up.
>> Aug 10 18:04:44 pfs-srv4 heartbeat: [1162]: info: Managed
>> write_hostcachedata process 1191 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Link pfs-srv3:eth1 up.
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Status update for
>> node pfs-srv3: status up
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
>> write_hostcachedata process 1193 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv4 harc[1192]: [1199]: info: Running
>> /etc/ha.d//rc.d/status status
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed status
>> process 1192 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Comm_now_up():
>> updating status to active
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Local status now set
>> to: 'active'
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
>> write_hostcachedata process 1204 exited with return code 0.
>> Aug 10 18:04:45 pfs-srv4 heartbeat: [1162]: info: Managed
>> write_delcachedata process 1205 exited with return code 0.
>> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Status update for
>> node pfs-srv3: status active
>> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info:
>> AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
>> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 1 => 3
>> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 2
>> Aug 10 18:04:46 pfs-srv4 harc[1213]: [1219]: info: Running
>> /etc/ha.d//rc.d/status status
>> Aug 10 18:04:46 pfs-srv4 heartbeat: [1162]: info: Managed status
>> process 1213 exited with return code 0.
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource
>> transition completed.
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 2 => 3
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: remote resource
>> transition completed.
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
>> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Initial resource
>> acquisition complete (T_RESOURCES(us))
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
>> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: STATE 3 => 4
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: No local resources
>> [/usr/share/heartbeat/ResourceManager listkeys pfs-srv4] to acquire.
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1298]: info: FIFO message [type
>> resource] written rc=81
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info:
>> AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: Managed
>> req_our_resources(ask) process 1298 exited with return code 0.
>> Aug 10 18:04:56 pfs-srv4 heartbeat: [1162]: info: other_holds_resources: 1
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
>