On Tue, Aug 10, 2010 at 2:28 PM, David Lang <[email protected]> wrote:
> On Tue, 10 Aug 2010, Igor Chudov wrote:
>
>> Dimitri, you are right.
>>
>> In any case the name change did nothing.
>
> did it eliminate the error from the log? does the log say anything else after
> that point?

It eliminated the error from the log, but otherwise the log says the same things. In a nutshell, both nodes think "other_holds_resources". I cannot imagine that this is really such an unsolvable problem; I think we are missing something very simple.

Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Link pfs-srv4:eth1 up.
Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Status update for node pfs-srv4: status up
Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Managed write_hostcachedata process 1273 exited with return code 0.
Aug 10 14:47:18 pfs-srv3 harc[1272]: [1279]: info: Running /etc/ha.d//rc.d/status status
Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Managed status process 1272 exited with return code 0.
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Comm_now_up(): updating status to active
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Local status now set to: 'active'
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Managed write_hostcachedata process 1284 exited with return code 0.
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Status update for node pfs-srv4: status active
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: STATE 1 => 3
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: STATE 3 => 2
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Managed write_delcachedata process 1285 exited with return code 0.
Aug 10 14:47:19 pfs-srv3 harc[1286]: [1292]: info: Running /etc/ha.d//rc.d/status status
Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Managed status process 1286 exited with return code 0.
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: remote resource transition completed.
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: STATE 2 => 3
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: other_holds_resources: 1
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: remote resource transition completed.
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: Initial resource acquisition complete (T_RESOURCES(us))
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: STATE 3 => 4
Aug 10 14:47:30 pfs-srv3 heartbeat: [1297]: info: 1 local resources from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
Aug 10 14:47:30 pfs-srv3 heartbeat: [1297]: info: Local Resource acquisition completed.
Aug 10 14:47:30 pfs-srv3 heartbeat: [1297]: info: FIFO message [type resource] written rc=81
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: other_holds_resources: 1
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: other_holds_resources: 1
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: Managed req_our_resources(ask) process 1297 exited with return code 0.

> David Lang
>
>> They still refuse to take over when rebooted simultaneously.
>>
>> The symptoms are the same as usual.
>>
>> I am thinking, should I perhaps put a little statement in
>> /etc/init.d/heartbeat on one of the boxes and add "sleep 100" in it?
>>
>> i
>>
>> On Tue, Aug 10, 2010 at 2:05 PM, Dimitri Maziuk <[email protected]>
>> wrote:
>>> On Tuesday 10 August 2010 13:14, Igor Chudov wrote:
>>>>
>>>> Haresources refers to "drbddisk"; however, the resource in
>>>> /usr/lib/ocf/resource.d/heartbeat is called "drbd".
>>>
>>> Heartbeat 2.1.4 on centos 5 comes with /etc/ha.d/resource.d/drbddisk. Looks
>>> like the docs you read don't match the version you have.
>>>
>>> Dima
>>> --
>>> Dimitri Maziuk
>>> Programmer/sysadmin
>>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
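One rough way to check the "both think other_holds_resources" pattern is to pull the takeover-related lines (STATE transitions, other_holds_resources values, AnnounceTakeover reasons) out of each node's log and compare the two side by side. Below is a minimal Python sketch of that. The regular expressions are assumptions modelled only on the log lines quoted in this thread, not on any documented heartbeat log format, and `summarize` is a hypothetical helper name.

```python
import re
import sys
from typing import Iterable

# Assumed patterns, based only on the heartbeat log lines quoted in this thread.
STATE_RE = re.compile(r"STATE (\d+) => (\d+)")
HOLDS_RE = re.compile(r"other_holds_resources: (\d+)")
TAKEOVER_RE = re.compile(
    r"AnnounceTakeover\(local (\d+), foreign (\d+), reason '([^']*)'"
)


def summarize(lines: Iterable[str]) -> dict:
    """Collect state transitions, other_holds_resources values and takeover reasons."""
    summary = {"states": [], "other_holds": [], "takeovers": []}
    for line in lines:
        if m := STATE_RE.search(line):
            # e.g. "STATE 1 => 3" becomes (1, 3)
            summary["states"].append((int(m.group(1)), int(m.group(2))))
        elif m := HOLDS_RE.search(line):
            summary["other_holds"].append(int(m.group(1)))
        elif m := TAKEOVER_RE.search(line):
            # (local, foreign, reason) as printed by AnnounceTakeover
            summary["takeovers"].append(
                (int(m.group(1)), int(m.group(2)), m.group(3))
            )
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_hb.py /var/log/ha-log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(summarize(fh))
```

Running it against the log from each node in turn makes it easy to see whether both sides report other_holds_resources: 1 at the same moment.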
