OK, just checking again: are the two haresources files truly identical?

You didn't put different system names in the first line of each file, or
something like that? (This is a common mistake.)

I would also remove the second host from the haresources file. Having it there
with no resources on it may get it detected as a special case and ignored, but
it's doing you no good and there's some possibility of it confusing the system,
so try removing it.
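Both checks above (identical files, no host line without resources) can be
scripted. The following is a minimal sketch, not part of Heartbeat: the function
names are hypothetical, and it assumes the usual haresources layout of one
"<preferred-node> <resource> ..." entry per line with '#' comments, ignoring
backslash line continuations.

```python
"""Hypothetical helper for comparing two copies of a Heartbeat haresources file."""


def parse_haresources(text):
    """Return (node, [resources]) tuples for each non-blank, non-comment line.

    Assumes the layout "<preferred-node> <resource> ..." with '#' comments.
    Backslash line continuations are NOT handled in this sketch.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries


def check_haresources(text_a, text_b):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    a = parse_haresources(text_a)
    b = parse_haresources(text_b)
    # The common mistake: each machine names itself in the first field.
    if [node for node, _ in a] != [node for node, _ in b]:
        problems.append("preferred node names differ between the two files")
    elif a != b:
        problems.append("resource lists differ between the two files")
    # A node listed with no resources does no good and may confuse things.
    empty = sorted({node for node, resources in a + b if not resources})
    for node in empty:
        problems.append(
            f"node {node!r} is listed with no resources; consider removing that line"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical file contents; "r0" is a made-up DRBD resource name.
    on_srv3 = "pfs-srv3 drbddisk::r0\npfs-srv4\n"
    on_srv4 = "pfs-srv4 drbddisk::r0\npfs-srv4\n"
    for problem in check_haresources(on_srv3, on_srv4):
        print(problem)
```

On the real machines you would read /etc/ha.d/haresources from each node (for
example after copying one of them over) and pass both strings in. Comparing the
parsed entries rather than raw bytes means harmless whitespace or comment
differences are not reported.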

David Lang


On Tue, 10 Aug 2010, Igor Chudov wrote:

> Date: Tue, 10 Aug 2010 14:55:25 -0500
> From: Igor Chudov <[email protected]>
> Reply-To: General Linux-HA mailing list <[email protected]>
> To: General Linux-HA mailing list <[email protected]>
> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time
> 
> On Tue, Aug 10, 2010 at 2:28 PM, David Lang <[email protected]> wrote:
>> On Tue, 10 Aug 2010, Igor Chudov wrote:
>>
>>> Dimitri, you are right.
>>>
>>> In any case, the name change did nothing.
>>
>> Did it eliminate the error from the log? Does the log say anything else after
>> that point?
>
> It eliminated the error from the log, but the log says the same things.
>
> What it says, in a nutshell, is that both nodes report
> "other_holds_resources".
>
> I cannot really imagine that it could possibly be such an unsolvable
> problem. I think that we are missing something really simple.
>
> Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Link pfs-srv4:eth1 up.
> Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Status update for node pfs-srv4: status up
> Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Managed write_hostcachedata process 1273 exited with return code 0.
> Aug 10 14:47:18 pfs-srv3 harc[1272]: [1279]: info: Running /etc/ha.d//rc.d/status status
> Aug 10 14:47:18 pfs-srv3 heartbeat: [1200]: info: Managed status process 1272 exited with return code 0.
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Comm_now_up(): updating status to active
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Local status now set to: 'active'
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Managed write_hostcachedata process 1284 exited with return code 0.
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Status update for node pfs-srv4: status active
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: STATE 1 => 3
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: STATE 3 => 2
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Managed write_delcachedata process 1285 exited with return code 0.
> Aug 10 14:47:19 pfs-srv3 harc[1286]: [1292]: info: Running /etc/ha.d//rc.d/status status
> Aug 10 14:47:19 pfs-srv3 heartbeat: [1200]: info: Managed status process 1286 exited with return code 0.
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: remote resource transition completed.
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: STATE 2 => 3
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: other_holds_resources: 1
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: remote resource transition completed.
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: Initial resource acquisition complete (T_RESOURCES(us))
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: STATE 3 => 4
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1297]: info: 1 local resources from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1297]: info: Local Resource acquisition completed.
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1297]: info: FIFO message [type resource] written rc=81
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: other_holds_resources: 1
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: other_holds_resources: 1
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
> Aug 10 14:47:30 pfs-srv3 heartbeat: [1200]: info: Managed req_our_resources(ask) process 1297 exited with return code 0.
>
>> David Lang
>>
>>> They still refuse to take over when rebooted simultaneously.
>>>
>>> The symptoms are the same as usual.
>>>
>>> I am wondering: should I perhaps add a "sleep 100" statement to
>>> /etc/init.d/heartbeat on one of the boxes?
>>>
>>> i
>>>
>>> On Tue, Aug 10, 2010 at 2:05 PM, Dimitri Maziuk <[email protected]> wrote:
>>>> On Tuesday 10 August 2010 13:14, Igor Chudov wrote:
>>>>>
>>>>> Haresources refers to "drbddisk"; however, the resource in
>>>>> /usr/lib/ocf/resource.d/heartbeat is called "drbd".
>>>>
>>>> Heartbeat 2.1.4 on CentOS 5 comes with /etc/ha.d/resource.d/drbddisk. It
>>>> looks like the docs you read don't match the version you have.
>>>>
>>>> Dima
>>>> --
>>>> Dimitri Maziuk
>>>> Programmer/sysadmin
>>>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>>> _______________________________________________
>>>> Linux-HA mailing list
>>>> [email protected]
>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>> See also: http://linux-ha.org/ReportingProblems
>>>>
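Regarding the drbddisk/drbd confusion above: a haresources entry such as
"drbddisk" names a resource script, not the OCF agent "drbd" under
/usr/lib/ocf/resource.d/heartbeat. The sketch below checks whether such a script
can be found. It is a hypothetical helper, and the search order
(/etc/ha.d/resource.d, then /etc/init.d) is an assumption to verify against the
documentation for your Heartbeat version.

```python
import os

# Assumption: haresources-style resource scripts are looked up in these
# directories, in this order. Verify against your Heartbeat version's docs.
DEFAULT_SEARCH_DIRS = ("/etc/ha.d/resource.d", "/etc/init.d")


def script_name(resource):
    """Strip haresources arguments: 'drbddisk::r0' -> 'drbddisk'."""
    return resource.split("::", 1)[0]


def find_resource_script(resource, search_dirs=DEFAULT_SEARCH_DIRS):
    """Return the path of the first executable script for `resource`, or None."""
    name = script_name(resource)
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    # "r0" is a hypothetical DRBD resource name used only for illustration.
    for resource in ("drbddisk::r0", "drbd"):
        found = find_resource_script(resource)
        print(f"{resource}: {found or 'not found'}")
```

Run on each node, this should report /etc/ha.d/resource.d/drbddisk if the
package ships it, and "not found" for a name that exists only as an OCF agent.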