________________________________

From: [email protected] on behalf of Igor Chudov
Sent: Tue 8/10/2010 6:50 AM
To: General Linux-HA mailing list
Cc: [email protected]
Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH
machines are booted at the same time



On Mon, Aug 9, 2010 at 5:07 PM, David Lang
<[email protected]> wrote:
> ha-log should give you a detailed picture of what each box is thinking as it
> starts up. I've always been able to track down the problem with that info for my
> systems.
>

David, I did a fresh restart today (without changing to mcast, yet, as
I want to do one thing at a time).

Again, neither server took over.

Here are the ha-logs from both servers:

http://igor.chudov.com/tmp/ha-log-1.txt
http://igor.chudov.com/tmp/ha-log-2.txt

Any ideas would be GREATLY appreciated. The old service is dying and I
feel quite a bit of pressure to get this solution to work.

Igor,

These logs seem to indicate that you restarted heartbeat on pfs-srv3 and shut it
down on pfs-srv4. Are these the messages from when the two machines were coming up?

pushkar
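
To make that question easier to answer, the two ha-logs can be interleaved
by timestamp so the startup of both nodes reads as one timeline. The script
below is only a minimal sketch: it assumes each log line carries a timestamp
of the form 2010/08/10_06:31:15, and the function names and the node=file
command-line arguments are hypothetical. Which file belongs to which node
has to be supplied by whoever runs it.

```python
import re
import sys
from datetime import datetime

# Assumed ha-log timestamp layout, e.g. "2010/08/10_06:31:15".
STAMP = re.compile(r"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}")


def parse_lines(node, lines):
    """Yield (timestamp, node, text) for each line that carries a timestamp."""
    for line in lines:
        match = STAMP.search(line)
        if match:
            when = datetime.strptime(match.group(0), "%Y/%m/%d_%H:%M:%S")
            yield when, node, line.rstrip("\n")


def merge_logs(logs):
    """Merge {node: iterable of lines} into one list ordered by time."""
    entries = []
    for node, lines in logs.items():
        entries.extend(parse_lines(node, lines))
    # Sort on the timestamp only; sorted() is stable, so lines logged
    # within the same second keep their original per-node order.
    return sorted(entries, key=lambda entry: entry[0])


if __name__ == "__main__":
    # Hypothetical usage: python merge_ha_logs.py nodeA=ha-log-1.txt nodeB=ha-log-2.txt
    logs = {}
    for arg in sys.argv[1:]:
        node, path = arg.split("=", 1)
        with open(path, errors="replace") as handle:
            logs[node] = handle.readlines()
    for when, node, text in merge_logs(logs):
        print(f"{node:10} {text}")
```

Lines without a recognizable timestamp are skipped, so the merged view is a
reading aid and not a replacement for the full logs.
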
> David Lang
>
> On Mon, 9 Aug 2010, Igor Chudov wrote:
>
>> Pushkar, I will be at work tomorrow (took a couple of days off) and
>> will try mcast.
>>
>> This issue is a huge problem for us, as the old installation that I
>> am trying to replace is having issues.
>>
>> I am at the end of my rope and will do everything possible to resolve it.
>>
>> What presently bothers me is that aside from some suggestions to try
>> this and that, I have no mechanism to debug this problem.
>>
>> Igor
>>
>> On Mon, Aug 9, 2010 at 12:53 PM, Pushkar Pradhan <[email protected]> 
>> wrote:
>>>
>>>
>>> ________________________________
>>>
>>> From: [email protected] on behalf of Igor Chudov
>>> Sent: Thu 8/5/2010 9:47 PM
>>> To: General Linux-HA mailing list
>>> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines
>>> are booted at the same time
>>>
>>>
>>>
>>> On Thu, Aug 5, 2010 at 6:32 PM, Pushkar Pradhan <[email protected]> 
>>> wrote:
>>>> I set up two Ubuntu Lucid machines to serve as a two-node Heartbeat
>>>> cluster without Corosync.
>>>>
>>>> They support a DRBD service, IP address, NFS and Samba services.
>>>>
>>>> Things mostly work, and if I reboot one server, the other takes over.
>>>>
>>>> What does NOT work is that if I reboot both, then *neither* takes
>>>> over. When they are in this state -- both running and neither active --
>>>> if I reboot one of them, then the other begins to work.
>>>>
>>>> This is becoming a real embarrassment for me at work and I would love
>>>> to get some help.
>>>>
>>>> haresources:
>>>> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
>>>> nfs-kernel-server smbd
>>>> pfs-srv4
>>>>
>>>> ha.cf:
>>>> use_logd on
>>>> udpport 12694
>>>> keepalive 1
>>>> warntime 15
>>>> deadtime 20
>>>> debug 1
>>>> initdead 60
>>>> bcast eth1
>>>> node pfs-srv3
>>>> node pfs-srv4
>>>> auto_failback on
>>>> crm off
>>>>
>>>>
>>>> Can you experiment with a really large initdead time, like 2 or 5 minutes?
>>>> Also, see whether unicast messaging helps.
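
For reference, the two suggestions above might look like this in ha.cf. This
is only a sketch: the peer address is a placeholder that does not come from
this thread, and the multicast group is an arbitrary example. The initdead
value is in seconds, as with the other timers in the posted file.

```
# ha.cf fragment (sketch; addresses below are placeholders)
initdead 120

# Replace "bcast eth1" with one of the following.
# Unicast: device, then the peer's address on that device.
ucast eth1 192.0.2.4

# Multicast: device, group, UDP port, TTL, loop.
# mcast eth1 225.0.0.1 12694 1 0
```
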
>>>
>>> Larger initdead does not help. I will try unicast tomorrow but I doubt
>>> it will help.
>>>
>>> Pushkar, could you or someone else suggest some tools to
>>> troubleshoot this issue?
>>>
>>> Right now I am poking in the dark.
>>>
>>>
>>> Igor,
>>>
>>> Sorry to hear that. Any luck with unicast messaging? I am interested in
>>> helping you. If you want, we can take this discussion offline, i.e. off the
>>> HA mailing list.
>>>
>>> pushkar
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>

