________________________________
From: [email protected] on behalf of Igor Chudov
Sent: Tue 8/10/2010 6:50 AM
To: General Linux-HA mailing list
Cc: [email protected]
Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

On Mon, Aug 9, 2010 at 5:07 PM, David Lang <[email protected]> wrote:
> ha-log should give you a detailed picture of what each box is thinking as they
> start up. I've always been able to track down the problem with that info for my
> systems.
>

David,

I did a fresh restart today (without switching to mcast yet, as I want to change one thing at a time).

Again, neither server took over. Here are the ha-logs from them:

http://igor.chudov.com/tmp/ha-log-1.txt
http://igor.chudov.com/tmp/ha-log-2.txt

Any ideas would be GREATLY appreciated. The old service is dying and I feel quite a bit of pressure to get this solution to work.

Igor,

These logs seem to indicate that you restarted heartbeat on pfs-srv3 and shut it down on pfs-srv4. Are these the messages from when the two machines were coming up?

pushkar

> David Lang
>
> On Mon, 9 Aug 2010, Igor Chudov wrote:
>
>> Pushkar, I will be at work tomorrow (took a couple of days off) and
>> will try mcast.
>>
>> This issue is a huge problem for us, as the old installation I am
>> trying to replace is having issues.
>>
>> I am at the end of my rope and will do everything possible to resolve it.
>>
>> What presently bothers me is that aside from some suggestions to try
>> this and that, I have no mechanism to debug this problem.
>>
>> Igor
>>
>> On Mon, Aug 9, 2010 at 12:53 PM, Pushkar Pradhan <[email protected]>
>> wrote:
>>>
>>> ________________________________
>>>
>>> From: [email protected] on behalf of Igor Chudov
>>> Sent: Thu 8/5/2010 9:47 PM
>>> To: General Linux-HA mailing list
>>> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines
>>> are booted at the same time
>>>
>>> On Thu, Aug 5, 2010 at 6:32 PM, Pushkar Pradhan <[email protected]>
>>> wrote:
>>>> I set up two Ubuntu Lucid machines to serve as a two-node Heartbeat
>>>> cluster without Corosync.
>>>>
>>>> They support a DRBD service, IP address, NFS and Samba services.
>>>>
>>>> Things mostly work, and if I reboot one server, the other takes over.
>>>>
>>>> What does NOT work is that if I reboot both, then *neither* takes
>>>> over. When they are in this state (both running and none active),
>>>> if I reboot one of them, then the other begins to work.
>>>>
>>>> This is becoming a real embarrassment for me at work and I would love
>>>> to get some help.
>>>>
>>>> haresources:
>>>> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
>>>> nfs-kernel-server smbd
>>>> pfs-srv4
>>>>
>>>> ha.cf:
>>>> use_logd on
>>>> udpport 12694
>>>> keepalive 1
>>>> warntime 15
>>>> deadtime 20
>>>> debug 1
>>>> initdead 60
>>>> bcast eth1
>>>> node pfs-srv3
>>>> node pfs-srv4
>>>> auto_failback on
>>>> crm off
>>>>
>>>>
>>>> Can you experiment with a really large initdead time like 2 or 5 minutes?
>>>> Also see if it helps to do unicast messaging?
>>>
>>> Larger initdead does not help. I will try unicast tomorrow but I doubt
>>> it will help.
>>>
>>> Pushkar, could you or someone else suggest some tools to troubleshoot
>>> this issue?
>>>
>>> Right now I am poking in the dark.
>>>
>>>
>>> Igor,
>>>
>>> Sorry to hear that. Any luck with unicast messaging? I am interested in
>>> helping you; if you want, we can take this discussion offline, i.e. off the
>>> HA mailing list.
>>>
>>> pushkar
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
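For reference, here is a rough sketch of how Pushkar's unicast suggestion, or the mcast variant Igor plans to try, could replace the "bcast eth1" line in the ha.cf quoted above. The sketch assumes the cluster link stays on eth1 and that the existing "udpport 12694" is kept. The addresses 192.168.1.3 and 192.168.1.4 and the multicast group 225.0.0.1 are placeholders, not values taken from this thread. Use only one of the two options.

```
# ha.cf fragment (sketch): replaces "bcast eth1"; keep all other lines as they are.
# NOTE: the IP addresses and multicast group below are hypothetical placeholders.

# Option 1: unicast. Syntax: ucast <dev> <peer-ip>
# List the eth1 address of each node. As far as I know, heartbeat ignores a
# ucast line that points at the local node's own address, so the same file
# can be copied unchanged to both pfs-srv3 and pfs-srv4.
ucast eth1 192.168.1.3
ucast eth1 192.168.1.4

# Option 2: multicast (commented out). Syntax: mcast <dev> <group> <port> <ttl> <loop>
#mcast eth1 225.0.0.1 12694 1 0
```

Whichever option is chosen, ha.cf should stay identical on both nodes, so that the nodes agree on how they exchange heartbeats.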
