Interestingly enough, it seems to work now. Once the primary comes back online, services switch back to it OK. However, I am using DRBD for drive replication and the drives are showing out of sync. I am wondering if the failover is having trouble with the DRBD switchover. Investigating more.
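
One quick way to spot the out-of-sync state is to filter /proc/drbd for devices that are not both Connected and UpToDate on each side. This is only a minimal sketch, assuming DRBD 8.x (which exposes /proc/drbd); the function name drbd_unhealthy is made up for illustration and is not part of DRBD:

```shell
# Hypothetical helper (not a DRBD tool): reads /proc/drbd-style text on stdin
# and prints every device line that is NOT in the healthy state
# "cs:Connected" with "ds:UpToDate/UpToDate".
drbd_unhealthy() {
    # First grep keeps only per-device status lines (" 0: cs:...").
    # Second grep drops the lines that look healthy, leaving the rest.
    grep -E '^ *[0-9]+: cs:' | grep -Ev 'cs:Connected .*ds:UpToDate/UpToDate'
}

# On a live node (assumption: DRBD 8.x with /proc/drbd present):
#   drbd_unhealthy < /proc/drbd
```

No output means every device reports Connected and UpToDate/UpToDate; any line printed is a device worth looking at on both nodes.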

On 6/25/09 8:57 AM, "[email protected]" <[email protected]> wrote:

> Do you get anything meaningful in the messages log from the RA?
>
> grep lrmd /var/log/messages
>
> Can you run /etc/init.d/postfix status?
>
> Is there anything in the /var/log/mail.info log (or wherever your
> postfix logs to)?
>
> Finally, have you not thought about cloning postfix so you can
> distribute load too?
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-ha-
>> [email protected]] On Behalf Of David Hoskinson
>> Sent: 25 June 2009 14:43
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Failover problem
>>
>> This is so close.... Here is the scenario. Mail1 is master/preferred. I
>> did a restart on mail2 and crm_mon showed it offline and mail1 went on
>> happily working correctly. After restarting mail2 they were both happily
>> up again. I did a restart on mail1 and the services transferred over to
>> mail2 and worked flawlessly. Then when I restarted the preferred master,
>> mail1 I get this where its starts to transfer back to mail1 and fails.
>>
>> I can provide configs or logs if needed
>>
>>
>> ============
>> Last updated: Thu Jun 25 08:36:21 2009
>> Stack: openais
>> Current DC: mail2 - partition with quorum
>> Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
>> 2 Nodes configured, 2 expected votes
>> 3 Resources configured.
>> ============
>>
>> Online: [ mail1 mail2 ]
>>
>> Master/Slave Set: ms-drbd0
>>     Masters: [ mail2 ]
>>     Stopped: [ drbd0:0 ]
>> Resource Group: mail-group
>>     fs0 (ocf::heartbeat:Filesystem): Started mail2
>>     virtual-ip (ocf::heartbeat:IPaddr2): Started mail2
>>     postfix (lsb:postfix): Started mail2 (unmanaged) FAILED
>>     spamassassin (lsb:spamassassin): Stopped
>>     dovecot (lsb:dovecot): Stopped
>>     clamd (lsb:clamd): Stopped
>>     mailservices (lsb:mailservices): Stopped
>> Clone Set: stonith-clone
>>     Started: [ mail1 mail2 ]
>>
>> Failed actions:
>>     postfix_stop_0 (node=mail2, call=40, rc=1, status=complete): unknown error
>>
>>
>> On 6/25/09 3:30 AM, "[email protected]"
>> <[email protected]> wrote:
>>
>>> Oops sorry that's meant to be no-quorum-policy="ignore"
>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:linux-ha-
>>>> [email protected]] On Behalf Of [email protected]
>>>> Sent: 25 June 2009 09:22
>>>> To: [email protected]
>>>> Subject: Re: [Linux-HA] Failover problem
>>>>
>>>> Just set up SSH STONITH until you can get something more concrete in.
>>>> You really have to use STONITH no matter what. Create an SSH RSA/DSA key
>>>> without a password so you can SSH as root from one server to the other
>>>> without it asking for a password, then just:
>>>>
>>>> crm configure
>>>> > primitive ssh-stonith stonith:ssh params hostlist="host1 host2" op monitor interval=1h
>>>> > clone stonith-clone ssh-stonith
>>>> > commit
>>>>
>>>> Good doc:
>>>> http://www.clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf
>>>>
>>>> To set the quorum policy to ignore is simply:
>>>>
>>>> crm configure property no-quorum-policy=ignore
>>>>
>>>> For a 2-node cluster I generally set the following as default:
>>>>
>>>> no-quorum-policy="stop" \
>>>> start-failure-is-fatal="false" \
>>>> stonith-action="reboot" \
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:linux-ha-
>>>>> [email protected]] On Behalf Of David Hoskinson
>>>>> Sent: 24 June 2009 21:45
>>>>> To: General Linux-HA mailing list
>>>>> Subject: Re: [Linux-HA] Failover problem
>>>>>
>>>>> Im sorry this is maybe where my knowledge is lacking. I don't have the
>>>>> hardware for a third node, but I understand your reasoning....
>>>>>
>>>>> Don't understand how to add stonith and haven't found a good document
>>>>> for that... I also get No STONITH resources have been defined when I do
>>>>> a crm_verify -LV
>>>>>
>>>>> Don't know how to set quorom policy to ignore.
>>>>>
>>>>> Which of the last 2 would you suggest, and where to look for info on how
>>>>> to do it.
>>>>>
>>>>> thanks
>>>>>
>>>>>
>>>>> On 6/24/09 3:26 PM, "Lars Ellenberg" <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Jun 24, 2009 at 02:05:46PM -0500, David Hoskinson wrote:
>>>>>>> System running 2.99 heartbeat and pacemaker 1.04. Running fine in
>>>>>>> master slave mode. However if I shut down the slave server, all the
>>>>>>> services stop on the master until the slave comes back up, does the
>>>>>>> election and once again starts the services on the master. This
>>>>>>> doesn't seem to be the way it should be. Same thing if I shut the
>>>>>>> master down. Services go off line until master is back up.
>>>>>>
>>>>>> Two node cluster, one vote down,
>>>>>> 50% is NOT majority -> single node has no quorum.
>>>>>> Quorum policy probably says: no quorum -> stop.
>>>>>> You need to
>>>>>> - add more nodes (just to have a real quorum), and/or
>>>>>> - add stonith, and/or
>>>>>> - set quorum policy to ignore.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>> [email protected]
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>> See also: http://linux-ha.org/ReportingProblems
