Do you get anything meaningful in the messages log from the RA?

    grep lrmd /var/log/messages
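Since the failed action in your crm_mon output is postfix_stop_0 with rc=1, it may also be worth checking what the init script itself returns. As far as I know, the LSB init-script convention is that "stop" on an already stopped service exits 0 and "status" on a stopped service exits 3. Here is a rough Python sketch of that check. The helper names (run_action, check_stopped_behaviour) are made up for illustration, and it only covers the "already stopped" case, not a full compliance test.

```python
import subprocess

# Exit codes the LSB init-script convention assigns to these cases.
LSB_OK = 0            # action succeeded (including "stop" when already stopped)
LSB_NOT_RUNNING = 3   # "status" when the service is not running


def run_action(cmd):
    """Run a command given as a list of arguments and return its exit code."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def check_stopped_behaviour(script, runner=run_action):
    """Call 'stop' and then 'status' on an init script.

    Returns a list of human-readable problems; an empty list means the
    script behaved as expected for the stopped case.
    """
    problems = []
    rc = runner([script, "stop"])
    if rc != LSB_OK:
        problems.append(f"stop returned {rc}, expected {LSB_OK}")
    rc = runner([script, "status"])
    if rc != LSB_NOT_RUNNING:
        problems.append(f"status returned {rc}, expected {LSB_NOT_RUNNING}")
    return problems
```

To try it, call check_stopped_behaviour("/etc/init.d/postfix") as root on a box where stopping postfix is harmless, since it really does run "stop". The runner argument is only there so the logic can be exercised without touching a real service.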
Can you run /etc/init.d/postfix status? Is there anything in /var/log/mail.info (or wherever your postfix logs to)?

Finally, have you thought about cloning postfix so you can distribute load too?

> -----Original Message-----
> From: [email protected] [mailto:linux-ha-[email protected]] On Behalf Of David Hoskinson
> Sent: 25 June 2009 14:43
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Failover problem
>
> This is so close.... Here is the scenario. Mail1 is master/preferred. I
> did a restart on mail2 and crm_mon showed it offline and mail1 went on
> happily working correctly. After restarting mail2 they were both happily up
> again. I did a restart on mail1 and the services transferred over to mail2
> and worked flawlessly. Then when I restarted the preferred master, mail1, I
> get this where it starts to transfer back to mail1 and fails.
>
> I can provide configs or logs if needed
>
> ============
> Last updated: Thu Jun 25 08:36:21 2009
> Stack: openais
> Current DC: mail2 - partition with quorum
> Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ mail1 mail2 ]
>
> Master/Slave Set: ms-drbd0
>     Masters: [ mail2 ]
>     Stopped: [ drbd0:0 ]
> Resource Group: mail-group
>     fs0 (ocf::heartbeat:Filesystem): Started mail2
>     virtual-ip (ocf::heartbeat:IPaddr2): Started mail2
>     postfix (lsb:postfix): Started mail2 (unmanaged) FAILED
>     spamassassin (lsb:spamassassin): Stopped
>     dovecot (lsb:dovecot): Stopped
>     clamd (lsb:clamd): Stopped
>     mailservices (lsb:mailservices): Stopped
> Clone Set: stonith-clone
>     Started: [ mail1 mail2 ]
>
> Failed actions:
>     postfix_stop_0 (node=mail2, call=40, rc=1, status=complete): unknown error
>
> On 6/25/09 3:30 AM, "[email protected]" <[email protected]> wrote:
>
> > Oops sorry that's meant to be no-quorum-policy="ignore"
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:linux-ha-[email protected]] On Behalf Of [email protected]
> >> Sent: 25 June 2009 09:22
> >> To: [email protected]
> >> Subject: Re: [Linux-HA] Failover problem
> >>
> >> Just set up SSH STONITH until you can get something more concrete in.
> >> You really have to use STONITH no matter what.
> >> Create an SSH RSA/DSA key
> >> without a password so you can SSH as root from one server to the other
> >> without it asking for a password, then just:
> >>
> >> crm configure
> >>> primitive ssh-stonith stonith:ssh params hostlist="host1 host2" op monitor interval=1h
> >>> clone stonith-clone ssh-stonith
> >>> commit
> >>
> >> Good doc:
> >> http://www.clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf
> >>
> >> To set the quorum policy to ignore is simply:
> >>
> >> crm configure property no-quorum-policy=ignore
> >>
> >> For a 2-node cluster I generally set the following as default:
> >>
> >> no-quorum-policy="stop" \
> >> start-failure-is-fatal="false" \
> >> stonith-action="reboot" \
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:linux-ha-[email protected]] On Behalf Of David Hoskinson
> >>> Sent: 24 June 2009 21:45
> >>> To: General Linux-HA mailing list
> >>> Subject: Re: [Linux-HA] Failover problem
> >>>
> >>> I'm sorry this is maybe where my knowledge is lacking. I don't have the
> >>> hardware for a third node, but I understand your reasoning....
> >>>
> >>> Don't understand how to add stonith and haven't found a good document for
> >>> that... I also get No STONITH resources have been defined when I do a
> >>> crm_verify -LV
> >>>
> >>> Don't know how to set quorum policy to ignore.
> >>>
> >>> Which of the last 2 would you suggest, and where to look for info on how to
> >>> do it.
> >>>
> >>> thanks
> >>>
> >>> On 6/24/09 3:26 PM, "Lars Ellenberg" <[email protected]> wrote:
> >>>
> >>>> On Wed, Jun 24, 2009 at 02:05:46PM -0500, David Hoskinson wrote:
> >>>>> System running 2.99 heartbeat and pacemaker 1.04. Running fine in master
> >>>>> slave mode. However if I shut down the slave server, all the services stop
> >>>>> on the master until the slave comes back up, does the election and once
> >>>>> again starts the services on the master.
> >>>>> This doesn't seem to be the way it
> >>>>> should be. Same thing if I shut the master down. Services go off line
> >>>>> until master is back up.
> >>>>
> >>>> Two node cluster, one vote down,
> >>>> 50% is NOT majority -> single node has no quorum.
> >>>> Quorum policy probably says: no quorum -> stop.
> >>>> You need to
> >>>> - add more nodes (just to have a real quorum), and/or
> >>>> - add stonith, and/or
> >>>> - set quorum policy to ignore.
> >>>
> >>> _______________________________________________
> >>> Linux-HA mailing list
> >>> [email protected]
> >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>> See also: http://linux-ha.org/ReportingProblems
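
P.S. For anyone finding this thread later, the quorum arithmetic Lars describes in the quoted text ("50% is NOT majority") can be written out in a couple of lines of Python. This is only an illustration of the strict-majority rule, not how Pacemaker or the membership layer implements it; the function name has_quorum is made up.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """Strict majority: more than half of the expected votes must be present.

    Written as 2 * present > expected to avoid any rounding surprises.
    """
    return 2 * votes_present > expected_votes


# The 2-node case from the thread: one node down leaves 1 of 2 votes,
# which is exactly 50% and therefore not a majority.
print(has_quorum(1, 2))
# With a third node, losing one node still leaves 2 of 3 votes.
print(has_quorum(2, 3))
```

That is why a 2-node cluster loses quorum as soon as either node goes away, and why the options in the thread are a third vote, STONITH, or no-quorum-policy=ignore.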
