Re: [Linux-HA] Failover problem

Darren.Mansell Thu, 25 Jun 2009 06:57:40 -0700

Do you get anything meaningful in the messages log from the RA?

grep lrmd /var/log/messages


Can you run /etc/init.d/postfix status?

Is there anything in the /var/log/mail.info log (or wherever your
postfix logs to)?
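
One common cause of an rc=1 on a stop action like the postfix_stop_0 quoted below is an init script that does not follow the LSB exit codes: "stop" has to return 0 even when the service is already stopped, and "status" has to return 3 when it is not running. A rough sketch of a hand check follows. The helper name check_lsb is made up for illustration, and it really does stop the service, so only run it where that is acceptable.

```shell
# check_lsb CMD...   (hypothetical helper, not part of Pacemaker or postfix)
# Runs "CMD stop" twice and then "CMD status", and checks the LSB exit codes:
#   stop on an already-stopped service -> 0
#   status on a stopped service        -> 3
check_lsb() {
    "$@" stop >/dev/null 2>&1          # first stop: make sure it is down
    "$@" stop >/dev/null 2>&1          # second stop: must still succeed
    rc_stop=$?
    "$@" status >/dev/null 2>&1
    rc_status=$?
    echo "stop-when-stopped rc=$rc_stop (LSB wants 0)"
    echo "status-when-stopped rc=$rc_status (LSB wants 3)"
    [ "$rc_stop" -eq 0 ] && [ "$rc_status" -eq 3 ]
}
```

Usage would be something like "check_lsb /etc/init.d/postfix" as root on the node that reported the failure; a non-zero result suggests the script needs fixing before the cluster can manage it reliably.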

Finally, have you not thought about cloning postfix so you can
distribute load too?


> -----Original Message-----
> From: [email protected] [mailto:linux-ha-
> [email protected]] On Behalf Of David Hoskinson
> Sent: 25 June 2009 14:43
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Failover problem
> 
> This is so close.... Here is the scenario.  Mail1 is master/preferred.
> I did a restart on mail2 and crm_mon showed it offline and mail1 went on
> happily working correctly.  After restarting mail2 they were both happily
> up again.  I did a restart on mail1 and the services transferred over to
> mail2 and worked flawlessly.  Then when I restarted the preferred master,
> mail1, I get this, where it starts to transfer back to mail1 and fails.
> 
> I can provide configs or logs if needed
> 
> 
> 
> 
> ============
> Last updated: Thu Jun 25 08:36:21 2009
> Stack: openais
> Current DC: mail2    - partition with quorum
> Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
> 
> Online: [ mail1 mail2 ]
> 
> Master/Slave Set: ms-drbd0
>         Masters: [ mail2 ]
>         Stopped: [ drbd0:0 ]
> Resource Group: mail-group
>     fs0 (ocf::heartbeat:Filesystem):    Started mail2
>     virtual-ip  (ocf::heartbeat:IPaddr2):    Started mail2
>     postfix     (lsb:postfix):  Started mail2 (unmanaged) FAILED
>     spamassassin        (lsb:spamassassin):     Stopped
>     dovecot     (lsb:dovecot):  Stopped
>     clamd    (lsb:clamd):    Stopped
>     mailservices        (lsb:mailservices):     Stopped
> Clone Set: stonith-clone
>         Started: [ mail1 mail2 ]
> 
> Failed actions:
>     postfix_stop_0 (node=mail2, call=40, rc=1, status=complete): unknown error
> 
> 
> On 6/25/09 3:30 AM, "[email protected]"
> <[email protected]> wrote:
> 
> > Oops sorry that's meant to be no-quorum-policy="ignore"
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:linux-ha-
> >> [email protected]] On Behalf Of [email protected]
> >> Sent: 25 June 2009 09:22
> >> To: [email protected]
> >> Subject: Re: [Linux-HA] Failover problem
> >>
> >> Just set up SSH STONITH until you can get something more concrete in.
> >> You really have to use STONITH no matter what. Create an SSH RSA/DSA
> >> key without a password so you can SSH as root from one server to the
> >> other without it asking for a password, then just:
> >>
> >> crm configure
> >>> primitive ssh-stonith stonith:ssh params hostlist="host1 host2" op monitor interval=1h
> >>> clone stonith-clone ssh-stonith
> >>> commit
> >>
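
The passwordless key step described above might look roughly like this. The helper name make_stonith_key and the key path in the usage note are assumptions, and host1/host2 are the placeholders from the hostlist above. Keep in mind that the ssh STONITH plugin is generally meant for testing rather than production fencing.

```shell
# make_stonith_key KEYFILE   (hypothetical helper for illustration)
# Creates an RSA key pair with an empty passphrase at KEYFILE and KEYFILE.pub.
# Refuses to overwrite an existing key.
make_stonith_key() {
    keyfile=$1
    if [ -e "$keyfile" ]; then
        echo "refusing to overwrite existing key: $keyfile" >&2
        return 1
    fi
    # -q quiet, -t key type, -N "" empty passphrase, -f output file
    ssh-keygen -q -t rsa -N "" -f "$keyfile"
}
```

Run as root on each node, for example "make_stonith_key /root/.ssh/id_rsa", then copy the public key across with "ssh-copy-id -i /root/.ssh/id_rsa.pub root@host2" (and the reverse from host2) so root can log in both ways without a prompt.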
> >> Good doc:
> >> http://www.clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf
> >>
> >> To set the quorum policy to ignore is simply:
> >>
> >> crm configure property no-quorum-policy=ignore
> >>
> >> For a 2-node cluster I generally set the following as default:
> >>
> >> no-quorum-policy="stop" \
> >>         start-failure-is-fatal="false" \
> >>         stonith-action="reboot" \
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:linux-ha-
> >>> [email protected]] On Behalf Of David Hoskinson
> >>> Sent: 24 June 2009 21:45
> >>> To: General Linux-HA mailing list
> >>> Subject: Re: [Linux-HA] Failover problem
> >>>
> >>> I'm sorry, this is maybe where my knowledge is lacking.  I don't have
> >>> the hardware for a third node, but I understand your reasoning....
> >>>
> >>> Don't understand how to add stonith and haven't found a good document
> >>> for that... I also get "No STONITH resources have been defined" when I
> >>> do a crm_verify -LV
> >>>
> >>> Don't know how to set quorum policy to ignore.
> >>>
> >>> Which of the last 2 would you suggest, and where to look for info on
> >>> how to do it.
> >>>
> >>> thanks
> >>>
> >>>
> >>> On 6/24/09 3:26 PM, "Lars Ellenberg" <[email protected]> wrote:
> >>>
> >>>> On Wed, Jun 24, 2009 at 02:05:46PM -0500, David Hoskinson wrote:
> >>>>> System running 2.99 heartbeat and pacemaker 1.0.4.  Running fine in
> >>>>> master slave mode.  However if I shut down the slave server, all the
> >>>>> services stop on the master until the slave comes back up, does the
> >>>>> election and once again starts the services on the master.  This
> >>>>> doesn't seem to be the way it should be.  Same thing if I shut the
> >>>>> master down.  Services go off line until master is back up.
> >>>>
> >>>> Two node cluster, one vote down,
> >>>> 50% is NOT majority -> single node has no quorum.
> >>>> Quorum policy probably says: no quorum -> stop.
> >>>> You need to
> >>>>  - add more nodes (just to have a real quorum), and/or
> >>>>  - add stonith, and/or
> >>>>  - set quorum policy to ignore.
> >>>
> >>>
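
The quorum arithmetic above can be written out as a small check. This only illustrates the strict-majority rule; it is not how the cluster stack computes quorum internally, and has_quorum is a made-up name.

```shell
# has_quorum TOTAL_VOTES ONLINE_VOTES   (hypothetical helper for illustration)
# Succeeds only when the online votes are a strict majority of the total,
# i.e. online > total / 2, written as online * 2 > total to stay in integers.
has_quorum() {
    total=$1
    online=$2
    [ $((online * 2)) -gt "$total" ]
}

# Two-node cluster with one node down: 1 of 2 votes is exactly 50%,
# which is not a majority, so this prints "no quorum".
if has_quorum 2 1; then echo "quorum"; else echo "no quorum"; fi
```

With three nodes, "has_quorum 3 2" succeeds, which is why adding a node gives a real quorum, while on two nodes the usual route is STONITH plus no-quorum-policy=ignore.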
> >>> _______________________________________________
> >>> Linux-HA mailing list
> >>> [email protected]
> >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>> See also: http://linux-ha.org/ReportingProblems
> 
> 
