Re: [Linux-HA] Failover problem

David Hoskinson Thu, 25 Jun 2009 10:50:07 -0700

Interestingly enough, it seems to work now. Once the primary comes back online,
it switches back to the primary machine OK. However, I am using DRBD for
drive replication and the drives are showing as out of sync. I am wondering
whether the failover is having trouble with the DRBD switchover. Investigating
more.
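To see what DRBD itself reports, the usual place to look is /proc/drbd. The helper below is a minimal sketch; the name drbd_in_sync is hypothetical, and it assumes the DRBD 8.x /proc/drbd format, where each device line carries a ds:<local>/<peer> disk-state field. It succeeds only when every device reports UpToDate/UpToDate on both sides.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if every DRBD device in /proc/drbd-style
# text (read from stdin) shows ds:UpToDate/UpToDate. Assumes the DRBD 8.x
# field layout, e.g. " 0: cs:Connected ... ds:UpToDate/UpToDate C r----".
drbd_in_sync() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ds:/) {
                    seen = 1                         # found a disk-state field
                    if ($i != "ds:UpToDate/UpToDate") bad = 1
                }
            }
        }
        # Fail if no device line was found or any device is not fully in sync.
        END { if (seen && !bad) exit 0; else exit 1 }
    '
}
```

On a cluster node it could be called as `drbd_in_sync < /proc/drbd && echo in sync`. A peer that is still resyncing after failback typically shows Inconsistent on one side, so a failure right after the primary returns may simply mean the resync has not finished yet.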



On 6/25/09 8:57 AM, "[email protected]"
<[email protected]> wrote:

> Do you get anything meaningful in the messages log from the RA?
> 
> grep lrmd /var/log/messages
> 
> Can you run /etc/init.d/postfix status?
> 
> Is there anything in the /var/log/mail.info log (or wherever your
> postfix logs to)?
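Since the failed action in this thread is postfix_stop_0 with rc=1, one thing worth ruling out is an init script that is not LSB-compliant. Pacemaker's lsb: resources rely on the exit codes: per the LSB specification, "stop" on an already-stopped service must return 0, and "status" must return 3 when the service is not running. The helper below is a hypothetical sketch of that check (check_lsb_stop is a made-up name). It really stops the service, so it should only be run where that is safe, for example on a node in standby.

```shell
#!/bin/sh
# Hypothetical helper: check the stop/status exit codes Pacemaker expects
# from an LSB init script. Prints OK and returns 0 if they comply.
check_lsb_stop() {
    script=$1

    "$script" stop >/dev/null 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "FAIL: first stop returned $rc"; return 1
    fi

    # Stopping an already-stopped service must still succeed.
    "$script" stop >/dev/null 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "FAIL: stop on a stopped service returned $rc"; return 1
    fi

    # A stopped service must report 3 ("program is not running").
    "$script" status >/dev/null 2>&1
    rc=$?
    if [ "$rc" -ne 3 ]; then
        echo "FAIL: status on a stopped service returned $rc (expected 3)"; return 1
    fi

    echo "OK"
}
```

For example, `check_lsb_stop /etc/init.d/postfix`, followed by `/etc/init.d/postfix start` afterwards. Once the script behaves, the recorded failure still has to be cleared (for example with `crm resource cleanup postfix`) before Pacemaker manages the resource again.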
> 
> Finally, have you considered cloning postfix so that you can
> distribute the load too?
> 
> 
>> -----Original Message-----
>> From: [email protected] [mailto:linux-ha-
>> [email protected]] On Behalf Of David Hoskinson
>> Sent: 25 June 2009 14:43
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Failover problem
>> 
>> This is so close... Here is the scenario. Mail1 is master/preferred. I
>> did a restart on mail2, and crm_mon showed it offline while mail1 went on
>> happily working correctly. After restarting mail2, they were both happily
>> up again. I did a restart on mail1, and the services transferred over to
>> mail2 and worked flawlessly. Then, when I restarted the preferred master
>> (mail1), I get this, where it starts to transfer back to mail1 and fails.
>> 
>> I can provide configs or logs if needed
>> 
>> 
>> 
>> 
>> ============
>> Last updated: Thu Jun 25 08:36:21 2009
>> Stack: openais
>> Current DC: mail2    - partition with quorum
>> Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
>> 2 Nodes configured, 2 expected votes
>> 3 Resources configured.
>> ============
>> 
>> Online: [ mail1 mail2 ]
>> 
>> Master/Slave Set: ms-drbd0
>>         Masters: [ mail2 ]
>>         Stopped: [ drbd0:0 ]
>> Resource Group: mail-group
>>     fs0 (ocf::heartbeat:Filesystem):    Started mail2
>>     virtual-ip  (ocf::heartbeat:IPaddr2):    Started mail2
>>     postfix     (lsb:postfix):  Started mail2 (unmanaged) FAILED
>>     spamassassin        (lsb:spamassassin):     Stopped
>>     dovecot     (lsb:dovecot):  Stopped
>>     clamd    (lsb:clamd):    Stopped
>>     mailservices        (lsb:mailservices):     Stopped
>> Clone Set: stonith-clone
>>         Started: [ mail1 mail2 ]
>> 
>> Failed actions:
>>     postfix_stop_0 (node=mail2, call=40, rc=1, status=complete): unknown error
>> 
>> 
>> On 6/25/09 3:30 AM, "[email protected]"
>> <[email protected]> wrote:
>> 
>>> Oops, sorry, that's meant to be no-quorum-policy="ignore".
>>> 
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:linux-ha-
>>>> [email protected]] On Behalf Of [email protected]
>>>> Sent: 25 June 2009 09:22
>>>> To: [email protected]
>>>> Subject: Re: [Linux-HA] Failover problem
>>>> 
>>>> Just set up SSH STONITH until you can get something more concrete in.
>>>> You really have to use STONITH no matter what. Create an SSH RSA/DSA key
>>>> without a password so you can SSH as root from one server to the other
>>>> without it asking for a password, then just:
>>>> 
>>>> crm configure
>>>>   primitive ssh-stonith stonith:ssh params hostlist="host1 host2" op monitor interval=1h
>>>>   clone stonith-clone ssh-stonith
>>>>   commit
>>>> 
>>>> Good doc:
>>>> http://www.clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf
>>>> 
>>>> Setting the quorum policy to ignore is simply:
>>>> 
>>>> crm configure property no-quorum-policy=ignore
>>>> 
>>>> For a 2-node cluster I generally set the following as default:
>>>> 
>>>> property no-quorum-policy="stop" \
>>>>         start-failure-is-fatal="false" \
>>>>         stonith-action="reboot"
>>>> 
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:linux-ha-
>>>>> [email protected]] On Behalf Of David Hoskinson
>>>>> Sent: 24 June 2009 21:45
>>>>> To: General Linux-HA mailing list
>>>>> Subject: Re: [Linux-HA] Failover problem
>>>>> 
>>>>> I'm sorry, this is maybe where my knowledge is lacking. I don't have the
>>>>> hardware for a third node, but I understand your reasoning...
>>>>>
>>>>> I don't understand how to add STONITH and haven't found a good document
>>>>> for that. I also get "No STONITH resources have been defined" when I do
>>>>> a crm_verify -LV.
>>>>>
>>>>> I don't know how to set the quorum policy to ignore.
>>>>>
>>>>> Which of the last two would you suggest, and where should I look for
>>>>> info on how to do it?
>>>>> 
>>>>> thanks
>>>>> 
>>>>> 
>>>>> On 6/24/09 3:26 PM, "Lars Ellenberg" <[email protected]> wrote:
>>>>> 
>>>>>> On Wed, Jun 24, 2009 at 02:05:46PM -0500, David Hoskinson wrote:
>>>>>>> System running Heartbeat 2.99 and Pacemaker 1.0.4. Running fine in
>>>>>>> master/slave mode. However, if I shut down the slave server, all the
>>>>>>> services stop on the master until the slave comes back up, does the
>>>>>>> election, and once again starts the services on the master. This
>>>>>>> doesn't seem to be the way it should be. Same thing if I shut the
>>>>>>> master down: services go offline until the master is back up.
>>>>>> 
>>>>>> Two node cluster, one vote down,
>>>>>> 50% is NOT majority -> single node has no quorum.
>>>>>> Quorum policy probably says: no quorum -> stop.
>>>>>> You need to
>>>>>>  - add more nodes (just to have a real quorum), and/or
>>>>>>  - add stonith, and/or
>>>>>>  - set quorum policy to ignore.
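Lars's arithmetic can be made concrete: a partition has quorum only with a strict majority of the expected votes, i.e. 2 * present > expected. The sketch below uses a hypothetical has_quorum helper and ignores any vote weighting a real cluster stack might apply. It shows why the surviving node of a 2-vote cluster loses quorum (and, with no-quorum-policy=stop, stops its resources), while a 3-node cluster survives one failure.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if the votes present form a strict
# majority of the expected votes, otherwise "no".
has_quorum() {
    present=$1
    expected=$2
    if [ $((2 * present)) -gt "$expected" ]; then
        echo yes
    else
        echo no
    fi
}

# Two-node cluster with one node down: 1 of 2 votes is 50%, not a majority.
has_quorum 1 2   # prints "no"
# Three-node cluster with one node down: 2 of 3 votes is still a majority.
has_quorum 2 3   # prints "yes"
```

This is why the options above are either a third voting node, or STONITH plus no-quorum-policy=ignore so that the lone survivor keeps running after the peer is fenced.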
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>> [email protected]
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>> See also: http://linux-ha.org/ReportingProblems
>> 
>> 


