Hi,

I've installed from RPMs as it was faster (building from source I had to
install a lot of devel packages and got stuck at libcpg).

That issue is solved: the master is now monitored after any number of
failures. However, there is a new issue I'm facing (if I can't get it fixed
I'll probably start a new thread, if one doesn't already exist): after a
couple of failures and restarts, at the next failure MySQL is not started
anymore. The logs show the message "MySQL is not running", but the
start/restart doesn't happen (I made sure that the failcount is 0, as I
reset it from time to time).
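
For reference, this is roughly how I check and reset the failcount (the node
name "node1" is just a placeholder for the actual cluster node, and the crm
shell subcommands may differ slightly between crm shell versions):

  # show current failcounts and any failed operations
  crm_mon -1 --failcounts

  # show, then clear, the failcount for mysqld on one node
  crm resource failcount mysqld show node1
  crm resource failcount mysqld delete node1

  # clear the resource's failed-operation history so it gets re-probed
  crm resource cleanup mysqld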

Thanks again,
Radu Rad.


David Vossel wrote:
> 
> 
> 
> ----- Original Message -----
>> From: "radurad" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, January 30, 2013 5:10:00 AM
>> Subject: Re: [Linux-HA] Master/Slave - Master node not monitored after a
>> failure
>> 
>> 
>> Hi,
>> 
>> Thank you for clarifying this.
>> On CentOS 6 the latest Pacemaker build is 1.1.7 (which I'm using now).
>> Do you see a problem with installing from source so that I'll have
>> Pacemaker 1.1.8?
> 
> The only thing I can think of is that you might have to get a new version
> of libqb in order to use 1.1.8.  We already have a RHEL 6 based package
> you can use if you want.
> 
> http://clusterlabs.org/rpm-next/
> 
> -- Vossel
> 
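
For what it's worth, one way those packages could be pulled in on CentOS 6 is
with a small yum repo definition pointing at that URL. The repo id, file name
and baseurl subdirectory below are only guesses and should be checked against
the actual layout under http://clusterlabs.org/rpm-next/:

  # /etc/yum.repos.d/clusterlabs-next.repo  (sketch; verify the baseurl)
  [clusterlabs-next]
  name=ClusterLabs rpm-next
  baseurl=http://clusterlabs.org/rpm-next/rhel-6/
  enabled=1
  gpgcheck=0

  # then install/upgrade pacemaker together with libqb
  yum install pacemaker libqb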
>> Best Regards,
>> Radu Rad.
>> 
>> 
>> 
>> David Vossel wrote:
>> > 
>> > 
>> > 
>> > ----- Original Message -----
>> >> From: "radurad" <[email protected]>
>> >> To: [email protected]
>> >> Sent: Thursday, January 24, 2013 6:07:38 AM
>> >> Subject: [Linux-HA] Master/Slave - Master node not monitored after
>> >> a
>> >> failure
>> >> 
>> >> 
>> >> Hi,
>> >> 
>> >> Using the following installation under CentOS:
>> >> 
>> >> corosync-1.4.1-7.el6_3.1.x86_64
>> >> resource-agents-3.9.2-12.el6.x86_64
>> >> 
>> >> and having the following configuration for a Master/Slave mysql
>> >> 
>> >> primitive mysqld ocf:heartbeat:mysql \
>> >>         params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
>> >>                socket="/var/lib/mysql/mysql.sock" datadir="/var/lib/mysql" \
>> >>                user="mysql" replication_user="root" replication_passwd="testtest" \
>> >>         op monitor interval="5s" role="Slave" timeout="31s" \
>> >>         op monitor interval="6s" role="Master" timeout="30s"
>> >> ms ms_mysql mysqld \
>> >>         meta master-max="1" master-node-max="1" clone-max="2" \
>> >>              clone-node-max="1" notify="true"
>> >> property $id="cib-bootstrap-options" \
>> >>         dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>> >>         cluster-infrastructure="openais" \
>> >>         expected-quorum-votes="2" \
>> >>         no-quorum-policy="ignore" \
>> >>         stonith-enabled="false" \
>> >>         last-lrm-refresh="1359026356" \
>> >>         start-failure-is-fatal="false" \
>> >>         cluster-recheck-interval="60s"
>> >> rsc_defaults $id="rsc-options" \
>> >>         failure-timeout="50s"
>> >> 
>> >> With only one node online (the Master; the problem also occurs with a
>> >> slave online, but for simplicity I've left only the Master online),
>> >> I run into the problem below:
>> >> - Stopping the mysql process once results in corosync restarting
>> >> mysql and promoting it to Master again.
>> >> - Stopping the mysql process a second time results in nothing: the
>> >> failure is not detected, corosync takes no action and still sees the
>> >> node as Master and mysql as running.
>> >> - The monitor operation is no longer run after the first failure, as
>> >> there are no entries in the log of the type "INFO: MySQL monitor
>> >> succeeded (master)" (see the check sketched after this list).
>> >> - Changing something in the configuration results in corosync
>> >> immediately detecting that mysql is not running and promoting it
>> >> again. The monitor operation then runs until the next failure, at
>> >> which point the same problem occurs.
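
A rough way to confirm whether the recurring monitor is still being scheduled
(assuming cluster logging goes to /var/log/messages, and using a grep pattern
that matches the agent's log line quoted in the previous point):

  # list the cluster's operation history, including recurring monitors
  crm_mon -1 --operations

  # look for recent monitor results from the mysql resource agent
  grep "MySQL monitor" /var/log/messages | tail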
>> >> 
>> >> If you need more information let me know. I could also attach the
>> >> relevant log from the messages file.
>> > 
>> > Hey,
>> > 
>> > This is a known bug and has been resolved in pacemaker 1.1.8.
>> > 
>> > Here's the related issue. The commits are listed in the comments.
>> > http://bugs.clusterlabs.org/show_bug.cgi?id=5072
>> > 
>> > 
>> > -- Vossel
>> > 
>> >> Thanks for now,
>> >> Radu.
>> >> 


