Hello, and thanks again for the comments. In this version of the patch I have tried to fix all of the problems mentioned.
However, a few points remain:
1. The whitespace after HOSTNAME has to stay, because this is how the
unames are exported to the RA.
2. I have added the ocf_is_ms function to .ocf-shellfuncs and use it
in this version of the patch.
3. I use IP addresses for MySQL replication so that name resolution is skipped. We
have had a lot of problems with admins on our clusters, and this avoids a bunch of
problems caused by wrong entries in resolv.conf and/or nsswitch.conf.
4. I haven't thought about SSL/TLS connections yet. I have to set one up first.
I think it will be best to get the cert information from SHOW VARIABLES. I
will add this functionality later; it seems easy to implement.
5. Keep in mind that when one MySQL server is promoted, all the others should
execute demote even if they are already slaves, because every slave has to be
reconfigured or reconnected to the new master. I found that this is not
how heartbeat behaves, so I call mysql_demote from mysql_notify.
This is why the demote function has the extra checks.
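
To illustrate points 1 and 3, here is a rough sketch of two small helpers. The names node_in_list and hosts_lookup_ip are made up for this example and are not part of the patch or of .ocf-shellfuncs. The first compares a node name against a space-separated uname list without depending on the trailing whitespace; the second looks up an IP address in a hosts-format file by exact name match, assuming the usual "address name alias..." layout.

```shell
#!/bin/sh
# Illustration only: node_in_list and hosts_lookup_ip are hypothetical
# helper names, not functions from the patch or from .ocf-shellfuncs.

# Return 0 if node name $1 appears as a whole word in the space-separated
# list $2, regardless of leading or trailing whitespace in the list.
node_in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the address of the first entry in hosts-format file $2 whose name
# or alias equals $1 exactly; return non-zero if there is no such entry.
hosts_lookup_ip() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # a trailing comment starts here
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { exit !found }
    ' "$2"
}
```

With an exact field comparison, a lookup for node1 does not pick up an entry for node11, which a plain regular-expression match on the line would.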
Marian
On Wednesday 24 February 2010 19:47:31 Lars Ellenberg wrote:
> On Wed, Feb 24, 2010 at 04:16:57PM +0100, Florian Haas wrote:
> > > +is_slave() {
> > > + slave_info=($(mysql \
>
> should have been a "local"...
>
> > > + --user=$OCF_RESKEY_replication_user \
> > > + --password=$OCF_RESKEY_replication_passwd \
> > > + --socket=$OCF_RESKEY_socket -O connect_timeout=1 \
> > > + -e 'SHOW SLAVE STATUS\G'|awk '/Running/ || /Master_[UHP]/{print $2}'))
> > > +
> > > + if [ "$?" != 0 ]; then
>
> And, btw, this is plain wrong.
> $? is the exit code of awk, in this case.
> ( exit 17 ) | awk '/foo/ { print $2 }' ; echo $?
> see?
>
> > > + ocf_log err "Unable to get local slave status"
> > > + return 1
> > > + fi
> > > +
> > > + if [ -z "${slave_info[*]}" ]; then
> > > + # no slave configuration, can not be slave
> > > + return 1;
> > > + fi
> > > +
> > > + if [ -z "${slave_info[3]}" ] || [ -z "${slave_info[4]}" ] || [ -z "${slave_info[0]}" ] || [ -z "${slave_info[0]}" ]; then
> > > + ocf_log err "Unable to get slave status"
> > > + return 1
> > > + fi
> >
> > As Dejan has already pointed out, arrays may not be available in
> > non-bash shells, and it's a potential regression to rely on bash
> > features in an RA that was previously Bourne shell clean. Can you come
> > up with a different way of handling this?
>
> how about
> local slave_info
> slave_info=$(mysql blafoo)
> # No, not local slave_info=$(), because then $? is the exit code
> # of declaring that variable local, which is 0 (unless your
> # shell crashes half way through...)
> if [ $? != 0 ] ...
> # in case you need to handle mysql exit code
>
> set -- $(echo "$slave_info" | awk '...' )
> # columns now in $1, $2, ...
>
> if [ -z "$3" ] ...
>
>
> did not look at the other things yet.
>
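
A minimal sketch of how the suggestion above might look once filled in. The function fake_slave_status is a hypothetical stand-in for the real mysql call so that the example runs without a database, and parse_slave_status is likewise just an illustrative name. It relies on "local", which is not strictly POSIX but is available in bash and dash.

```shell
#!/bin/sh
# Hypothetical stand-in for "mysql ... -e 'SHOW SLAVE STATUS\G'", reduced
# to the fields the awk filter selects.
fake_slave_status() {
    cat <<'EOF'
*************************** 1. row ***************************
                  Master_Host: 10.0.0.1
                  Master_User: repl
                  Master_Port: 3306
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
EOF
}

parse_slave_status() {
    # Declare first, assign second: with "local slave_info=$(cmd)" the
    # status tested afterwards would be that of "local", not of cmd.
    local slave_info
    slave_info=$(fake_slave_status)
    if [ $? -ne 0 ]; then
        echo "Unable to get local slave status" >&2
        return 1
    fi

    # Split the selected values into positional parameters; no arrays needed.
    set -- $(echo "$slave_info" | awk '/Running/ || /Master_[UHP]/ {print $2}')
    if [ $# -lt 5 ]; then
        return 1
    fi
    echo "host=$1 user=$2 port=$3 io=$4 sql=$5"
}
```

Using set -- in the function itself also avoids a "cmd | while read" loop, where variable assignments are usually lost because most shells run the loop in a subshell.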
--
Best regards,
Marian Marinov
--- agents-default/heartbeat/mysql 2010-02-25 03:19:52.000000000 +0200
+++ linux-ha/mysql-replica 2010-02-25 04:37:41.000000000 +0200
@@ -11,6 +10,7 @@
 # Author: Andrew Beekhof : Cleanup and import
 # Author: Sebastian Reitenbach : add OpenBSD defaults, more cleanup
 # Author: Narayan Newton : Add Gentoo/Debian defaults
+# Author: Marian Marinov : Add replication capabilities
 #
 # Support: [email protected]
 # License: GNU General Public License (GPL)
@@ -35,16 +35,23 @@
 # OCF_RESKEY_log
 # OCF_RESKEY_pid
 # OCF_RESKEY_socket
+# OCF_RESKEY_replication_user
+# OCF_RESKEY_replication_passwd
+# OCF_RESKEY_replication_port
+#
 #######################################################################
 # Initialization:
 : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
 . ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
+VERSION=0.18
 #######################################################################
 # Fill in some defaults if no values are specified
 HOSTOS=`uname`
+HOSTNAME=`uname -n`
+
 if [ "X${HOSTOS}" = "XOpenBSD" ];then
 OCF_RESKEY_binary_default="/usr/local/bin/mysqld_safe"
 OCF_RESKEY_config_default="/etc/my.cnf"
@@ -54,10 +61,6 @@ OCF_RESKEY_group_default="_mysql"
 OCF_RESKEY_log_default="/var/log/mysqld.log"
 OCF_RESKEY_pid_default="/var/mysql/mysqld.pid"
 OCF_RESKEY_socket_default="/var/run/mysql/mysql.sock"
-OCF_RESKEY_test_user_default="root"
-OCF_RESKEY_test_table_default="mysql.user"
-OCF_RESKEY_test_passwd_default=""
-OCF_RESKEY_enable_creation_default=0
 OCF_RESKEY_additional_parameters_default=""
 else
 OCF_RESKEY_binary_default="/usr/bin/safe_mysqld"
@@ -68,12 +71,15 @@ OCF_RESKEY_group_default="mysql"
 OCF_RESKEY_log_default="/var/log/mysqld.log"
 OCF_RESKEY_pid_default="/var/run/mysql/mysqld.pid"
 OCF_RESKEY_socket_default="/var/lib/mysql/mysql.sock"
+OCF_RESKEY_additional_parameters_default=""
+fi
+OCF_RESKEY_enable_creation_default=0
 OCF_RESKEY_test_user_default="root"
 OCF_RESKEY_test_table_default="mysql.user"
 OCF_RESKEY_test_passwd_default=""
-OCF_RESKEY_enable_creation_default=0
-OCF_RESKEY_additional_parameters_default=""
-fi
+OCF_RESKEY_replication_user_default=""
+OCF_RESKEY_replication_passwd_default=""
+OCF_RESKEY_replication_port_default=""
 : ${OCF_RESKEY_binary=${OCF_RESKEY_binary_default}}
 MYSQL_BINDIR=`dirname ${OCF_RESKEY_binary}`
@@ -95,9 +100,13 @@ MYSQL_BINDIR=`dirname ${OCF_RESKEY_binar
 : ${OCF_RESKEY_enable_creation=${OCF_RESKEY_enable_creation_default}}
 : ${OCF_RESKEY_additional_parameters=${OCF_RESKEY_additional_parameters_default}}
+: ${OCF_RESKEY_replication_user=${OCF_RESKEY_replication_user_default}}
+: ${OCF_RESKEY_replication_passwd=${OCF_RESKEY_replication_passwd_default}}
+: ${OCF_RESKEY_replication_port=${OCF_RESKEY_replication_port_default}}
+
 usage() {
 cat <<UEND
-    usage: $0 (start|stop|validate-all|meta-data|monitor)
+    usage: $0 (start|stop|validate-all|meta-data|monitor|notify|promote|demote)
     $0 manages a MySQL Database as an HA resource.
@@ -105,6 +114,9 @@ usage() {
     The 'stop' operation stops the database.
     The 'status' operation reports whether the database is running
     The 'monitor' operation reports whether the database seems to be working
+    The 'promote' operation makes this mysql server run as master
+    The 'demote' operation makes this mysql server run as slave
+    The 'notify' operation is used for post execution
     The 'validate-all' operation reports whether the parameters are valid
 UEND
@@ -115,7 +127,7 @@ meta_data() {
 <?xml version="1.0"?>
 <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
 <resource-agent name="mysql">
-<version>1.0</version>
+<version>1.1</version>
 <longdesc lang="en">
 Resource script for MySQL.
@@ -230,6 +242,30 @@ Additional parameters which are passed t
 <content type="string" default="${OCF_RESKEY_additional_parameters_default}"/>
 </parameter>
+<parameter name="replication_user" unique="0" required="0">
+<longdesc lang="en">
+MySQL replication user. Used for replication client and slave.
+</longdesc>
+<shortdesc lang="en">MySQL replication user</shortdesc>
+<content type="string" default="${OCF_RESKEY_replication_user_default}" />
+</parameter>
+
+<parameter name="replication_passwd" unique="0" required="0">
+<longdesc lang="en">
+MySQL replication password. Used for replication client and slave.
+</longdesc>
+<shortdesc lang="en">MySQL replication user password</shortdesc>
+<content type="string" default="${OCF_RESKEY_replication_passwd_default}" />
+</parameter>
+
+<parameter name="replication_port" unique="0" required="0">
+<longdesc lang="en">
+MySQL replication port. Used for replication client and slave.
+</longdesc>
+<shortdesc lang="en">MySQL replication user port</shortdesc>
+<content type="string" default="${OCF_RESKEY_replication_port_default}" />
+</parameter>
+
 </parameters>
 <actions>
@@ -237,6 +273,11 @@ Additional parameters which are passed t
 <action name="stop" timeout="120" />
 <action name="status" timeout="60" />
 <action name="monitor" depth="0" timeout="30" interval="10" />
+<action name="monitor" depth="0" timeout="20" interval="20" role="Slave" />
+<action name="monitor" depth="0" timeout="20" interval="10" role="Master" />
+<action name="notify" timeout="90" />
+<action name="promote" timeout="120" />
+<action name="demote" timeout="120" />
 <action name="validate-all" timeout="5" />
 <action name="meta-data" timeout="5" />
 </actions>
@@ -272,9 +313,58 @@ mysql_validate() {
         ocf_log err "Group $OCF_RESKEY_group doesn't exist";
         return $OCF_ERR_INSTALLED;
     fi
+
+    if ocf_is_ms; then
+        if [ -z "$OCF_RESKEY_replication_user" ] || [ -z "$OCF_RESKEY_replication_passwd" ]; then
+            ocf_log err "Missing replication username or password"
+            exit $OCF_ERR_CONFIGURED
+        fi
+    fi
     true
 }
+is_slave() {
+    master_host=''
+    master_user=''
+    master_port=''
+    slave_sql=''
+    slave_io=''
+    mysql \
+        --socket=$OCF_RESKEY_socket -O connect_timeout=1 \
+        -e 'SHOW SLAVE STATUS\G'|awk '/Running/ || /Master_[UHP]/{print $2}'|while read k v; do
+        if [ "$a" == 'Master_Host:' ]; then master_host="$v"; fi
+        if [ "$a" == 'Master_User:' ]; then master_user="$v"; fi
+        if [ "$a" == 'Master_Port:' ]; then master_port="$v"; fi
+        if [ "$a" == 'Slave_IO_Running:' ]; then slave_io="$v"; fi
+        if [ "$a" == 'Slave_SQL_Running:' ]; then slave_sql="$v"; fi
+    done
+
+    if [ -z "$master_host" ] || [ -z "$master_user" ] || [ -z "$master_port" ] || [ -z "$slave_io" ] || [ -z "$slave_sql" ]; then
+        ocf_log err "Missing slave status value"
+        return 1
+    fi
+
+    if [ $# == 1 ]; then
+        if [ "$slave_io" == 'Yes' ] &&
+            [ "$slave_sql" == 'Yes' ] &&
+            [ "$master_user" == "$OCF_RESKEY_replication_user" ] &&
+            ( grep "$master_host" /etc/hosts > /dev/null ); then
+            # machine is slave
+            return 0;
+        fi
+    else
+        if [ "$slave_io" == 'Yes' ] || [ "$slave_sql" == 'Yes' ]; then
+            if [ "$master_user" == "$OCF_RESKEY_replication_user" ] && ( grep "$master_host" /etc/hosts > /dev/null ); then
+                # machine is slave
+                return 0;
+            fi
+        fi
+    fi
+
+    # machine is not slave
+    return 1;
+}
+
 mysql_status() {
     if [ ! -e $OCF_RESKEY_pid ]; then
         ocf_log debug "MySQL is not running"
@@ -301,14 +391,24 @@ mysql_monitor() {
     mysql_status
     rc=$?
-    if [ $OCF_CHECK_LEVEL = 0 -o $rc != 0 ]; then
+    if ocf_is_probe && ! is_slave; then
+        # if the check came from probe
+        return $OCF_RUNNING_MASTER;
+    fi
+
+    if [ $OCF_CHECK_LEVEL == 0 ]; then
         return $rc
     fi
+    if ocf_is_ms && [ "$OCF_RESKEY_CRM_meta_role" == "Slave" ] && is_slave 1; then
+        return $OCF_SUCCESS
+    fi
+
     # Do a detailed status check
     buf=`echo "SELECT * FROM $OCF_RESKEY_test_table" | mysql --user=$OCF_RESKEY_test_user --password=$OCF_RESKEY_test_passwd --socket=$OCF_RESKEY_socket -O connect_timeout=1 2>&1`
     rc=$?
-    if [ ! $rc -eq 0 ]; then
+
+    if [ $rc -ne 0 ]; then
         ocf_log err "MySQL $test_table monitor failed:";
         if [ ! -z "$buf" ]; then ocf_log err $buf; fi
         return $OCF_ERR_GENERIC;
@@ -435,33 +533,124 @@ mysql_stop() {
     return $OCF_SUCCESS
 }
+mysql_promote() {
+    if ( ! mysql_status ); then
+        return $OCF_NOT_RUNNING
+    fi
+    if [ "$OCF_RESKEY_CRM_meta_notify_promote_uname" != "$HOSTNAME " ]; then
+        ocf_log err "Trying to promote machine the wrong machine($HOSTNAME)"
+        return $OCF_ERR_GENERIC
+    fi
+    if is_slave; then
+        mysql --socket=$OCF_RESKEY_socket -O connect_timeout=1 -e 'STOP SLAVE'
+    fi
+    return $OCF_SUCCESS
+}
+
+mysql_demote() {
+    search_uname=''
+    if [[ "$OCF_RESKEY_CRM_meta_notify_master_uname" =~ "^\s*$" ]] &&
+        [[ "$OCF_RESKEY_CRM_meta_notify_promote_uname" =~ "^\s*$" ]]; then
+        return $OCF_ERR_GENERIC
+    fi
+    if [ "$OCF_RESKEY_CRM_meta_notify_promote_uname" == "$HOSTNAME " ]; then
+        return $OCF_ERR_GENERIC
+    else
+        search_uname=$OCF_RESKEY_CRM_meta_notify_promote_uname
+    fi
+    if [ "$OCF_RESKEY_CRM_meta_notify_master_uname" == "$HOSTNAME " ]; then
+        return $OCF_ERR_GENERIC
+    else
+        search_uname=$OCF_RESKEY_CRM_meta_notify_master_uname
+    fi
+
+    if ( ! mysql_status ); then
+        return $OCF_NOT_RUNNING
+    fi
+    if [ $# != 1 ] && [ "$OCF_RESKEY_CRM_meta_notify_demote_uname" != "$HOSTNAME " ]; then
+        return $OCF_ERR_GENERIC
+    fi
+    if is_slave 1; then
+        return $OCF_SUCCESS
+    else
+        if [ -z "$search_uname" ]; then return $OCF_ERR_GENERIC; fi
+        master_host=$(awk "/$OCF_RESKEY_CRM_meta_notify_master_uname/{print $1;exit}" /etc/hosts)
+        if [ -z "$master_host" ]; then
+            ocf_log err "Unable to get IP address of host $OCF_RESKEY_CRM_meta_notify_master_uname"
+            return $OCF_ERR_GENERIC;
+        fi
+        master_file=''
+        master_pos=''
+        mysql --password=$OCF_RESKEY_replication_passwd \
+            --user=$OCF_RESKEY_replication_user \
+            -h $master_host \
+            -O connect_timeout=1 \
+            -e 'SHOW MASTER STATUS\G'|while read k v; do
+            if [ "$k" == 'File:' ]; then master_file="$v"; fi
+            if [ "$k" == 'Position:' ]; then master_pos="$v"; fi
+        done
+
+        if [ -z "$master_file" ] || [ -z "$master_pos" ]; then
+            ocf_log err "Empty master file or master position"
+            return $OCF_ERR_GENERIC;
+        fi
+        mysql --socket=$OCF_RESKEY_socket -O connect_timeout=1 -e 'STOP SLAVE';
+        mysql --socket=$OCF_RESKEY_socket -O connect_timeout=1 \
+            -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
+            MASTER_USER='$OCF_RESKEY_replication_user', \
+            MASTER_PASSWORD='$OCF_RESKEY_replication_passwd', \
+            MASTER_PORT='$OCF_RESKEY_replication_port', \
+            MASTER_LOG_FILE='$master_file', \
+            MASTER_LOG_POS=$master_pos}, \
+            MASTER_CONNECT_RETRY=4"
+        mysql --socket=$OCF_RESKEY_socket -O connect_timeout=1 -e 'START SLAVE';
+    fi
+    if is_slave 1; then
+        return $OCF_SUCCESS
+    else
+        return $OCF_ERR_GENERIC
+    fi
+}
+
+mysql_notify() {
+    if [ "$OCF_RESKEY_CRM_meta_notify_type" != 'post' ]; then
+        return $OCF_SUCCESS
+    fi
+    case "$OCF_RESKEY_CRM_meta_notify_operation" in
+        'promote')
+            if [ "$OCF_RESKEY_CRM_meta_notify_promote_uname" != "$HOSTNAME " ] &&
+                ! is_slave 1; then
+                mysql_demote 1
+            fi
+            ;;
+        'demote')
+            if [ "$OCF_RESKEY_CRM_meta_notify_promote_uname" == "$HOSTNAME " ] &&
+                ! is_slave 1; then
+                mysql_demote 1
+            fi
+            ;;
+        *) return $OCF_SUCCESS ;;
+    esac
+
+}
+
 case "$1" in
     meta-data) meta_data
         exit $OCF_SUCCESS;;
     usage|help) usage
         exit $OCF_SUCCESS;;
+    *) mysql_validate;;
 esac
-mysql_validate
-rc=$?
-LSB_STATUS_STOPPED=3
-if [ $rc -ne 0 ]; then
-    case "$1" in
-        stop) exit $OCF_SUCCESS;;
-        monitor) exit $OCF_NOT_RUNNING;;
-        status) exit $LSB_STATUS_STOPPED;;
-        *) exit $rc;;
-    esac
-fi
-
 # What kind of method was invoked?
 case "$1" in
     start) mysql_start;;
     stop) mysql_stop;;
     status) mysql_status;;
+    promote) mysql_promote;;
+    demote) mysql_demote;;
     monitor) mysql_monitor;;
-    validate-all) exit $OCF_SUCCESS;;
-
-    *) usage
-        exit $OCF_ERR_UNIMPLEMENTED;;
+    notify) mysql_notify;;
+    validate-all) mysql_validate;;
+    *) usage; exit $OCF_ERR_UNIMPLEMENTED;;
 esac
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
