On Tuesday 01 February 2011 17:27:31 [email protected] wrote:
> Hi there,
>
> I have set up a pair of HA LAMP servers using heartbeat, pacemaker and drbd
> on Ubuntu 10.04 LTS. Everything worked fine until I upgraded mysql-server
> from 5.1.41-3ubuntu12.6 to 5.1.41-3ubuntu12.9. Now node 1 (arsvr1) still
> works fine, but mysql on node 2 (arsvr2) won't start when I switch arsvr1
> to standby. The error message shown by "crm status" is
>
> Failed actions:
>     mysql_start_0 (node=arsvr2, call=32, rc=4, status=complete): insufficient privileges
>
> No errors logged in /var/log/mysql/error.log at all.
>
> The drbd mysql partition is mounted properly. If I go to
> /usr/lib/ocf/resource.d/heartbeat and set the OCF_RESKEY parameters, I
> have no problem starting the mysql server with "./mysql start". But the
> resource mysql won't show up in crm status.
>
> So it looks like pacemaker somehow fails to start the resource mysql even
> before running the resource script.
>
> Comparing the logs from the two nodes, the real difference comes after the
> line
>
> info: process_lrm_event: LRM operation fs_mysql_start_0
>
> On node arsvr1, after that line we get a confirmation of action
> fs_mysql_start_0:
>
> info: match_graph_event: Action fs_mysql_start_0 (8) confirmed on arsvr1
>
> and then it goes on to "Initiating action 9: start mysql_start_0 on arsvr1
> (local)".
>
> However, on node arsvr2 we never see the confirmation of action
> fs_mysql_start_0, so mysql_start_0 is never called. The strange thing is
> that I can see the drbd partition of fs_mysql is properly mounted on
> arsvr2. Does anyone know what might stop arsvr2 from reaching the
> "Action fs_mysql_start_0 (8) confirmed" step?
>
> Here are the cluster logs from the two nodes.
>
> Logs on Node 2:
>
> Jan 28 14:24:23 arsvr2 lrmd: [919]: info: rsc:fs_mysql:229: start
> Jan 28 14:24:23 arsvr2 Filesystem[1568]: [1596]: INFO: Running start for /dev/drbd/by-res/r0 on /var/lib/mysql
> Jan 28 14:24:23 arsvr2 lrmd: [919]: info: RA output: (fs_mysql:start:stderr) FATAL: Module scsi_hostadapter not found.
> Jan 28 14:24:23 arsvr2 Filesystem[1568]: [1606]: INFO: Starting filesystem check on /dev/drbd/by-res/r0
> Jan 28 14:24:23 arsvr2 lrmd: [919]: info: RA output: (fs_mysql:start:stdout) fsck from util-linux-ng 2.17.2
> Jan 28 14:24:23 arsvr2 lrmd: [919]: info: RA output:(fs_mysql:start:stdout) /dev/drbd0: clean, 178/3276800 files, 257999/13106791 blocks
> Jan 28 14:24:23 arsvr2 crmd: [922]: info: process_lrm_event: LRM operation fs_mysql_start_0 (call=229, rc=0, cib-update=251,confirmed=true) ok
> Jan 28 14:24:46 arsvr2 cib: [918]: info: cib_stats: Processed 149 operations (0.00us average, 0% utilization) in the last 10min
>
> Logs on Node 1:
>
> Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: rsc:fs_mysql:867: start
> Jan 28 14:28:58 arsvr1 crmd: [1068]: info: te_rsc_command: Initiating action 31: monitor drbd_mysql:1_monitor_15000 on arsvr2
> Jan 28 14:28:58 arsvr1 Filesystem[516]: [544]: INFO: Running start for /dev/drbd/by-res/r0 on /var/lib/mysql
> Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: RA output:(fs_mysql:start:stderr) FATAL: Module scsi_hostadapter not found.
> Jan 28 14:28:58 arsvr1 Filesystem[516]: [554]: INFO: Starting filesystem check on /dev/drbd/by-res/r0
> Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: RA output:(fs_mysql:start:stdout) fsck from util-linux-ng 2.17.2
> Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: RA output:(fs_mysql:start:stdout) /dev/drbd0: clean, 178/3276800 files,257999/13106791 blocks
> Jan 28 14:28:58 arsvr1 crmd: [1068]: info: process_lrm_event: LRM operation fs_mysql_start_0 (call=867, rc=0, cib-update=1650,confirmed=true) ok
> Jan 28 14:28:58 arsvr1 crmd: [1068]: info: match_graph_event: Action fs_mysql_start_0 (8) confirmed on arsvr1 (rc=0)
> Jan 28 14:28:58 arsvr1 crmd: [1068]: info: te_rsc_command: Initiating action 9: start mysql_start_0 on arsvr1 (local)
> Jan 28 14:28:58 arsvr1 crmd: [1068]: info: do_lrm_rsc_op: Performing key=9:551:0:9c402121-906c-42de-a18a-68deb24208cb op=mysql_start_0 )
> Jan 28 14:28:58 arsvr1 lrmd: [1065]: info: rsc:mysql:868: start
> Jan 28 14:28:58 arsvr1 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
> Jan 28 14:28:59 arsvr1 crmd: [1068]: info: match_graph_event: Action drbd_mysql:1_monitor_15000 (31) confirmed on arsvr2 (rc=0)
> Jan 28 14:29:02 arsvr1 mysql[576]: [728]: INFO: MySQL started
> Jan 28 14:29:02 arsvr1 crmd: [1068]: info: process_lrm_event: LRM operation mysql_start_0 (call=868, rc=0, cib-update=1651,confirmed=true) ok
> Jan 28 14:29:02 arsvr1 crmd: [1068]: info: match_graph_event: Action mysql_start_0 (9) confirmed on arsvr1 (rc=0)
>
> And here is the configuration:
>
> node $id="bc6bf61d-6b5f-4307-85f3-bf7bb11531bb" arsvr2 \
>       attributes standby="off"
> node $id="bf0e7394-9684-42b9-893b-5a9a6ecddd7e" arsvr1 \
>       attributes standby="off"
> primitive apache2 lsb:apache2 \
>       op start interval="0" timeout="60" \
>       op stop interval="0" timeout="120" start-delay="15" \
>       meta target-role="Started"
> primitive drbd_mysql ocf:linbit:drbd \
>       params drbd_resource="r0" \
>       op monitor interval="15s"
> primitive drbd_webfs ocf:linbit:drbd \
>       params drbd_resource="r1" \
>       op monitor interval="15s" \
>       op start interval="0" timeout="240" \
>       op stop interval="0" timeout="100"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>       params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" fstype="ext4" \
>       op start interval="0" timeout="60" \
>       op stop interval="0" timeout="120" \
>       meta target-role="Started"
> primitive fs_webfs ocf:heartbeat:Filesystem \
>       params device="/dev/drbd/by-res/r1" directory="/srv" fstype="ext4" \
>       op start interval="0" timeout="60" \
>       op stop interval="0" timeout="120" \
>       meta target-role="Started"
> primitive ip1 ocf:heartbeat:IPaddr2 \
>       params ip="10.10.10.193" nic="eth0" \
>       op monitor interval="5s"
> primitive ip1arp ocf:heartbeat:SendArp \
>       params ip="10.10.10.193" nic="eth0"
> primitive mysql ocf:heartbeat:mysql \
>       params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" user="mysql" group="mysql" log="/var/log/mysql.log" pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" socket="/var/run/mysqld/mysqld.sock" \
>       op monitor interval="30s" timeout="30s" \
>       op start interval="0" timeout="120" \
>       op stop interval="0" timeout="120" \
>       meta target-role="Started"
> group MySQLDB fs_mysql mysql \
>       meta target-role="Started"
> group WebServices ip1 ip1arp fs_webfs apache2 \
>       meta target-role="Started"
> ms ms_drbd_mysql drbd_mysql \
>       meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> ms ms_drbd_webfs drbd_webfs \
>       meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> colocation apache2_with_ip inf: apache2 ip1
> colocation apache2_with_mysql inf: apache2 ms_drbd_mysql:Master
> colocation apache2_with_webfs inf: apache2 ms_drbd_webfs:Master
> colocation fs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
> colocation ip_with_ip_arp inf: ip1 ip1arp
> colocation mysql_on_drbd inf: MySQLDB ms_drbd_mysql:Master
> colocation web_with_mysql inf: MySQLDB WebServices
> colocation webfs_on_drbd inf: fs_webfs ms_drbd_webfs:Master
> colocation webfs_with_fs inf: fs_webfs fs_mysql
> order apache2-after-arp inf: ip1arp:start apache2:start
> order arp-after-ip inf: ip1:start ip1arp:start
> order fs-mysql-after-drbd inf: ms_drbd_mysql:promote fs_mysql:start
> order fs-webfs-after-drbd inf: ms_drbd_webfs:promote fs_webfs:start
> order ip-after-mysql inf: mysql:start ip1:start
> order mysql-after-fs-mysql inf: fs_mysql:start mysql:start
> property $id="cib-bootstrap-options" \
>       dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>       cluster-infrastructure="Heartbeat" \
>       expected-quorum-votes="1" \
>       stonith-enabled="false" \
>       no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>       resource-stickiness="100"
>
> Any help will be greatly appreciated.
>
> Also, we are considering subscribing to professional support for the
> cluster servers we put together. I know LINBIT, IBM, Red Hat and Novell all
> provide some support for Linux-HA clusters. Does anyone know which one has
> the best technical coverage?
>
> Thanks,
>
> Liang Ma

Hi,

first of all, your configuration is, well, unconventional. I'd put all the 
primitives together in one group and make the group colocated and ordered 
with respect to the DRBDs. Perhaps it'd be wise to make two groups.
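
As a rough, untested sketch of the two-group variant, reusing the resource
names from the configuration quoted above: the groups keep the members they
already have, and the four constraint names are made up for illustration.
The ms definitions would stay as they are, and these four constraints are
meant to stand in for the per-primitive colocation and order constraints:

```
# Assumption: group membership taken unchanged from the posted configuration.
group MySQLDB fs_mysql mysql
group WebServices ip1 ip1arp fs_webfs apache2
# Hypothetical constraint names; each group follows its own DRBD master.
colocation col_mysql_on_drbd inf: MySQLDB ms_drbd_mysql:Master
order ord_mysql_after_drbd inf: ms_drbd_mysql:promote MySQLDB:start
colocation col_web_on_drbd inf: WebServices ms_drbd_webfs:Master
order ord_web_after_drbd inf: ms_drbd_webfs:promote WebServices:start
```

A group already implies colocation and start order among its members, so the
only constraints left are the ones that tie each group to its DRBD.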

Googling through the archives of the list, I'd bet this error is caused by the 
crm trying to mount a secondary DRBD. This might happen when some constraints 
somehow end up forming a loop.

Could you please start with a very simple setup like:

primitive resDRBD ocf:linbit:drbd params drbd_resource="r0"
primitive resFS ocf:heartbeat:Filesystem \
  params device="/dev/drbd0" directory="/mnt" fstype="ext4"
ms msDRBD resDRBD meta notify="true"
colocation col_FS_DRBD inf: resFS:Started msDRBD:Master
order ord_DRBD_FS inf: msDRBD:promote resFS:start

If this works, try to add an IP address as a resource and make a group of both 
primitives:

primitive resIP ocf:heartbeat:IPaddr2 \
  params ip="10.10.10.193" nic="eth0" cidr_netmask="24"
group groupMySQL resFS resIP

Is failover still working? What are the constraints now?
Now add the MySQL database to the group:

primitive mysql ocf:heartbeat:mysql \
  params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
    user="mysql" group="mysql" log="/var/log/mysql.log" \
    pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" \
    socket="/var/run/mysqld/mysqld.sock"

Then edit the group groupMySQL to add the MySQL server, and so on.
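
For illustration only, the group might look like this after that edit. The
position of mysql is my assumption: putting it last means it starts after
the filesystem and the IP address, and you may prefer a different order.

```
# Assumption: mysql appended as the last member of the existing group.
group groupMySQL resFS resIP mysql
```

Since members of a group start in the order listed and stop in reverse, the
position you pick decides what is already up when mysqld starts.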


Hope you are successful taking one step after the other.

Greetings,
-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
