I updated one of my clusters today, and among other things, I updated
from pacemaker-1.0.9 to 1.0.10. I don't know if that is directly related
or not.

The problem is that I cannot get the cluster to come up clean. Right now
all resources are running on one node, and it is fine that way. As soon
as I start heartbeat on the second node, the cluster goes into a STONITH
deathmatch. What I see are some failed actions involving an attempt to
stop a DRBD resource group. Here is a log snippet:

Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: do_lrm_rsc_op: Performing key=21:2:0:fb701221-ba59-4de8-88dc-032cab9ec090 op=vmgroup1:0_stop_0 )
Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: info: rsc:vmgroup1:0:30: stop
Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: do_lrm_rsc_op: Performing key=50:2:0:fb701221-ba59-4de8-88dc-032cab9ec090 op=vmgroup2:0_stop_0 )
Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: info: rsc:vmgroup2:0:31: stop
Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: WARN: Managed vmgroup1:0:stop process 8088 exited with return code 6.
Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: process_lrm_event: LRM operation vmgroup1:0_stop_0 (call=30, rc=6, cib-update=36, confirmed=true) not configured
Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: WARN: Managed vmgroup2:0:stop process 8089 exited with return code 6.


In this example, "vmgroup1" and "vmgroup2" are DRBD resources, each set
up as a master/slave clone, which is the standard way to run DRBD under
pacemaker. It looks like this in the crm shell:

primitive vmgroup1 ocf:linbit:drbd \
        params drbd_resource="vmgroup1" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="20s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
[...]
ms ms-vmgroup1 vmgroup1 \
        meta clone-max="2" notify="true" globally-unique="false" \
        target-role="Started"

This has always worked fine until today. 

Any ideas what I can do to further debug this?
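For reference, return code 6 from the agent is OCF_ERR_CONFIGURED ("not
configured"). The basic checks I know of are sketched below, assuming
the standard pacemaker 1.0 / DRBD command-line tools (only the resource
name is specific to my setup):

```shell
# Check the live configuration for errors, verbosely:
crm_verify -L -V

# One-shot cluster status, including fail counts and failed actions:
crm_mon -1f

# Ask the policy engine what it would do with the current CIB:
ptest -L -VVV

# Check DRBD's own view of the resource, outside pacemaker:
cat /proc/drbd
drbdadm role vmgroup1
```

Is there anything beyond these that would narrow down why the stop
operation is returning "not configured"?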

I am running on CentOS 5.5 using the clusterlabs repos.

--Greg


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
