Good evening,

I noticed that when corosync is set to start at boot, my stonith devices
don't start up correctly.

Here is some version info:

cluster-glue: 1.0.6
corosync: 1.2.7 (SVN revision 3008)
pacemaker: 1.0.9.1, release 1.15.el5

I've read in several places that stonith devices may rely on atd. I
haven't dug deep enough to understand why that dependency exists, but I
believe it's the cause of the problem I'm experiencing. On my RHEL 5.5
system the corosync init script is configured with start and stop
priorities of 20, while atd starts at 95 and stops at 5, so corosync
starts well before atd. If I move corosync's start priority to 98 (after
atd), my stonith devices start just fine. If I add a start-delay to the
stonith device that pushes its start past the startup of atd, it also
starts just fine. With the default init script and no start-delay, I end
up with a Failed Action for the stonith device, and it never recovers
without manual intervention.
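
For anyone who wants to check the same ordering on their own system, here
is a rough sketch that reads the start priority from the "# chkconfig:"
header of two init scripts and compares them. The function names
(start_priority, starts_before) are made up for this example, and the
paths assume the usual RHEL 5 locations. The header only gives the
default priority; the S## symlinks under /etc/rc3.d show what is actually
in effect.

```shell
#!/bin/sh
# Sketch: compare SysV start priorities declared in chkconfig headers.
# Header format: "# chkconfig: <runlevels> <start> <stop>"

start_priority() {
    # Print the start priority (second-to-last field of the header line).
    awk '/^#[ \t]*chkconfig:/ { print $(NF-1); exit }' "$1"
}

starts_before() {
    # Succeed if script $1 declares a lower start priority than script $2.
    a=$(start_priority "$1")
    b=$(start_priority "$2")
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" -lt "$b" ]
}

# Assumed paths; adjust if the init scripts live elsewhere.
if [ -r /etc/init.d/corosync ] && [ -r /etc/init.d/atd ]; then
    if starts_before /etc/init.d/corosync /etc/init.d/atd; then
        echo "corosync starts before atd"
    else
        echo "corosync does not start before atd"
    fi
fi
```

With the priorities described above (corosync at 20, atd at 95), this
should report that corosync starts before atd.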

My questions are: Why is the default init script shipped with the RPM
from the clusterlabs repo configured to start before atd if atd is a
dependency of certain parts of the pacemaker framework (if this is indeed
the case)? And is it safe/recommended to add a start-delay of several
minutes to a stonith device to work around this problem?


Thanks!!

Eric Schoeller


Here are some logs:

Oct 11 20:33:14 nodea crmd: [3156]: info: do_lrm_rsc_op: Performing key=52:56:0:be604143-3a5a-4086-8e5c-d3d052804091 op=st-nodeb-ipmi_start_0 )
Oct 11 20:33:14 nodea lrmd: [3153]: info: rsc:st-nodeb-ipmi:8: start
Oct 11 20:33:14 nodea lrmd: [3397]: info: Try to start STONITH resource <rsc_id=st-nodeb-ipmi> : Device=external/ipmi
Oct 11 20:33:14 nodea crmd: [3156]: info: do_lrm_rsc_op: Performing key=12:56:0:be604143-3a5a-4086-8e5c-d3d052804091 op=drbd_nfs:0_start_0 )
Oct 11 20:33:14 nodea lrmd: [3153]: info: rsc:drbd_nfs:0:9: start

Oct 11 20:33:37 nodea external/ipmi[3433]: ERROR: error executing ipmitool: Error: Unable to establish IPMI v2 / RMCP+ session^M Unable to get Chassis Power Status
Oct 11 20:33:38 nodea stonithd: [3432]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi status' returned 256
Oct 11 20:33:38 nodea stonithd: [3432]: CRIT: external_status: 'ipmi status' failed with rc 256
Oct 11 20:33:38 nodea stonithd: [3151]: WARN: start st-nodeb-ipmi failed, because its hostlist is empty
Oct 11 20:33:38 nodea lrmd: [3153]: WARN: Managed st-nodeb-ipmi:start process 3397 exited with return code 1.
Oct 11 20:33:38 nodea crmd: [3156]: info: process_lrm_event: LRM operation st-nodeb-ipmi_start_0 (call=8, rc=1, cib-update=16, confirmed=true) unknown error
Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_ais_dispatch: Update relayed from nodeb.domain.com
Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-st-nodeb-ipmi (INFINITY)
Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_perform_update: Sent update 26: fail-count-st-nodeb-ipmi=INFINITY
Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_ais_dispatch: Update relayed from nodeb.domain.com
Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-st-nodeb-ipmi (1286850818)
Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_perform_update: Sent update 29: last-failure-st-nodeb-ipmi=1286850818
Oct 11 20:33:38 nodea crmd: [3156]: info: do_lrm_rsc_op: Performing key=1:58:0:be604143-3a5a-4086-8e5c-d3d052804091 op=st-nodeb-ipmi_stop_0 )
Oct 11 20:33:38 nodea lrmd: [3153]: info: rsc:st-nodeb-ipmi:12: stop
Oct 11 20:33:38 nodea lrmd: [5063]: info: Try to stop STONITH resource <rsc_id=st-nodeb-ipmi> : Device=external/ipmi
Oct 11 20:33:38 nodea stonithd: [3151]: notice: try to stop a resource st-nodeb-ipmi who is not in started resource queue.
Oct 11 20:33:38 nodea lrmd: [3153]: info: Managed st-nodeb-ipmi:stop process 5063 exited with return code 0.
Oct 11 20:33:38 nodea crmd: [3156]: info: process_lrm_event: LRM operation st-nodeb-ipmi_stop_0 (call=12, rc=0, cib-update=17, confirmed=true) ok
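
To pick the failed operations out of a longer log, something like the
sketch below may help. It keeps only the process_lrm_event entries whose
return code is non-zero. The function name failed_lrm_ops is made up for
this example, the pattern is based solely on the message format in the
entries above, and it expects each log entry on a single line, as in the
actual log file rather than a mail-wrapped copy.

```shell
#!/bin/sh
# Sketch: list LRM operations that finished with a non-zero return code.
# Reads syslog text on stdin, one unwrapped entry per line.

failed_lrm_ops() {
    # Reduce each process_lrm_event entry to "<operation> rc=<code>",
    # then drop the entries that succeeded (rc=0).
    sed -n 's/.*process_lrm_event: LRM operation \([^ ]*\) (call=[0-9]*, rc=\([0-9]*\),.*/\1 rc=\2/p' |
        grep -v ' rc=0$'
}

# Example use (the log path is an assumption; adjust to your syslog setup):
#   failed_lrm_ops < /var/log/messages
```

Run against the entries above, this would keep the st-nodeb-ipmi_start_0
entry (rc=1) and drop the st-nodeb-ipmi_stop_0 entry (rc=0).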

Here is my cluster configuration:

node nodea.domain.com \
        attributes standby="off"
node nodeb.domain.com \
        attributes standby="off"
primitive drbd_nfs ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s"
primitive fs_nfs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mnt/drbd0" fstype="ext3" \
        meta is-managed="true"
primitive ip_nfs ocf:heartbeat:IPaddr2 \
        params ip="1.2.3.20" cidr_netmask="32" nic="bond0"
primitive nfsserver ocf:heartbeat:nfsserver \
        params nfs_shared_infodir="/mnt/drbd0/nfs" nfs_ip="1.2.3.20" nfs_init_script="/etc/init.d/nfs"
primitive st-nodea-ipmi stonith:external/ipmi \
        params hostname="nodea.domain.com" ipaddr="1.2.3.23" userid="coolguy" passwd="changeme" interface="lanplus" \
        op monitor interval="20m" timeout="1m" \
        op start interval="0" timeout="1m" start-delay="360s" \
        meta target-role="Started"
primitive st-nodeb-ipmi stonith:external/ipmi \
        params hostname="nodeb.domain.com" ipaddr="1.2.3.25" userid="coolguy" passwd="changeme" interface="lanplus" \
        op monitor interval="20m" timeout="1m" \
        op start interval="0" timeout="1m" start-delay="360s" \
        meta target-role="Started"
group nfs fs_nfs ip_nfs nfsserver \
        meta target-role="Started"
ms ms_drbd_nfs drbd_nfs \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
location l-st-nodea st-nodea-ipmi -inf: nodea.domain.com
location l-st-nodeb st-nodeb-ipmi -inf: nodeb.domain.com
colocation nfs_on_drbd inf: nfs ms_drbd_nfs:Master
order nfs_after_drbd inf: ms_drbd_nfs:promote nfs:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1286851694"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
