Hi,

On Mon, Oct 11, 2010 at 10:17:08PM -0600, Eric Schoeller wrote:
> Good evening,
>
> I noticed that when corosync is set to start at boot my stonith devices
> don't start up correctly.
>
> Here is some version info:
>
> cluster-glue: 1.0.6
> Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
> Name : pacemaker
> Version : 1.0.9.1
> Release : 1.15.el5
>
> I've read in many places that stonith devices may rely upon atd. I
> haven't looked around enough to fully understand the necessity of this
> dependency, but I believe it's the cause of the problem I'm
> experiencing.

Wrong. atd is needed only for external/ssh, and even then only for the
fencing operations. You're running into a different problem.

> The corosync init script is configured to start and stop
> at 20, and atd is configured to start and stop at 95 and 5 on my RHEL5.5
> system. If I move corosync up to 98 (after atd) my stonith devices start
> just fine. If I add a start-delay to the stonith device that delays it
> past the startup of atd, the stonith device also starts just fine. Using
> the default init script and no start-delay ends with a Failed Action for
> the stonith device, and it never recovers without manual intervention.
>
> My questions are: Why is the default init script shipped with the RPM
> from the clusterlabs repo configured to start before atd if atd is a
> dependency of certain parts of the pacemaker framework (if this is indeed
> the case)? Is it safe/recommended to add a start-delay of several
> minutes to a stonith device to work around this problem?

Well, if possible, it's much better to fix the problem. Otherwise, a
start-delay on the start action may at times slow down fencing, for
instance startup fencing.

> Thanks!!
>
> Eric Schoeller
>
>
> Here are some logs:
>
> Oct 11 20:33:14 nodea crmd: [3156]: info: do_lrm_rsc_op: Performing key=52:56:0:be604143-3a5a-4086-8e5c-d3d052804091 op=st-nodeb-ipmi_start_0 )
> Oct 11 20:33:14 nodea lrmd: [3153]: info: rsc:st-nodeb-ipmi:8: start
> Oct 11 20:33:14 nodea lrmd: [3397]: info: Try to start STONITH resource <rsc_id=st-nodeb-ipmi> : Device=external/ipmi
> Oct 11 20:33:14 nodea crmd: [3156]: info: do_lrm_rsc_op: Performing key=12:56:0:be604143-3a5a-4086-8e5c-d3d052804091 op=drbd_nfs:0_start_0 )
> Oct 11 20:33:14 nodea lrmd: [3153]: info: rsc:drbd_nfs:0:9: start
>
> Oct 11 20:33:37 nodea external/ipmi[3433]: ERROR: error executing ipmitool: Error: Unable to establish IPMI v2 / RMCP+ session^M Unable to get Chassis Power Status

ipmitool fails here: it cannot establish an RMCP+ session with the BMC.
Perhaps the network is not yet fully operational at that point in the boot.
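
Rather than a fixed start-delay, one option is to wait for the condition
that actually fails, i.e. until the BMC answers an IPMI query. The following
is a minimal sketch in Python under stated assumptions; it is not part of
cluster-glue or Pacemaker. wait_for_command and ipmi_status_cmd are
hypothetical helpers, and the ipmitool invocation only approximates the
"chassis power status" query behind the error above; check the
external/ipmi plugin script for the exact options it passes.

```python
import subprocess
import time
from typing import List, Sequence


def ipmi_status_cmd(host: str, user: str, password: str,
                    interface: str = "lanplus") -> List[str]:
    """Build an ipmitool command asking a BMC for its chassis power status.

    Hypothetical helper: the external/ipmi plugin may use different options.
    """
    return ["ipmitool", "-I", interface, "-H", host,
            "-U", user, "-P", password,
            "chassis", "power", "status"]


def wait_for_command(cmd: Sequence[str], timeout: float = 300.0,
                     interval: float = 5.0) -> bool:
    """Run cmd repeatedly until it exits with status 0 or timeout expires.

    Returns True as soon as one attempt succeeds, False if the deadline
    passes first. A missing program raises FileNotFoundError, since
    retrying would not help in that case.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Only the exit status matters; discard the command's output.
        result = subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
        # Give up rather than sleep past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_command(ipmi_status_cmd("1.2.3.25", "coolguy",
"changeme"), timeout=300) returns True once nodeb's BMC responds. Where such
a check would be hooked into the boot sequence (for instance, a local init
script ordered before corosync) is an assumption left to the reader; the
point is that it waits only as long as the network actually needs.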

Thanks,

Dejan

> Oct 11 20:33:38 nodea stonithd: [3432]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi status' returned 256
> Oct 11 20:33:38 nodea stonithd: [3432]: CRIT: external_status: 'ipmi status' failed with rc 256
> Oct 11 20:33:38 nodea stonithd: [3151]: WARN: start st-nodeb-ipmi failed, because its hostlist is empty
> Oct 11 20:33:38 nodea lrmd: [3153]: WARN: Managed st-nodeb-ipmi:start process 3397 exited with return code 1.
> Oct 11 20:33:38 nodea crmd: [3156]: info: process_lrm_event: LRM operation st-nodeb-ipmi_start_0 (call=8, rc=1, cib-update=16, confirmed=true) unknown error
> Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_ais_dispatch: Update relayed from nodeb.domain.com
> Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-st-nodeb-ipmi (INFINITY)
> Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_perform_update: Sent update 26: fail-count-st-nodeb-ipmi=INFINITY
> Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_ais_dispatch: Update relayed from nodeb.domain.com
> Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-st-nodeb-ipmi (1286850818)
> Oct 11 20:33:38 nodea attrd: [3154]: info: attrd_perform_update: Sent update 29: last-failure-st-nodeb-ipmi=1286850818
> Oct 11 20:33:38 nodea crmd: [3156]: info: do_lrm_rsc_op: Performing key=1:58:0:be604143-3a5a-4086-8e5c-d3d052804091 op=st-nodeb-ipmi_stop_0 )
> Oct 11 20:33:38 nodea lrmd: [3153]: info: rsc:st-nodeb-ipmi:12: stop
> Oct 11 20:33:38 nodea lrmd: [5063]: info: Try to stop STONITH resource <rsc_id=st-nodeb-ipmi> : Device=external/ipmi
> Oct 11 20:33:38 nodea stonithd: [3151]: notice: try to stop a resource st-nodeb-ipmi who is not in started resource queue.
> Oct 11 20:33:38 nodea lrmd: [3153]: info: Managed st-nodeb-ipmi:stop process 5063 exited with return code 0.
> Oct 11 20:33:38 nodea crmd: [3156]: info: process_lrm_event: LRM operation st-nodeb-ipmi_stop_0 (call=12, rc=0, cib-update=17, confirmed=true) ok
>
> Here is my cluster configuration:
>
> node nodea.domain.com \
>     attributes standby="off"
> node nodeb.domain.com \
>     attributes standby="off"
> primitive drbd_nfs ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15s"
> primitive fs_nfs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/mnt/drbd0" fstype="ext3" \
>     meta is-managed="true"
> primitive ip_nfs ocf:heartbeat:IPaddr2 \
>     params ip="1.2.3.20" cidr_netmask="32" nic="bond0"
> primitive nfsserver ocf:heartbeat:nfsserver \
>     params nfs_shared_infodir="/mnt/drbd0/nfs" nfs_ip="1.2.3.20" nfs_init_script="/etc/init.d/nfs"
> primitive st-nodea-ipmi stonith:external/ipmi \
>     params hostname="nodea.domain.com" ipaddr="1.2.3.23" userid="coolguy" passwd="changeme" interface="lanplus" \
>     op monitor interval="20m" timeout="1m" \
>     op start interval="0" timeout="1m" start-delay="360s" \
>     meta target-role="Started"
> primitive st-nodeb-ipmi stonith:external/ipmi \
>     params hostname="nodeb.domain.com" ipaddr="1.2.3.25" userid="coolguy" passwd="changeme" interface="lanplus" \
>     op monitor interval="20m" timeout="1m" \
>     op start interval="0" timeout="1m" start-delay="360s" \
>     meta target-role="Started"
> group nfs fs_nfs ip_nfs nfsserver \
>     meta target-role="Started"
> ms ms_drbd_nfs drbd_nfs \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
> location l-st-nodea st-nodea-ipmi -inf: nodea.domain.com
> location l-st-nodeb st-nodeb-ipmi -inf: nodeb.domain.com
> colocation nfs_on_drbd inf: nfs ms_drbd_nfs:Master
> order nfs_after_drbd inf: ms_drbd_nfs:promote nfs:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="true" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1286851694"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
