On Wed, 2010-10-06 at 20:45 -0300, mike wrote:
> On 10-10-06 07:09 PM, AR wrote:
> > Hi, First let me say thank you to those of you that support the project.
> >
> > It appears that there are orphan processes running? How do I get rid of
> > these?
> >
> > # crm_verify -LVV
> > crm_verify[31892]: 2010/10/06_14:55:10 WARN: process_orphan_resource:
> > Nothing known about resource vip_0:1 running on node2
> > crm_verify[31892]: 2010/10/06_14:55:10 ERROR: unpack_rsc_op: Hard error
> > - vip_0:1_monitor_0 failed with rc=2: Preventing vip_0:1 from
> > re-starting on node2
> >
> > I have already rebuilt the cluster from scratch.
> >
> > # rm /var/lib/heartbeat/crm/*
> >
> > current configuration
> >
> > # crm configure show xml
> > <?xml version="1.0" ?>
> > <cib admin_epoch="0" crm_feature_set="3.0.1" dc-uuid="node2" epoch="66"
> > have-quorum="1" num_updates="17" validate-with="pacemaker-1.0">
> > <configuration>
> > <crm_config>
> > <cluster_property_set id="cib-bootstrap-options">
> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> > value="1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a"/>
> > <nvpair id="cib-bootstrap-options-expected-quorum-votes"
> > name="expected-quorum-votes" value="2"/>
> > <nvpair id="nvpair-38d6c5a8-3510-4fc8-97fd-944e32f8fbfe"
> > name="stonith-enabled" value="false"/>
> > <nvpair id="nvpair-9429cf6e-009d-465c-bb9a-5d7a90056680"
> > name="no-quorum-policy" value="ignore"/>
> > <nvpair id="cib-bootstrap-options-last-lrm-refresh"
> > name="last-lrm-refresh" value="1286398848"/>
> > <nvpair id="nvpair-1214a8eb-bf4a-41ae-9c4e-33d9aac8d07c"
> > name="default-resource-stickiness" value="1"/>
> > </cluster_property_set>
> > </crm_config>
> > <rsc_defaults/>
> > <op_defaults/>
> > <nodes>
> > <node id="node2" type="normal" uname="node2">
> > <instance_attributes id="nodes-node2">
> > <nvpair id="standby-node2" name="standby" value="false"/>
> > </instance_attributes>
> > </node>
> > <node id="node2" type="normal" uname="node2">
> > <instance_attributes id="nodes-node2">
> > <nvpair id="standby-node2" name="standby" value="false"/>
> > </instance_attributes>
> > </node>
> > </nodes>
> > <resources>
> > <group id="vip_n_sockd">
> > <meta_attributes id="vip_n_sockd-meta_attributes">
> > <nvpair id="nvpair-e3e90b0b-161b-49c2-8723-98647feb7b6c"
> > name="target-role" value="Started"/>
> > <nvpair id="vip_n_sockd-meta_attributes-is-managed"
> > name="is-managed" value="true"/>
> > </meta_attributes>
> > <primitive class="ocf" id="vip" provider="heartbeat"
> > type="IPaddr2">
> > <meta_attributes id="vip-meta_attributes">
> > <nvpair id="nvpair-cc694b48-ebd2-468f-a1bf-b3289d2cf28e"
> > name="target-role" value="Started"/>
> > <nvpair id="vip-meta_attributes-is-managed"
> > name="is-managed" value="true"/>
> > </meta_attributes>
> > <operations id="vip-operations">
> > <op id="vip-op-monitor-10s" interval="20s" name="monitor"
> > start-delay="0s" timeout="10s"/>
> > </operations>
> > <instance_attributes id="vip-instance_attributes">
> > <nvpair id="nvpair-5d33bc8c-3c04-405d-b71b-3cb2174da8ba"
> > name="ip" value="10.8.64.140"/>
> > </instance_attributes>
> > </primitive>
> > <primitive class="lsb" id="sockd" type="sockd">
> > <meta_attributes id="sockd-meta_attributes">
> > <nvpair id="nvpair-d6564710-29eb-4562-a77d-7997ef649764"
> > name="target-role" value="Started"/>
> > </meta_attributes>
> > </primitive>
> > </group>
> > </resources>
> > <constraints/>
> > </configuration>
> > </cib>
> >
> > Now the issue is that if I put node1 into standby, the resources go
> > unmanaged.
> >
> > # crm_verify -LVV
> > crm_verify[377]: 2010/10/06_15:07:47 notice: unpack_config: On loss of
> > CCM Quorum: Ignore
> > crm_verify[377]: 2010/10/06_15:07:47 WARN: unpack_rsc_op: Operation
> > vip_monitor_0 found resource vip active on rf3sxsocks1
> > crm_verify[377]: 2010/10/06_15:07:47 WARN: unpack_rsc_op: Processing
> > failed op vip_stop_0 on rf3sxsocks1: unknown error
> > crm_verify[377]: 2010/10/06_15:07:47 WARN: process_orphan_resource:
> > Nothing known about resource vip_0:1 running on rf3sxsocks2
> > crm_verify[377]: 2010/10/06_15:07:47 ERROR: unpack_rsc_op: Hard error -
> > vip_0:1_monitor_0 failed with rc=2: Preventing vip_0:1 from re-starting
> > on rf3sxsocks2
> > crm_verify[377]: 2010/10/06_15:07:47 notice: group_print: Resource
> > Group: vip_n_sockd
> > crm_verify[377]: 2010/10/06_15:07:47 notice: native_print: vip
> > (ocf::heartbeat:IPaddr2): Started rf3sxsocks1 (unmanaged) FAILED
> > crm_verify[377]: 2010/10/06_15:07:47 notice: native_print: sockd
> > (lsb:sockd): Stopped
> > crm_verify[377]: 2010/10/06_15:07:47 WARN: common_apply_stickiness:
> > Forcing vip away from rf3sxsocks1 after 1000000 failures (max=1000000)
> > crm_verify[377]: 2010/10/06_15:07:47 WARN: native_color: Resource sockd
> > cannot run anywhere
> > crm_verify[377]: 2010/10/06_15:07:47 notice: LogActions: Leave resource
> > vip (Started unmanaged)
> > crm_verify[377]: 2010/10/06_15:07:47 notice: LogActions: Leave resource
> > sockd (Stopped)
> > Warnings found during check: config may not be valid
> >
> > Thanks, Alex
> >
> Is the configuration identical on both nodes, i.e. is cib.xml exactly
> the same?

I just wiped all files in /var/lib/heartbeat/crm and copied cib.xml to
both nodes. And:

# chown hacluster:haclient *
# chmod 600 *
# rcopenais start
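A minimal sketch of that manual CIB reset, assuming the usual
/var/lib/heartbeat/crm layout (cib.xml next to its .sig/.last companions
and cib-*.raw backups) and assuming the cluster stack is stopped on both
nodes first, so nothing rewrites the CIB while the file is being copied:

# rcopenais stop                    (run on both nodes first)
# cd /var/lib/heartbeat/crm
# rm -f cib.xml* cib-*.raw*         (removes cib.xml plus its signature and backup files)
(now copy the one cib.xml you want to keep onto both nodes)
# chown hacluster:haclient cib.xml
# chmod 600 cib.xml
# rcopenais start                   (run on both nodes)

The owner and mode match what was used above; removing the .sig files
along with cib.xml matters because a hand-copied cib.xml sitting next to
a stale signature may be rejected at startup.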
Started on node2, all works fine. When I start openais on node1 the resources
go down. I do a cleanup and the resources come up on node1. I then try to put
node1 in standby and the resources go down, and a cleanup will not start them.
The resources will only start on node1.

# crm_verify -LVV
crm_verify[21359]: 2010/10/06_21:01:32 notice: unpack_config: On loss of CCM Quorum: Ignore
crm_verify[21359]: 2010/10/06_21:01:32 WARN: unpack_rsc_op: Operation vip_monitor_0 found resource vip active on rf3sxsocks1
crm_verify[21359]: 2010/10/06_21:01:32 WARN: unpack_rsc_op: Processing failed op vip_stop_0 on rf3sxsocks1: unknown error
crm_verify[21359]: 2010/10/06_21:01:32 notice: group_print: Resource Group: vip_n_sockd
crm_verify[21359]: 2010/10/06_21:01:32 notice: native_print: vip (ocf::heartbeat:IPaddr2): Started rf3sxsocks1 (unmanaged) FAILED
crm_verify[21359]: 2010/10/06_21:01:32 notice: native_print: sockd (lsb:sockd): Stopped
crm_verify[21359]: 2010/10/06_21:01:32 WARN: common_apply_stickiness: Forcing vip away from rf3sxsocks1 after 1000000 failures (max=1000000)
crm_verify[21359]: 2010/10/06_21:01:32 WARN: native_color: Resource sockd cannot run anywhere
crm_verify[21359]: 2010/10/06_21:01:32 notice: LogActions: Leave resource vip (Started unmanaged)
crm_verify[21359]: 2010/10/06_21:01:32 notice: LogActions: Leave resource sockd (Stopped)
Warnings found during check: config may not be valid

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
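For readers hitting the same symptom, a minimal sketch of clearing the stale
failure state with the crm shell before retrying the standby test. It assumes
the Pacemaker 1.0 crm shell already used in this thread, and it uses the
rf3sxsocks* node names from the logs (substitute node1/node2 if those are the
real hostnames). The key item in the output above is the failed vip_stop_0,
which has pushed vip's failcount to 1000000 and keeps forcing it away from
rf3sxsocks1:

# crm resource cleanup vip rf3sxsocks1            (forget the failed stop recorded for vip on that node)
# crm resource failcount vip delete rf3sxsocks1   (reset the failure count pinning vip away)
# crm_verify -LVV                                 (re-check; the failed-op warnings should be gone)
# crm node standby rf3sxsocks1                    (repeat the failover test)
# crm_mon -1                                      (watch where the vip_n_sockd group lands)
# crm node online rf3sxsocks1

If the stop keeps failing, cleanup alone will not help: with
stonith-enabled=false a failed stop leaves the resource unmanaged, so the
underlying IPaddr2 stop error would need to be addressed first.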
