I'm curious - is that 10.8.64.140 address the VIP address for the cluster?
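From the IPaddr2 primitive in the configuration below
(ip="10.8.64.140") it certainly looks like it. If I remember the
crm_resource options right, you can confirm what the cluster thinks
the VIP is with something like:

# crm_resource -r vip -g ip
10.8.64.140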
On 10-10-07 01:21 PM, AR wrote:
> On Wed, 2010-10-06 at 21:30 -0700, AR wrote:
>
> Solved.
>
> The issue was that the 10.8.64.140 address was sticking to node1. I
> don't know why this was happening, but once I removed the address all
> is working well.
>
>> On Wed, 2010-10-06 at 20:45 -0300, mike wrote:
>>
>>> On 10-10-06 07:09 PM, AR wrote:
>>>
>>>> Hi, first let me say thank you to those of you who support the
>>>> project.
>>>>
>>>> It appears that there are some orphan resources. How do I get rid
>>>> of these?
>>>>
>>>> # crm_verify -LVV
>>>> crm_verify[31892]: 2010/10/06_14:55:10 WARN: process_orphan_resource: Nothing known about resource vip_0:1 running on node2
>>>> crm_verify[31892]: 2010/10/06_14:55:10 ERROR: unpack_rsc_op: Hard error - vip_0:1_monitor_0 failed with rc=2: Preventing vip_0:1 from re-starting on node2
>>>>
>>>> I have already rebuilt the cluster from scratch:
>>>>
>>>> # rm /var/lib/heartbeat/crm/*
>>>>
>>>> Current configuration:
>>>>
>>>> # crm configure show xml
>>>> <?xml version="1.0" ?>
>>>> <cib admin_epoch="0" crm_feature_set="3.0.1" dc-uuid="node2" epoch="66" have-quorum="1" num_updates="17" validate-with="pacemaker-1.0">
>>>>   <configuration>
>>>>     <crm_config>
>>>>       <cluster_property_set id="cib-bootstrap-options">
>>>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a"/>
>>>>         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
>>>>         <nvpair id="nvpair-38d6c5a8-3510-4fc8-97fd-944e32f8fbfe" name="stonith-enabled" value="false"/>
>>>>         <nvpair id="nvpair-9429cf6e-009d-465c-bb9a-5d7a90056680" name="no-quorum-policy" value="ignore"/>
>>>>         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1286398848"/>
>>>>         <nvpair id="nvpair-1214a8eb-bf4a-41ae-9c4e-33d9aac8d07c" name="default-resource-stickiness" value="1"/>
>>>>       </cluster_property_set>
>>>>     </crm_config>
>>>>     <rsc_defaults/>
>>>>     <op_defaults/>
>>>>     <nodes>
>>>>       <node id="node1" type="normal" uname="node1">
>>>>         <instance_attributes id="nodes-node1">
>>>>           <nvpair id="standby-node1" name="standby" value="false"/>
>>>>         </instance_attributes>
>>>>       </node>
>>>>       <node id="node2" type="normal" uname="node2">
>>>>         <instance_attributes id="nodes-node2">
>>>>           <nvpair id="standby-node2" name="standby" value="false"/>
>>>>         </instance_attributes>
>>>>       </node>
>>>>     </nodes>
>>>>     <resources>
>>>>       <group id="vip_n_sockd">
>>>>         <meta_attributes id="vip_n_sockd-meta_attributes">
>>>>           <nvpair id="nvpair-e3e90b0b-161b-49c2-8723-98647feb7b6c" name="target-role" value="Started"/>
>>>>           <nvpair id="vip_n_sockd-meta_attributes-is-managed" name="is-managed" value="true"/>
>>>>         </meta_attributes>
>>>>         <primitive class="ocf" id="vip" provider="heartbeat" type="IPaddr2">
>>>>           <meta_attributes id="vip-meta_attributes">
>>>>             <nvpair id="nvpair-cc694b48-ebd2-468f-a1bf-b3289d2cf28e" name="target-role" value="Started"/>
>>>>             <nvpair id="vip-meta_attributes-is-managed" name="is-managed" value="true"/>
>>>>           </meta_attributes>
>>>>           <operations id="vip-operations">
>>>>             <op id="vip-op-monitor-10s" interval="20s" name="monitor" start-delay="0s" timeout="10s"/>
>>>>           </operations>
>>>>           <instance_attributes id="vip-instance_attributes">
>>>>             <nvpair id="nvpair-5d33bc8c-3c04-405d-b71b-3cb2174da8ba" name="ip" value="10.8.64.140"/>
>>>>           </instance_attributes>
>>>>         </primitive>
>>>>         <primitive class="lsb" id="sockd" type="sockd">
>>>>           <meta_attributes id="sockd-meta_attributes">
>>>>             <nvpair id="nvpair-d6564710-29eb-4562-a77d-7997ef649764" name="target-role" value="Started"/>
>>>>           </meta_attributes>
>>>>         </primitive>
>>>>       </group>
>>>>     </resources>
>>>>     <constraints/>
>>>>   </configuration>
>>>> </cib>
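>>>>
>>>> For anyone who finds the XML hard to scan, the same configuration
>>>> in crm shell syntax should look roughly like this (my untested
>>>> hand-translation of the XML above, with auto-maintained properties
>>>> such as dc-version and last-lrm-refresh left out, so treat it as a
>>>> sketch rather than verbatim `crm configure show` output):
>>>>
>>>> node node1 \
>>>>         attributes standby="false"
>>>> node node2 \
>>>>         attributes standby="false"
>>>> primitive vip ocf:heartbeat:IPaddr2 \
>>>>         params ip="10.8.64.140" \
>>>>         op monitor interval="20s" timeout="10s" start-delay="0s" \
>>>>         meta target-role="Started" is-managed="true"
>>>> primitive sockd lsb:sockd \
>>>>         meta target-role="Started"
>>>> group vip_n_sockd vip sockd \
>>>>         meta target-role="Started" is-managed="true"
>>>> property $id="cib-bootstrap-options" \
>>>>         stonith-enabled="false" \
>>>>         no-quorum-policy="ignore" \
>>>>         default-resource-stickiness="1"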
>>>>
>>>> Now the issue is that if I put node1 into standby, the resources
>>>> go to unmanaged.
>>>>
>>>> # crm_verify -LVV
>>>> crm_verify[377]: 2010/10/06_15:07:47 notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> crm_verify[377]: 2010/10/06_15:07:47 WARN: unpack_rsc_op: Operation vip_monitor_0 found resource vip active on node1
>>>> crm_verify[377]: 2010/10/06_15:07:47 WARN: unpack_rsc_op: Processing failed op vip_stop_0 on node1: unknown error
>>>> crm_verify[377]: 2010/10/06_15:07:47 WARN: process_orphan_resource: Nothing known about resource vip_0:1 running on node2
>>>> crm_verify[377]: 2010/10/06_15:07:47 ERROR: unpack_rsc_op: Hard error - vip_0:1_monitor_0 failed with rc=2: Preventing vip_0:1 from re-starting on node2
>>>> crm_verify[377]: 2010/10/06_15:07:47 notice: group_print: Resource Group: vip_n_sockd
>>>> crm_verify[377]: 2010/10/06_15:07:47 notice: native_print: vip (ocf::heartbeat:IPaddr2): Started node1 (unmanaged) FAILED
>>>> crm_verify[377]: 2010/10/06_15:07:47 notice: native_print: sockd (lsb:sockd): Stopped
>>>> crm_verify[377]: 2010/10/06_15:07:47 WARN: common_apply_stickiness: Forcing vip away from node1 after 1000000 failures (max=1000000)
>>>> crm_verify[377]: 2010/10/06_15:07:47 WARN: native_color: Resource sockd cannot run anywhere
>>>> crm_verify[377]: 2010/10/06_15:07:47 notice: LogActions: Leave resource vip (Started unmanaged)
>>>> crm_verify[377]: 2010/10/06_15:07:47 notice: LogActions: Leave resource sockd (Stopped)
>>>> Warnings found during check: config may not be valid
>>>>
>>>> Thanks, Alex
>>>>
>>> Is the configuration identical on both nodes, i.e. is cib.xml
>>> exactly the same?
>>>
>> I just wiped all files in /var/lib/heartbeat/crm and copied cib.xml
>> to both nodes. Then:
>>
>> # chown hacluster:haclient *
>> # chmod 600 *
>> # rcopenais start
>>
>> Started on node2, all works fine. When I start openais on node1, the
>> resources go down. I do a cleanup and the resources come up on node1.
>>
>> I then try to put node1 in standby and the resources go down; a
>> cleanup will not start them. The resources will only start on node1.
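>>
>> One thing I still want to rule out is whether 10.8.64.140 is stuck
>> on node1 outside the cluster's control, and whether the failcount is
>> what keeps pinning vip away from node1. A rough sketch of what I
>> plan to try (eth0 and the /24 prefix are guesses, adjust to the
>> node's real interface and netmask, and I'm going from memory on the
>> crm_failcount options):
>>
>> # ip -o addr show | grep 10.8.64.140
>> # ip addr del 10.8.64.140/24 dev eth0   # only if it is still bound here
>> # crm_failcount -D -U node1 -r vip      # clear vip's failcount on node1
>> # crm resource cleanup vip_n_sockd
>>
>> Here is what crm_verify reports right now: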
>>
>> # crm_verify -LVV
>> crm_verify[21359]: 2010/10/06_21:01:32 notice: unpack_config: On loss of CCM Quorum: Ignore
>> crm_verify[21359]: 2010/10/06_21:01:32 WARN: unpack_rsc_op: Operation vip_monitor_0 found resource vip active on node1
>> crm_verify[21359]: 2010/10/06_21:01:32 WARN: unpack_rsc_op: Processing failed op vip_stop_0 on node1: unknown error
>> crm_verify[21359]: 2010/10/06_21:01:32 notice: group_print: Resource Group: vip_n_sockd
>> crm_verify[21359]: 2010/10/06_21:01:32 notice: native_print: vip (ocf::heartbeat:IPaddr2): Started node1 (unmanaged) FAILED
>> crm_verify[21359]: 2010/10/06_21:01:32 notice: native_print: sockd (lsb:sockd): Stopped
>> crm_verify[21359]: 2010/10/06_21:01:32 WARN: common_apply_stickiness: Forcing vip away from node1 after 1000000 failures (max=1000000)
>> crm_verify[21359]: 2010/10/06_21:01:32 WARN: native_color: Resource sockd cannot run anywhere
>> crm_verify[21359]: 2010/10/06_21:01:32 notice: LogActions: Leave resource vip (Started unmanaged)
>> crm_verify[21359]: 2010/10/06_21:01:32 notice: LogActions: Leave resource sockd (Stopped)
>> Warnings found during check: config may not be valid

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
