Re: [Linux-HA] timed out of monitor

Dejan Muhamedagic Mon, 17 Aug 2009 07:04:59 -0700

Hi,

On Mon, Aug 17, 2009 at 04:44:01PM +0400, Ivan Gromov wrote:
> Dear all,
> I had a problem with my resource res1, but I can't work out what
> caused it.
> The output of crm_mon is:
>  
> Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): standby
>  
>  
> resource_res1      (ocf::heartbeat:res1):     Started node2 FAILED
> RESOURCE2   (ocf::heartbeat:Resource):  Started node2
>  
> Failed actions:
> resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
>  
> The definition of res1 is:
>  
> <primitive id="resource_res1" class="ocf" type="res1" 
> provider="heartbeat">
> <operations>
> <op id="34" name="monitor" interval="30s" timeout="90s" start_delay="0s" 
> on_fail="restart"/>
> <op id="35" name="start" timeout="30s"/>
> <op id="36" name="stop" timeout="30s"/>
> </operations>
> <instance_attributes id="resource_res1_instance_attrs">
> <attributes>
> <nvpair name="target_role" id="resource_res1_target_role" 
> value="started"/>
> </attributes>
> </instance_attributes>
> <meta_attributes id="resource_res1_meta">
> <attributes>
> <nvpair name="resource_stickiness" id="resource_res1_Rs" value="150"/>
> <nvpair name="resource_failure_stickiness" id="resource_res1_FRs" 
> value="-100"/>
> </attributes>
> </meta_attributes>
> </primitive>
>  
> Is it a mistake in the monitor method of res1?


Yes, the monitor operation timed out: it did not complete within the
90s timeout configured for it.

> I wanted to reproduce this situation, so I added sleep (100) to the
> monitor method.
> I received this from crm_mon:
>  
> Node: node2(237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): OFFLINE
>  
> RESOURCE2   (ocf::heartbeat:Resource):  Started node2
>  
> Failed actions:
> resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out
>  
> monitor_0 and monitor_30000 are not the same monitor method, right? What 
> is monitor_30000?

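monitor_30000 is the recurring monitor operation (30000 is the 30s
interval expressed in ms). monitor_0 is what is also called a resource
probe, i.e. a one-off monitor run when the cluster wants to establish
the resource status initially. Both invoke the same monitor action of
your resource agent.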
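
To make the naming concrete, here is a minimal sketch of how the names
shown by crm_mon break down into resource id, operation name and
interval in ms. The helpers interval_to_ms, op_key and is_probe are
hypothetical names used for illustration only, not part of any cluster
tool, and the interval parser is an assumption that handles only the
"s" and "ms" suffixes (a bare number is treated as seconds).

```python
def interval_to_ms(interval: str) -> int:
    """Convert an interval such as "30s" or "500ms" to milliseconds.

    Illustration only: handles just the "ms" and "s" suffixes and
    treats a bare number as seconds.
    """
    text = interval.strip().lower()
    if text.endswith("ms"):
        return int(text[:-2])
    if text.endswith("s"):
        return int(text[:-1]) * 1000
    return int(text) * 1000


def op_key(resource_id: str, op_name: str, interval: str = "0") -> str:
    """Build a name of the form <resource>_<operation>_<interval in ms>."""
    return f"{resource_id}_{op_name}_{interval_to_ms(interval)}"


def is_probe(key: str) -> bool:
    """A monitor with interval 0 is the one-off probe."""
    return key.endswith("_monitor_0")


recurring = op_key("resource_res1", "monitor", "30s")
probe = op_key("resource_res1", "monitor")
print(recurring, is_probe(recurring))
print(probe, is_probe(probe))
```

With the op definition quoted above (interval="30s"), the first name is
the recurring monitor and the second is the probe.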

> What should I do to find where my mistake is?

Take a look at your resource agent :)
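
One way to do that is to run the agent's monitor action by hand and see
how long it takes. The following is only a rough sketch: time_action is
a hypothetical helper, and it simply runs whatever command you give it
with a time limit. It does not reproduce how the cluster itself calls
the agent.

```python
import subprocess
import time


def time_action(cmd, timeout_s, env=None):
    """Run cmd and return (exit_code, elapsed_seconds).

    exit_code is None if the command was still running after
    timeout_s seconds and had to be killed.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, timeout=timeout_s, env=env)
        code = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start
```

For example, assuming the agent is installed as
/usr/lib/ocf/resource.d/heartbeat/res1 (an assumed path, adjust to your
installation), time_action([agent_path, "monitor"], timeout_s=90) with
OCF_ROOT set in env would show whether monitor finishes within the 90s
from your op definition, and with what exit code.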

Thanks,

Dejan

> --
>  
> Best regards,
> Ivan Gromov.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems