Re: [Linux-HA] timed out of monitor

Ivan Gromov Mon, 17 Aug 2009 08:26:29 -0700

Dear Dejan,
Thanks a lot.
Could you explain what this means: >> resource_res1 (ocf::heartbeat:res1): 
Started node2 FAILED? Does it mean that the resource failed while it was 
starting?


> > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> It's a repeating monitor operation (30000 is 30s represented in
> ms). monitor_0 is what is also called a resource probe, i.e. used
> when the cluster wants to establish the resource status initially.
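
As a rough illustration of the naming described above, here is a minimal Python sketch that splits an operation key such as resource_res1_monitor_30000 into its parts. The layout <resource>_<operation>_<interval in ms> is inferred from the explanation in this thread; the names OperationKey and parse_operation_key are hypothetical and not part of heartbeat.

```python
import re
from typing import NamedTuple


class OperationKey(NamedTuple):
    resource: str
    operation: str
    interval_ms: int

    @property
    def is_probe(self) -> bool:
        # Per the explanation above, a monitor with interval 0 is the probe.
        return self.operation == "monitor" and self.interval_ms == 0


def parse_operation_key(key: str) -> OperationKey:
    """Split '<resource>_<operation>_<interval_ms>'.

    The resource id may itself contain underscores, so the operation name
    and the interval are matched at the end of the string.
    """
    match = re.fullmatch(r"(.+)_([A-Za-z]+)_(\d+)", key)
    if match is None:
        raise ValueError(f"unrecognised operation key: {key!r}")
    resource, operation, interval = match.groups()
    return OperationKey(resource, operation, int(interval))
```

With this sketch, parse_operation_key("resource_res1_monitor_30000") yields an interval of 30000 ms (30 s), while the key resource_res1_monitor_0 is flagged as a probe.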
Does heartbeat use the monitor method for both monitor_0 and monitor_30000? 
How does heartbeat use it?
What do the parameters call=35 and rc=-2 mean?

Best wishes,
Ivan

* Dejan Muhamedagic <[email protected]> [Mon, 17 Aug 2009 16:05:00 +0200]:
> Hi,
>
> On Mon, Aug 17, 2009 at 04:44:01PM +0400, Ivan Gromov wrote:
> > Dear all,
> > I had a problem with my resource res1, but I can't understand
> > where the problem was.
> > The information obtained from crm_mon is:
> >  
> > Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): standby
> >  
> >  
> > resource_res1      (ocf::heartbeat:res1):     Started node2 FAILED
> > RESOURCE2   (ocf::heartbeat:Resource):  Started node2
> >  
> > Failed actions:
> > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> >  
> > The definition of res1 is:
> >  
> > <primitive id="resource_res1" class="ocf" type="res1"
> > provider="heartbeat">
> > <operations>
> > <op id="34" name="monitor" interval="30s" timeout="90s"
> > start_delay="0s" on_fail="restart"/>
> > <op id="35" name="start" timeout="30s"/>
> > <op id="36" name="stop" timeout="30s"/>
> > </operations>
> > <instance_attributes id="resource_res1_instance_attrs">
> > <attributes>
> > <nvpair name="target_role" id="resource_res1_target_role"
> > value="started"/>
> > </attributes>
> > </instance_attributes>
> > <meta_attributes id="resource_res1_meta">
> > <attributes>
> > <nvpair name="resource_stickiness" id="resource_res1_Rs"
> > value="150"/>
> > <nvpair name="resource_failure_stickiness" id="resource_res1_FRs"
> > value="-100"/>
> > </attributes>
> > </meta_attributes>
> > </primitive>
> >  
> > Is it a mistake in the monitor method of res1?
>
> Yes, the monitor operation timed out.
>
> > I wanted to reproduce this situation, so I added sleep (100) to the
> > monitor method.
> > I received this from crm_mon:
> >  
> > Node: node2(237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): OFFLINE
> >  
> > RESOURCE2   (ocf::heartbeat:Resource):  Started node2
> >  
> > Failed actions:
> > resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out
> >  
> > monitor_0 and monitor_30000 are not the same monitor method, right?
> > What is monitor_30000?
>
> It's a repeating monitor operation (30000 is 30s represented in
> ms). monitor_0 is what is also called a resource probe, i.e. used
> when the cluster wants to establish the resource status initially.
>
> > What should I do to find where my mistake is?
>
> Take a look at your resource agent :)
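
One way to start looking is to time the agent's monitor action against the configured timeout. The following is a minimal Python sketch under stated assumptions: run_with_timeout is a hypothetical helper, the slow command is a stand-in for the real agent (its path and environment are not given in this thread), and the durations are scaled down from sleep(100) against timeout="90s". It only illustrates a supervisor enforcing a deadline on a child process, not how the cluster does it internally.

```python
import subprocess
import sys
import time


def run_with_timeout(command, timeout_s):
    """Run command and return (status, elapsed_seconds).

    status is the exit code, or the string "timed out" if the
    deadline passed before the command finished.
    """
    started = time.monotonic()
    try:
        completed = subprocess.run(command, timeout=timeout_s)
        status = completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        status = "timed out"
    return status, time.monotonic() - started


if __name__ == "__main__":
    # Stand-in for a slow monitor: sleeps 1.0 s against a 0.2 s deadline.
    slow_monitor = [sys.executable, "-c", "import time; time.sleep(1.0)"]
    print(run_with_timeout(slow_monitor, timeout_s=0.2))
```

Replacing slow_monitor with the actual monitor invocation of the resource agent, and 0.2 with the configured timeout in seconds, would show whether the monitor finishes in time when run by hand.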
>
> Thanks,
>
> Dejan
>
> > --
> >  
> > Best regards,
> > Ivan Gromov.
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems

