Re: [Linux-HA] timed out of monitor

Dejan Muhamedagic Mon, 17 Aug 2009 14:38:02 -0700

Hi,

On Mon, Aug 17, 2009 at 07:26:07PM +0400, Ivan Gromov wrote:
> Dear Dejan,
> Thanks a lot.
> Could you explain what this means: >> resource_res1 (ocf::heartbeat:res1): 
> Started node2 FAILED?  Does it mean that the resource failed while it was 
> starting?


The cluster tried to start it, or intended to start it, on this
node and failed.

> > > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> > It's a repeating monitor operation (30000 is 30s represented in
> > ms). monitor_0 is what is also called a resource probe, i.e. used
> > when the cluster wants to establish the resource status initially.
> Does heartbeat use the monitor method for monitor_0 and monitor_30000? 

Yes.
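
To illustrate, here is a rough sketch (mine, not code from any real
agent) of how an agent could route the action name it receives to a
single monitor routine, so the probe and the recurring monitor share
the same check. The names dispatch, monitor and is_running are made up
for the example; the numeric values are the standard OCF exit codes.

```python
# Standard OCF exit codes used by this sketch.
OCF_SUCCESS = 0            # resource is running
OCF_ERR_UNIMPLEMENTED = 3  # action not supported
OCF_NOT_RUNNING = 7        # resource is cleanly stopped


def monitor(is_running):
    """One status check, shared by the probe and the recurring monitor."""
    return OCF_SUCCESS if is_running() else OCF_NOT_RUNNING


def dispatch(action, is_running):
    """Map the action name handed to the agent onto a handler.

    Only "monitor" is implemented here; a real agent also needs
    start, stop and so on.
    """
    if action == "monitor":
        return monitor(is_running)
    return OCF_ERR_UNIMPLEMENTED


# Hypothetical status checks standing in for the real resource test.
print(dispatch("monitor", lambda: True))   # resource running
print(dispatch("monitor", lambda: False))  # resource stopped
```

Whichever of the two calls it, the routine has to return within the
timeout configured for the operation.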

> How does heartbeat use it?

I don't understand the question. The '_n' suffix means that the
operation is repeated at an interval of n milliseconds.
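
If it helps, the naming can be taken apart like this. It is only a
sketch: parse_op_key is a name I made up, and it assumes the key
always looks like <resource id>_<operation>_<interval in ms>, as in
the two keys from your crm_mon output.

```python
import re


def parse_op_key(key):
    """Split '<resource>_<operation>_<interval_ms>' into its parts."""
    m = re.fullmatch(r"(?P<rsc>.+)_(?P<op>[a-z]+)_(?P<ms>\d+)", key)
    if m is None:
        raise ValueError(f"not an operation key: {key!r}")
    interval_ms = int(m.group("ms"))
    return {
        "resource": m.group("rsc"),
        "operation": m.group("op"),
        "interval_ms": interval_ms,
        # An interval of 0 on a monitor marks the one-off probe.
        "is_probe": m.group("op") == "monitor" and interval_ms == 0,
    }


print(parse_op_key("resource_res1_monitor_30000"))
print(parse_op_key("resource_res1_monitor_0"))
```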

> What do the parameters call=35 and rc=-2 mean?

You don't have to worry about the call id; that's internal. The rc
is the exit code, which in this case means timeout. Normally, an
explanation is printed along with the exit code (here, "Timed Out").
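
For reading such lines, something along these lines may do. Treat it
as a sketch: parse_failed_action is a made-up helper, and the pattern
is only based on the "Failed actions" lines quoted in this thread, not
on any documented output format.

```python
import re

# Pattern modelled on the failed-action lines quoted in this thread.
FAILED_ACTION = re.compile(
    r"(?P<key>\S+) \(node=(?P<node>[^,]+), "
    r"call=(?P<call>-?\d+), rc=(?P<rc>-?\d+)\): (?P<text>.+)"
)


def parse_failed_action(line):
    """Return the fields of one 'Failed actions' line as a dict."""
    m = FAILED_ACTION.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return {
        "key": m.group("key"),         # operation key, e.g. ..._monitor_30000
        "node": m.group("node"),       # node the operation ran on
        "call": int(m.group("call")),  # internal call id
        "rc": int(m.group("rc")),      # exit code
        "text": m.group("text"),       # explanation of the exit code
    }


line = "resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out"
print(parse_failed_action(line))
```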

Thanks,

Dejan


> Best wishes,
> Ivan
> 
> * Dejan Muhamedagic <[email protected]> [Mon, 17 Aug 2009 16:05:00 +0200]:
> > Hi,
> >
> > On Mon, Aug 17, 2009 at 04:44:01PM +0400, Ivan Gromov wrote:
> > > Dear all,
> > > I had a problem with my resource res1, but I can't work out what
> > > caused it.
> > > The information obtained from crm_mon is:
> > >  
> > > Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): standby
> > >  
> > >  
> > > resource_res1      (ocf::heartbeat:res1):     Started node2 FAILED
> > > RESOURCE2   (ocf::heartbeat:Resource):  Started node2
> > >  
> > > Failed actions:
> > > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> > >  
> > > The definition of res1 is:
> > >  
> > > <primitive id="resource_res1" class="ocf" type="res1"
> > > provider="heartbeat">
> > > <operations>
> > > <op id="34" name="monitor" interval="30s" timeout="90s"
> > > start_delay="0s" on_fail="restart"/>
> > > <op id="35" name="start" timeout="30s"/>
> > > <op id="36" name="stop" timeout="30s"/>
> > > </operations>
> > > <instance_attributes id="resource_res1_instance_attrs">
> > > <attributes>
> > > <nvpair name="target_role" id="resource_res1_target_role"
> > > value="started"/>
> > > </attributes>
> > > </instance_attributes>
> > > <meta_attributes id="resource_res1_meta">
> > > <attributes>
> > > <nvpair name="resource_stickiness" id="resource_res1_Rs"
> > > value="150"/>
> > > <nvpair name="resource_failure_stickiness" id="resource_res1_FRs"
> > > value="-100"/>
> > > </attributes>
> > > </meta_attributes>
> > > </primitive>
> > >  
> > > Is it a mistake in the monitor method of res1?
> >
> > Yes, the monitor operation timed out.
> >
> > > I wanted to reproduce this situation, so I added sleep(100) to the
> > > monitor method.
> > > This is what I got from crm_mon:
> > >  
> > > Node: node2(237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): OFFLINE
> > >  
> > > RESOURCE2   (ocf::heartbeat:Resource):  Started node2
> > >  
> > > Failed actions:
> > > resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out
> > >  
> > > monitor_0 and monitor_30000 are not the same monitor method, right?
> > > What is monitor_30000?
> >
> > It's a repeating monitor operation (30000 is 30s represented in
> > ms). monitor_0 is what is also called a resource probe, i.e. used
> > when the cluster wants to establish the resource status initially.
> >
> > > What should I do to find where my mistake is?
> >
> > Take a look at your resource agent :)
> >
> > Thanks,
> >
> > Dejan
> >
> > > --
> > >  
> > > Best regards,
> > > Ivan Gromov.
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems