Dear Dejan,

Thanks a lot. Could you explain what this means:

>> resource_res1 (ocf::heartbeat:res1): Started node2 FAILED

Does it mean that the resource failed while it was starting?

> > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out

> It's a repeating monitor operation (30000 is 30s represented in
> ms). monitor_0 is what is also called a resource probe, i.e. used
> when the cluster wants to establish the resource status initially.

Does heartbeat use the same monitor method for both monitor_0 and
monitor_30000? How does heartbeat use it?
What do the parameters call=35 and rc=-2 mean?

Best wishes,
Ivan

* Dejan Muhamedagic <[email protected]> [Mon, 17 Aug 2009 16:05:00 +0200]:
> Hi,
>
> On Mon, Aug 17, 2009 at 04:44:01PM +0400, Ivan Gromov wrote:
> > Dear all,
> > I had some problem with my resource res1. But I can't understand where
> > was my problem.
> > The information obtained from the crm_mon is
> >
> > Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): standby
> >
> >
> > resource_res1 (ocf::heartbeat:res1): Started node2 FAILED
> > RESOURCE2 (ocf::heartbeat:Resource): Started node2
> >
> > Failed actions:
> > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> >
> > The definition of res1 is:
> >
> > <primitive id="resource_res1" class="ocf" type="res1" provider="heartbeat">
> >   <operations>
> >     <op id="34" name="monitor" interval="30s" timeout="90s" start_delay="0s" on_fail="restart"/>
> >     <op id="35" name="start" timeout="30s"/>
> >     <op id="36" name="stop" timeout="30s"/>
> >   </operations>
> >   <instance_attributes id="resource_res1_instance_attrs">
> >     <attributes>
> >       <nvpair name="target_role" id="resource_res1_target_role" value="started"/>
> >     </attributes>
> >   </instance_attributes>
> >   <meta_attributes id="resource_res1_meta">
> >     <attributes>
> >       <nvpair name="resource_stickiness" id="resource_res1_Rs" value="150"/>
> >       <nvpair name="resource_failure_stickiness" id="resource_res1_FRs" value="-100"/>
> >     </attributes>
> >   </meta_attributes>
> > </primitive>
> >
> > Is it a mistake in monitor method of res1?
> >
> Yes, the monitor operation timed out.
>
> > I wanted to repeat this situation and I included sleep (100) in monitor
> > method.
> > I received this from crm_mon:
> >
> > Node: node2(237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): OFFLINE
> >
> > RESOURCE2 (ocf::heartbeat:Resource): Started node2
> >
> > Failed actions:
> > resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out
> >
> > monitor_0 and monitor_30000 are not the same monitor method, right? What
> > monitor_30000 is?
>
> It's a repeating monitor operation (30000 is 30s represented in
> ms). monitor_0 is what is also called a resource probe, i.e. used
> when the cluster wants to establish the resource status initially.
>
> > What should I do to find where my mistake is?
>
> Take a look at your resource agent :)
>
> Thanks,
>
> Dejan
>
> > --
> >
> > Best regards,
> > Ivan Gromov.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
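
As a small illustration of the naming explained in the thread, the sketch below splits a "Failed actions" entry such as resource_res1_monitor_30000 into resource name, operation and interval, and labels an interval of 0 as a probe and anything else as a recurring operation. The function name parse_failed_action and the regular expression are assumptions made for this example, not part of heartbeat or crm_mon, and the line format is inferred only from the two entries quoted above. The thread does not explain what call and rc mean, so they are kept as plain integers without interpretation.

```python
import re

# Hypothetical pattern, inferred only from the two crm_mon lines in this thread:
#   <resource>_<operation>_<interval in ms> (node=..., call=..., rc=...): <status>
FAILED_ACTION = re.compile(
    r"^(?P<resource>.+)_(?P<operation>[a-z]+)_(?P<interval_ms>\d+)\s+"
    r"\(node=(?P<node>[^,]+), call=(?P<call>-?\d+), rc=(?P<rc>-?\d+)\):\s*"
    r"(?P<status>.+)$"
)


def parse_failed_action(line):
    """Split one 'Failed actions' entry into its fields (illustrative helper)."""
    match = FAILED_ACTION.match(line.strip())
    if match is None:
        raise ValueError("unrecognised failed-action line: %r" % line)
    interval_ms = int(match.group("interval_ms"))
    return {
        "resource": match.group("resource"),
        "operation": match.group("operation"),
        "interval_ms": interval_ms,
        # Per the thread: interval 0 is the initial probe, 30000 is a 30s repeat.
        "kind": "probe" if interval_ms == 0 else "recurring",
        "node": match.group("node"),
        # call and rc are stored uninterpreted; their meaning is not given here.
        "call": int(match.group("call")),
        "rc": int(match.group("rc")),
        "status": match.group("status").strip(),
    }


if __name__ == "__main__":
    for entry in (
        "resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out",
        "resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out",
    ):
        print(parse_failed_action(entry))
```

Both entries name the same monitor operation of resource_res1; only the interval differs, which is what distinguishes the probe from the 30s repeating check.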
