Hi,

On Mon, Aug 17, 2009 at 04:44:01PM +0400, Ivan Gromov wrote:
> Dear all,
> I had some problem with my resource res1. But I can't understand where
> was my problem.
> The information obtained from the crm_mon is
>
> Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): standby
>
>
> resource_res1 (ocf::heartbeat:res1): Started node2 FAILED
> RESOURCE2 (ocf::heartbeat:Resource): Started node2
>
> Failed actions:
> resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
>
> The definition of res1 is:
>
> <primitive id="resource_res1" class="ocf" type="res1" provider="heartbeat">
>   <operations>
>     <op id="34" name="monitor" interval="30s" timeout="90s" start_delay="0s" on_fail="restart"/>
>     <op id="35" name="start" timeout="30s"/>
>     <op id="36" name="stop" timeout="30s"/>
>   </operations>
>   <instance_attributes id="resource_res1_instance_attrs">
>     <attributes>
>       <nvpair name="target_role" id="resource_res1_target_role" value="started"/>
>     </attributes>
>   </instance_attributes>
>   <meta_attributes id="resource_res1_meta">
>     <attributes>
>       <nvpair name="resource_stickiness" id="resource_res1_Rs" value="150"/>
>       <nvpair name="resource_failure_stickiness" id="resource_res1_FRs" value="-100"/>
>     </attributes>
>   </meta_attributes>
> </primitive>
>
> Is it a mistake in monitor method of res1?
Yes, the monitor operation timed out.

> I wanted to repeat this situation and I included sleep (100) in monitor
> method.
> I received this from crm_mon:
>
> Node: node2(237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): OFFLINE
>
> RESOURCE2 (ocf::heartbeat:Resource): Started node2
>
> Failed actions:
> resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out
>
> monitor_0 and monitor_30000 are not the same monitor method, right? What
> monitor_30000 is?

It's a recurring monitor operation (30000 is the 30s interval expressed
in ms). monitor_0 is what is also called a resource probe: a one-off
monitor the cluster runs when it wants to establish the resource status
initially.

> What should I do to find where my mistake is?

Take a look at the monitor action of your resource agent :)

Thanks,

Dejan

> --
>
> Best regards,
> Ivan Gromov.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
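Both answers above can be checked with a small Python sketch. It assumes the `<resource>_<action>_<interval in ms>` naming seen in the crm_mon failed-actions lines, and that the action name contains no underscore (true for monitor, start and stop). `decode_op_key` splits such a key into its parts, and `time_monitor` runs a command and reports whether it finished within a given timeout, which is one way to exercise a resource agent's monitor action by hand. Both function names are hypothetical helpers, not part of any cluster tool.

```python
import subprocess
import time


def decode_op_key(key: str) -> dict:
    """Split an op key like "resource_res1_monitor_30000" into its parts.

    Assumption: the action name has no underscore, so the last two "_"
    separate resource id, action and interval (in milliseconds).
    """
    rsc, action, interval_ms = key.rsplit("_", 2)
    ms = int(interval_ms)
    if action == "monitor" and ms == 0:
        kind = "probe"       # one-off monitor used to find the initial status
    elif ms > 0:
        kind = "recurring"   # e.g. the 30s monitor -> 30000
    else:
        kind = "one-off"     # start, stop, ... with no interval
    return {"resource": rsc, "action": action,
            "interval_s": ms / 1000, "kind": kind}


def time_monitor(cmd, timeout_s, env=None):
    """Run cmd and return (rc, elapsed_s, timed_out).

    rc is None when the command exceeded timeout_s; subprocess.run()
    kills the child in that case before raising TimeoutExpired.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, env=env, timeout=timeout_s,
                              capture_output=True)
        return proc.returncode, time.monotonic() - start, False
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start, True
```

As a usage example, `time_monitor(["/usr/lib/ocf/resource.d/heartbeat/res1", "monitor"], 90, env=dict(os.environ, OCF_ROOT="/usr/lib/ocf"))` would run the agent roughly the way the cluster does; the path and environment are assumptions based on the usual OCF layout, and a real agent may also need its `OCF_RESKEY_*` parameters. Under the OCF conventions an rc of 0 means running and 7 means not running; with the sleep(100) test agent the call should come back with `timed_out=True`, since 100s exceeds the 90s monitor timeout.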
