Hello Dejan,

Thanks for your last letter. Much appreciated!

> The cluster tried to start it or intended to start it at this
> node and failed.

Do you mean that the start method failed?

> > How does heartbeat use it?
> Don't understand the question. The '_n' suffix means that the
> certain operation will be repeated at the 'n' interval.

Yes, my question was not put correctly. I'll try to rephrase it.
I have timeout=90 and interval=30 for the res1 monitor. Suppose
heartbeat has started the monitor. Will heartbeat start the monitor
method for res1 again if the previous monitor run has not completed
yet (timeout > interval)?

Best wishes,
Ivan

* Dejan Muhamedagic <[email protected]> [Mon, 17 Aug 2009 23:37:56 +0200]:
> Hi,
>
> On Mon, Aug 17, 2009 at 07:26:07PM +0400, Ivan Gromov wrote:
> > Dear Dejan,
> > Thanks a lot.
> > Could you explain what this means:
> > resource_res1 (ocf::heartbeat:res1): Started node2 FAILED
> > Does it mean that the resource failed while it was starting?
>
> The cluster tried to start it or intended to start it at this
> node and failed.
>
> > > > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> > > It's a repeating monitor operation (30000 is 30s represented in
> > > ms). monitor_0 is what is also called a resource probe, i.e. used
> > > when the cluster wants to establish the resource status initially.
> > Does heartbeat use the monitor method for monitor_0 and monitor_30000?
>
> Yes.
>
> > How does heartbeat use it?
>
> Don't understand the question. The '_n' suffix means that the
> certain operation will be repeated at the 'n' interval.
>
> > What do the parameters call=35 and rc=-2 mean?
>
> Don't have to worry about the call id, that's internal. The rc is
> the exit code, in this case it means timeout. Normally, there is
> an explanation for the exit code.
>
> Thanks,
>
> Dejan
>
> > Best wishes,
> > Ivan
> >
> > * Dejan Muhamedagic <[email protected]> [Mon, 17 Aug 2009 16:05:00 +0200]:
> > > Hi,
> > >
> > > On Mon, Aug 17, 2009 at 04:44:01PM +0400, Ivan Gromov wrote:
> > > > Dear all,
> > > > I had a problem with my resource res1, but I can't understand
> > > > where my problem was.
> > > > The information obtained from the crm_mon is
> > > >
> > > > Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > > > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): standby
> > > >
> > > > resource_res1 (ocf::heartbeat:res1): Started node2 FAILED
> > > > RESOURCE2 (ocf::heartbeat:Resource): Started node2
> > > >
> > > > Failed actions:
> > > > resource_res1_monitor_30000 (node=node2, call=35, rc=-2): Timed Out
> > > >
> > > > The definition of res1 is:
> > > >
> > > > <primitive id="resource_res1" class="ocf" type="res1" provider="heartbeat">
> > > >   <operations>
> > > >     <op id="34" name="monitor" interval="30s" timeout="90s" start_delay="0s" on_fail="restart"/>
> > > >     <op id="35" name="start" timeout="30s"/>
> > > >     <op id="36" name="stop" timeout="30s"/>
> > > >   </operations>
> > > >   <instance_attributes id="resource_res1_instance_attrs">
> > > >     <attributes>
> > > >       <nvpair name="target_role" id="resource_res1_target_role" value="started"/>
> > > >     </attributes>
> > > >   </instance_attributes>
> > > >   <meta_attributes id="resource_res1_meta">
> > > >     <attributes>
> > > >       <nvpair name="resource_stickiness" id="resource_res1_Rs" value="150"/>
> > > >       <nvpair name="resource_failure_stickiness" id="resource_res1_FRs" value="-100"/>
> > > >     </attributes>
> > > >   </meta_attributes>
> > > > </primitive>
> > > >
> > > > Is it a mistake in monitor method of res1?
> > >
> > > Yes, the monitor operation timed out.
> > >
> > > > I wanted to repeat this situation and I included sleep (100) in
> > > > monitor method.
> > > > I received this from crm_mon:
> > > >
> > > > Node: node2 (237ceb38-a061-d99d-f4bf-944dd057ab5d): online
> > > > Node: node1 (965e45c6-19c4-241e-ff9d-4904882ef868): OFFLINE
> > > >
> > > > RESOURCE2 (ocf::heartbeat:Resource): Started node2
> > > >
> > > > Failed actions:
> > > > resource_res1_monitor_0 (node=node2, call=24, rc=-2): Timed Out
> > > >
> > > > monitor_0 and monitor_30000 are not the same monitor method, right?
> > > > What is monitor_30000?
> > >
> > > It's a repeating monitor operation (30000 is 30s represented in
> > > ms). monitor_0 is what is also called a resource probe, i.e. used
> > > when the cluster wants to establish the resource status initially.
> > >
> > > > What should I do to find where my mistake is?
> > >
> > > Take a look at your resource agent :)
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > > --
> > > >
> > > > Best regards,
> > > > Ivan Gromov.
> > > > _______________________________________________
> > > > Linux-HA mailing list
> > > > [email protected]
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems
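
A note on the operation names that appear under "Failed actions" in this thread: as explained above, the trailing number is the repeat interval in milliseconds, and an interval of 0 on a monitor marks a probe. The Python sketch below only illustrates that naming as described here; `parse_op_key` and `describe` are hypothetical helpers made up for this note, not part of heartbeat or crm_mon, and the sketch assumes the operation name itself contains no underscore.

```python
from typing import NamedTuple


class OpKey(NamedTuple):
    resource: str
    operation: str
    interval_ms: int


def parse_op_key(key: str) -> OpKey:
    """Split '<resource>_<operation>_<interval in ms>'.

    The split is done from the right because the resource id itself
    may contain underscores (e.g. 'resource_res1').
    """
    resource, operation, interval = key.rsplit("_", 2)
    return OpKey(resource, operation, int(interval))


def describe(key: str) -> str:
    """Give a one-line, human-readable reading of an operation key."""
    op = parse_op_key(key)
    if op.operation == "monitor" and op.interval_ms == 0:
        # Interval 0 on a monitor: the initial status check (probe).
        return f"{op.resource}: probe (one-off initial status check)"
    # Non-zero interval: the operation is repeated at that interval.
    return f"{op.resource}: {op.operation} repeated every {op.interval_ms / 1000:g}s"


for key in ("resource_res1_monitor_0", "resource_res1_monitor_30000"):
    print(describe(key))
```

Both keys resolve to the same monitor method of the resource agent; only the interval part differs.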
