> Assaf N wrote:
> > Hello,
> >
> > I started a small test cluster using heartbeat 2.1.1. The cluster contains
> one simple master/slave resource.
> >
> > While playing around with this cluster, I've noticed that whenever the
> resource is promoted to be the master on a machine, Heartbeat stops calling
> its monitor operation on this node. A quick look on the ha-debug log reveals
> that the monitor op is stopped intentionally, because of the resource
> promotion. However, there is no restarting of this op once the node becomes
> the master. When a second node starts and its resource takes the master
> role, our demoted resource starts to be monitored again.
> >
> > I'm attaching my cib.xml, ha-debug and the resource agent script. Do I
> have a configuration error, or have I encountered a bug?
>
> please refer to the following conversation and tell us whether this
> resolves your issue:
>
> http://www.gossamer-threads.com/lists/linuxha/users/42529
>
Thanks, it does resolve my issue. How embarrassing to discover it was answered
a few days ago... I searched the list a few days before it was posted, and
neglected to search again before sending my question... :-)
Now I've encountered a new issue. The 'success' return code from the monitor
function is supposed to be 0 while the resource is a slave and 8 while it's a
master, right? This holds when the resource is first started, but after the
resource has been promoted and then demoted, Heartbeat still treats 8 as the
success return value, even though the resource is no longer a master. If I
return 0 instead, the resource is stopped and restarted, and the expected
success value becomes 0 again. Is this intentional?
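For reference, the return codes in question come from the OCF resource agent
conventions: 0 (OCF_SUCCESS) means the resource is running as a slave, 7
(OCF_NOT_RUNNING) means it is cleanly stopped, and 8 (OCF_RUNNING_MASTER) means
it is running as master. A minimal sketch of a monitor function reporting these
codes (the state file and function name here are hypothetical stand-ins; a real
agent would probe the actual service):

```shell
#!/bin/sh
# OCF return codes relevant to a master/slave monitor action
OCF_SUCCESS=0          # resource is running (as slave)
OCF_NOT_RUNNING=7      # resource is cleanly stopped
OCF_RUNNING_MASTER=8   # resource is running in the Master role

# Hypothetical state file standing in for a real service probe
STATE_FILE="${STATE_FILE:-/tmp/rsc_smith.state}"

smith_monitor() {
    # Not running at all: report OCF_NOT_RUNNING
    [ -f "$STATE_FILE" ] || return $OCF_NOT_RUNNING
    # Running as master: report OCF_RUNNING_MASTER
    if grep -q '^master$' "$STATE_FILE"; then
        return $OCF_RUNNING_MASTER
    fi
    # Otherwise running as slave: report OCF_SUCCESS
    return $OCF_SUCCESS
}

# Example runs:
echo slave > "$STATE_FILE"
smith_monitor; echo "slave monitor rc=$?"    # prints: slave monitor rc=0
echo master > "$STATE_FILE"
smith_monitor; echo "master monitor rc=$?"   # prints: master monitor rc=8
```

The point of the question above is that, after a demote, one would expect the
agent to go back to returning 0 from monitor and have that accepted as success.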
I'm also experiencing strange behavior in the following scenario: one node
(rh4vm2, the DC) is running the master instance of a resource, and the second
node (rh4vm1) is running the slave instance. When I stop the Heartbeat service
on the DC, it takes about a hundred seconds to shut down, and it complains
about the monitor action running on rh4vm1:
crmd[11630]: 2007/10/02_12:46:58 info: stop_subsystem: Sent -TERM to tengine: [11680]
crmd[11630]: 2007/10/02_12:46:58 info: do_shutdown: Waiting for subsystems to exit
tengine[11680]: 2007/10/02_12:47:06 WARN: action_timer_callback: Timer popped (abort_level=1000000, complete=false)
tengine[11680]: 2007/10/02_12:47:06 WARN: print_elem: Action missed its timeout [Action 2]: In-flight (id: rsc_smith:0_monitor_3000, loc: rh4vm1, priority: 20)
tengine[11680]: 2007/10/02_12:48:37 WARN: global_timer_callback: Timer popped (abort_level=1000000, complete=false)
tengine[11680]: 2007/10/02_12:48:37 info: unconfirmed_actions: Action rsc_smith:0_monitor_3000 2 unconfirmed from peer
tengine[11680]: 2007/10/02_12:48:37 ERROR: unconfirmed_actions: Waiting on 1 unconfirmed actions
tengine[11680]: 2007/10/02_12:48:37 WARN: global_timer_callback: Transition abort timeout reached... marking transition complete.
tengine[11680]: 2007/10/02_12:48:37 info: notify_crmd: Exiting after transition
tengine[11680]: 2007/10/02_12:48:37 WARN: global_timer_callback: Writing 1 unconfirmed actions to the CIB
tengine[11680]: 2007/10/02_12:48:37 info: unconfirmed_actions: Action rsc_smith:0_monitor_3000 2 unconfirmed from peer
tengine[11680]: 2007/10/02_12:48:37 ERROR: unconfirmed_actions: Waiting on 1 unconfirmed actions
Any idea why this happens?
Thanks for your help,
Assaf
My cib:
<cib admin_epoch="0" have_quorum="false" ignore_dtd="false" num_peers="0" cib_feature_revision="1.3" generated="false" epoch="1385" num_updates="1" cib-last-written="Tue Oct 2 12:50:31 2007">
  <configuration>
    <crm_config>
      <cluster_property_set id="cluster_properties">
        <attributes>
          <nvpair id="default-resource-stickiness" name="default-resource-stickiness" value="70"/>
          <nvpair id="default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-100"/>
        </attributes>
      </cluster_property_set>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair name="last-lrm-refresh" id="cib-bootstrap-options-last-lrm-refresh" value="1191307342"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="0441b161-2421-4218-8b03-0c044937e197" uname="rh4vm1" type="normal">
        <instance_attributes id="master-0441b161-2421-4218-8b03-0c044937e197">
          <attributes>
            <nvpair id="nodes-master-rsc_smith:1-0441b161-2421-4218-8b03-0c044937e197" name="master-rsc_smith:1" value="20"/>
            <nvpair id="nodes-master-rsc_smith:0-0441b161-2421-4218-8b03-0c044937e197" name="master-rsc_smith:0" value="20"/>
          </attributes>
        </instance_attributes>
      </node>
      <node uname="rh4vm2" type="normal" id="f55d8a1b-6931-4a84-989c-7f241ce2897e">
        <instance_attributes id="master-f55d8a1b-6931-4a84-989c-7f241ce2897e">
          <attributes>
            <nvpair name="master-rsc_smith:0" id="nodes-master-rsc_smith:0-f55d8a1b-6931-4a84-989c-7f241ce2897e" value="20"/>
            <nvpair name="master-rsc_smith:1" id="nodes-master-rsc_smith:1-f55d8a1b-6931-4a84-989c-7f241ce2897e" value="30"/>
          </attributes>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <master_slave id="master_slave_mvap" ordered="false" interleave="false" notify="false">
        <instance_attributes id="ia_clone_ip">
          <attributes>
            <nvpair id="nvpair_ms_grp_mvap_clone_max" name="clone_max" value="2"/>
            <nvpair id="nvpair_ms_grp_mvap_clone_node_max" name="clone_node_max" value="1"/>
            <nvpair id="nvpair_ms_grp_mvap_master_max" name="master_max" value="1"/>
            <nvpair id="nvpair_ms_grp_mvap_master_node_max" name="master_node_max" value="1"/>
          </attributes>
        </instance_attributes>
        <primitive id="rsc_smith" class="ocf" type="smith2_agent" provider="ML">
          <operations>
            <op id="op_smith_monitor_special" name="monitor" timeout="3s" interval="3000ms" start_delay="6s">
              <instance_attributes id="ia_smith_monitor_special">
                <attributes>
                  <nvpair id="nvpair_smith_monitor_special_action" name="monitor_action" value="BIT1"/>
                </attributes>
              </instance_attributes>
            </op>
            <op id="op_smith_monitor_master" name="monitor" timeout="3s" interval="3001ms" start_delay="6s" role="Master">
              <instance_attributes id="ia_smith_monitor_master">
                <attributes>
                  <nvpair id="nvpair_smith_monitor_master_action" name="monitor_action" value="BIT2"/>
                  <nvpair id="nvpair_smith_monitor_master_state" name="master_monitor" value="master"/>
                </attributes>
              </instance_attributes>
            </op>
          </operations>
        </primitive>
      </master_slave>
    </resources>
    <constraints>
      <rsc_location id="loc_smith0" rsc="rsc_smith:0">
        <rule id="loc_smith0_rule_run" score="INFINITY">
          <expression id="loc_smith0_expression_run" attribute="#uname" operation="eq" value="rh4vm1"/>
        </rule>
        <rule id="loc_smith0_rule_norun" score="-INFINITY">
          <expression id="loc_smith0_expression_norun" attribute="#uname" operation="ne" value="rh4vm1"/>
        </rule>
      </rsc_location>
      <rsc_location id="loc_smith1" rsc="rsc_smith:1">
        <rule id="loc_smith1_rule_run" score="INFINITY">
          <expression id="loc_smith1_expression_run" attribute="#uname" operation="eq" value="rh4vm2"/>
        </rule>
        <rule id="loc_smith1_rule_norun" score="-INFINITY">
          <expression id="loc_smith1_expression_norun" attribute="#uname" operation="ne" value="rh4vm2"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>
> cheers,
> raoul bhatia
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. [EMAIL PROTECTED]
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. [EMAIL PROTECTED]
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>