Dominik Klein wrote:
> With a failure stickiness of -30, you allow your groups resources to
> fail (400/30)=14 times. Is that what you want?

Although the default failure stickiness is -30, the group itself has a failure stickiness of -100. I would like it to fail over after 3 or 4 failures; my test with 15 stop commands was just to be sure.
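If I follow your arithmetic, the threshold is roughly the total positive score divided by the absolute failure stickiness, so with the group's -100 it should land near my target (the 400 total below is assumed from your (400/30) calculation, not measured from my cluster):

```shell
# Rough failover threshold, following the (400/30)=14 style of arithmetic:
#   failures tolerated ~= total positive score / |failure stickiness|
# TOTAL_SCORE=400 is an assumption carried over from that calculation.
TOTAL_SCORE=400
FAIL_STICKINESS=100   # |-100| for the group
echo $((TOTAL_SCORE / FAIL_STICKINESS))   # prints 4
```

which would give failover after about 4 failures, matching what I want.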

> You don't have any monitor operations for the ipaddr and jboss
> resources. Failures on them are not detected. Configure monitor
> operations and try again.

I actually do have monitor operations on both; I accidentally sent out an old cib.xml, so the updated file is attached. The previously posted showscores.sh output is correct for this cib.xml. It behaves as described in my last e-mail even with monitor operations on IPaddr2 and jboss.

> Also make sure you use a recent version. Otherwise you may also hit the
> bug of not increasing failcount in 2.1.3's crm. This is fixed in
> pacemaker (0.6.x)

Uh oh. I definitely have 2.1.3 with the crm_failcount bug, but I didn't think it would affect score calculation. I didn't install a pacemaker package; I used the CentOS4 extras RPMs. I hope CentOS4 / RHEL4 packages can be released. I could not rebuild the RHEL5 packages from the openSUSE ha-clustering repository due to this:

configure:3065: gcc -c  -O2 -g  conftest.c >&5
conftest.c:2: error: syntax error before "me"
configure:3071: $? = 1
configure: failed program was:
| #ifndef __cplusplus
|    choke me
| #endif

Should I seek an alternative to these CentOS 4 extras RPMs?

> pps. where did you get the jboss RA? I'd be interested in it.

http://rgm.nu/jbossocf

I hacked it together and it's ugly. I hope it's useful to you, although it's tailored to a very old JBoss release (3.0.8), with some customizations to support multiple instances of JBoss on different ports. It relies on ps, awk, egrep, and curl, and has only been tested on RHEL4. You'll want to change the HTTPCODE check to use a URL for your servlet (or modify it to use the jmx-console).
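The core of the status check is just curl's `%{http_code}`; a minimal sketch of that idea (the function name and URL are illustrative, not the actual script):

```shell
# Minimal sketch of an HTTP-status liveness probe, in the spirit of the
# RA's HTTPCODE check. check_jboss and the example URL are illustrative.
check_jboss() {
    # Fetch only the HTTP status code; treat 200 as "running".
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    if [ "$code" = "200" ]; then
        echo "running"
    else
        echo "failed ($code)"
    fi
}

# Example: check_jboss "http://localhost:8080/myservlet"
```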


Regards,

Roland


 <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="0" 
cib_feature_revision="1.3" generated="false" num_updates="1" epoch="50" 
cib-last-written="Tue Mar 25 13:48:27 2008" ccm_transition="1">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="id-no-quorum-policy" name="no-quorum-policy" 
value="ignore"/>
           <nvpair id="cib-bootstrap-options-default-resource-stickiness" 
name="default-resource-stickiness" value="100"/>
           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" 
value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
           <nvpair id="cib-bootstrap-options-last-lrm-refresh" 
name="last-lrm-refresh" value="1206118390"/>
           <nvpair 
id="cib-bootstrap-options-default-resource-failure-stickiness" 
name="default-resource-failure-stickiness" value="-30"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node uname="slinkfail" type="normal" 
id="9b8c9849-b713-401b-86f9-c7a0402a4658">
         <instance_attributes id="nodes-9b8c9849-b713-401b-86f9-c7a0402a4658">
           <attributes>
             <nvpair name="standby" 
id="standby-9b8c9849-b713-401b-86f9-c7a0402a4658" value="off"/>
           </attributes>
         </instance_attributes>
       </node>
       <node id="cb25eedb-6f51-4c75-b137-ec375e253890" uname="slinkmaster" 
type="normal">
         <instance_attributes id="nodes-cb25eedb-6f51-4c75-b137-ec375e253890">
           <attributes>
             <nvpair id="standby-cb25eedb-6f51-4c75-b137-ec375e253890" 
name="standby" value="off"/>
           </attributes>
         </instance_attributes>
       </node>
     </nodes>
     <resources>
       <group id="MyGroup" collocated="true" ordered="true">
         <primitive id="slink_ipaddr2" class="ocf" type="IPaddr2" 
provider="heartbeat">
           <instance_attributes id="slink_ipaddr2_instance_attrs">
             <attributes>
               <nvpair id="74461f56-ba60-47f2-a767-ffd114562363" name="ip" 
value="192.168.1.222"/>
             </attributes>
           </instance_attributes>
           <operations>
             <op id="46cb04c6-a824-4e67-b514-a9bf8fce4525" name="monitor" 
interval="30s" timeout="20s" start_delay="5s"/>
           </operations>
         </primitive>
         <primitive id="slink_db" class="ocf" type="pgsql" provider="heartbeat">
           <meta_attributes id="slink_db_meta_attrs">
             <attributes/>
           </meta_attributes>
           <operations>
             <op id="45d14088-f223-4262-b309-713b0c850e77" name="monitor" 
interval="30" timeout="30" start_delay="10" disabled="false" role="Started"/>
           </operations>
         </primitive>
         <primitive id="slink_jboss" class="ocf" type="jbossocf" 
provider="enexity">
           <instance_attributes id="slink_jboss_instance_attrs">
             <attributes/>
           </instance_attributes>
           <meta_attributes id="slink_jboss_meta_attrs">
             <attributes/>
           </meta_attributes>
           <operations>
             <op id="7c63f880-8b00-4453-87a4-bf722a1bb95f" name="monitor" 
interval="30" timeout="20" start_delay="1m"/>
           </operations>
         </primitive>
         <meta_attributes id="MyGroup_meta_attrs">
           <attributes>
             <nvpair id="MyGroup_metaattr_resource_failure_stickiness" 
name="resource_failure_stickiness" value="-100"/>
           </attributes>
         </meta_attributes>
       </group>
     </resources>
     <constraints>
       <rsc_location id="run_MyGroup_group" rsc="MyGroup">
         <rule id="pref_run_MyGroup_group" score="100">
           <expression id="keep_group_on_master" attribute="#uname" 
operation="eq" value="slinkmaster"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems