Good evening to you, Dominik. :)
I apologize for being persistent. I can work around the situations I have
encountered by writing scripts; I just thought there might be something in the
configuration I could tweak to make it work. You have been very helpful, and
that is greatly appreciated. In fact, you have resolved every situation I
encountered except the one you asked me to file a bug report on, which I was
happy to do so that the product gets better. Besides, you would probably hate
to see this project I am working on fall back to MSCS (Microsoft Cluster
Service) as much as I would. Oooh... just the thought that the project will
resort to a Microsoft solution makes me feel like I am losing my freedom
(I certainly do not want this to happen and will try hard to prevent it).
I have submitted it to Bugzilla as you recommended. It is registered as
Bug 2047.
Thank you for your support.
Regards,
jerome
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dominik Klein
Sent: Wednesday, January 28, 2009 11:19 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected
Good morning Jerome
we should make this a daily thing, shouldn't we?
Jerome Yanga wrote:
> Dominik,
>
> I apologize for leaving resource-stickiness out. I had it there previously,
> but after all the trial and error I had performed in the crm shell, I had
> forgotten to re-add it. Nevertheless, adding it back to my cib.xml file does
> not seem to help.
>
> Here is the chain of events. This happens on either Nomen or Rubric.
>
> 01) Nomen (one of the two nodes) owns the group resource, called
> Directory_Server. In the meantime, Rubric (the other node) is just there
> waiting for the resources to come to him. :)
> 02) I stop heartbeat on Nomen and the Directory_Server resource group fails
> over to Rubric.
> 03) Nomen's status changes from "running(dc)" to "stopped"
> 04) After waiting for step #3 to finish its transition, I start heartbeat
> back up on Nomen.
> 05) Nomen's status changes from "stopped" to "running-standby" to "running".
> 06) Rubric retains all the resources. However, all the resources on Rubric
> bounce/restart when Nomen's status changes from "running-standby" to
> "running".
With the configuration you posted below, this should not happen. The
configuration looks good for what you want. If you're sure that is what
you do and what you get, please file a bug about that and include a hb_report.
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> Is there a way to prevent the resources on Rubric from bouncing/restarting
> when Nomen rejoins the cluster?
>
> Help.
>
>
>
> On the other hand, you pointed me in the right direction regarding the
> MailTo OCF agent.
>
> This is what the variable looked like in .ocf-binaries when it was not working.
>
> rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
> : ${MAILCMD:=}
>
> I assigned the exact path of the mail command to the variable. Now, I get
> emailed every time a failover happens. Wooot! Wooot! :)
>
> rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
> : ${MAILCMD:=/bin/mail}
Good. I think this was on the lists earlier. Apparently a packaging issue.
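For anyone who hits the same empty MAILCMD default, the manual fix above can be scripted. This is only a sketch: `set_mailcmd` is a hypothetical helper, not something shipped with Linux-HA. It assumes GNU sed (for `sed -i`) and that the line reads exactly `: ${MAILCMD:=}`. It rewrites the line only while it is still empty, so running it twice is harmless. Because .ocf-binaries belongs to a package, a later package upgrade may overwrite the change.

```shell
# Hypothetical helper (not part of Linux-HA): fill in an empty MAILCMD
# default in an .ocf-binaries file.
# Usage: set_mailcmd FILE [MAIL_BINARY]
# Assumes GNU sed, because it edits the file in place with -i.
set_mailcmd() {
    file="$1"
    # If no path is given, use whatever "mail" resolves to on this host.
    mailbin="${2:-$(command -v mail)}"
    if [ -z "$mailbin" ]; then
        echo "set_mailcmd: no mail binary found" >&2
        return 1
    fi
    # Rewrite only the empty form ": ${MAILCMD:=}". A line that already
    # names a mail command does not match and is left untouched.
    sed -i 's|^: \${MAILCMD:=}$|: ${MAILCMD:='"$mailbin"'}|' "$file"
    # Show the result, like the grep used earlier in the thread.
    grep MAIL "$file"
}
```

Run it as root, for example `set_mailcmd /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries /bin/mail`. The output should then match the second grep shown above.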
Regards
Dominik
> Thanks.
>
>
> Below is my current cib.xml file.
>
> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
> have-quorum="1" dc-uuid="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" epoch="102"
> num_updates="0" cib-last-written="Wed Jan 28 08:32:39 2009">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com"
> type="normal">
> <instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
> <nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
> name="standby" value="off"/>
> </instance_attributes>
> </node>
> <node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com"
> type="normal">
> <instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
> <nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c"
> name="standby" value="off"/>
> </instance_attributes>
> </node>
> </nodes>
> <resources>
> <group id="Directory_Server">
> <meta_attributes id="Directory_Server-meta_attributes">
> <nvpair id="Directory_Server-meta_attributes-collocated"
> name="collocated" value="true"/>
> <nvpair id="Directory_Server-meta_attributes-ordered"
> name="ordered" value="true"/>
> <nvpair id="Directory_Server-meta_attributes-migration-threshold"
> name="migration-threshold" value="1"/>
> <nvpair id="Directory_Server-meta_attributes-failure-timeout"
> name="failure-timeout" value="10s"/>
> <nvpair id="Directory_Server-meta_attributes-resource-stickiness"
> name="resource-stickiness" value="10"/>
> </meta_attributes>
> <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
> <instance_attributes id="VIP-instance_attributes">
> <nvpair id="VIP-instance_attributes-ip" name="ip"
> value="10.50.26.250"/>
> </instance_attributes>
> <operations id="VIP-ops">
> <op id="VIP-monitor-5s" interval="5s" name="monitor"
> timeout="5s"/>
> </operations>
> </primitive>
> <primitive class="ocf" id="ECAS" provider="esri" type="ecas">
> <operations id="ECAS-ops">
> <op id="ECAS-monitor-3s" interval="3s" name="monitor"
> timeout="3s"/>
> </operations>
> </primitive>
> <primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
> <operations id="FDS_Admin-ops">
> <op id="FDS_Admin-monitor-3s" interval="3s" name="monitor"
> timeout="3s"/>
> </operations>
> </primitive>
> <primitive class="ocf" id="Emergency_Contact" provider="heartbeat"
> type="MailTo">
> <instance_attributes id="Emergency_Contact-instance_attributes">
> <nvpair id="Emergency_Contact-instance_attributes-email"
> name="email" value="[email protected]"/>
> <nvpair id="Emergency_Contact-instance_attributes-subject"
> name="subject" value="Failover Occurred"/>
> </instance_attributes>
> <operations id="Emergency_Contact-ops">
> <op id="Emergency_Contact-monitor-3s" interval="3s"
> name="monitor" timeout="3s"/>
> </operations>
> </primitive>
> </group>
> </resources>
> <constraints/>
> </configuration>
> </cib>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems