Hi,

On 02/24/2011 03:35 PM, Amit Jathar wrote:
> Hi,
>
> I am trying to use Pacemaker with corosync & am facing the following issues.
> I want to know whether these are due to misconfiguration or are known issues.
>
> I have two nodes in the cluster: VIP-1 & VIP-2
> The corosync version is:
> Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
>
> ==================================================================
> The crm_mon output is:
>
> ============
> Last updated: Thu Feb 24 17:44:33 2011
> Stack: openais
> Current DC: VIP-1 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ VIP-1 VIP-2 ]
>
> ClusterIP  (ocf::heartbeat:IPaddr2):  Started VIP-1
> WebSite    (ocf::heartbeat:apache):   Started VIP-1
> My_Tomcat  (ocf::heartbeat:tomcat):   Started VIP-
>
> ==================================================================
> My configuration is:
>
> [root@VIP-1 local]# crm configure show
> node VIP-1
> node VIP-2
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="172.16.201.23" cidr_netmask="32" \
>     op monitor interval="5s"
> primitive My_Tomcat ocf:heartbeat:tomcat \
>     params catalina_home="/root/Softwares/apache-tomcat-6.0.26" java_home="/root/Softwares/Java/linux/jdk1.6.0_21" \
>     op monitor interval="5s"
> primitive WebSite ocf:heartbeat:apache \
>     params configfile="/etc/httpd/conf/httpd.conf" \
>     op monitor interval="5s"
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1298547656"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="2"
> =======================================================================
>
> Issue 1)
>
> I observed that if any service is manually shut down on VIP-1, then corosync
> restarts it on the same node.
> In the logs, I can see this:
> =================================================================
> Feb 24 18:14:32 VIP-1 pengine: [28098]: info: get_failcount: My_Tomcat has failed 35 times on VIP-1
> Feb 24 18:14:32 VIP-1 pengine: [28098]: notice: common_apply_stickiness: My_Tomcat can fail 999965 more times on VIP-1 before being forced off
> =================================================================
>
> I have not configured the service to be restarted an INFINITY number of times on VIP-1, so is this the default behavior?

Yes.

> Is there any configuration to tell corosync to restart the service only twice on VIP-1 &, if it does not start, then start it on VIP-2?

The closest you'll get is to set migration-threshold=x on the resource (My_Tomcat); after it fails x times it will move to the other node. I'm not sure of the exact behavior for a given value of x, but IIRC if you set migration-threshold=3 it will restart the resource twice, and on the third failure it will move it to the other node without restarting it on the same one.
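As a concrete illustration of the above (a sketch using the crm shell syntax of the Pacemaker 1.0 era; verify the exact form against your crm shell's help, since syntax varies between versions), setting the threshold on My_Tomcat could look like:

```shell
# Sketch: cap the number of failures tolerated on the current node.
# With migration-threshold=2, after two failures of My_Tomcat on VIP-1
# the cluster should move the resource to VIP-2 instead of restarting
# it locally again.
crm resource meta My_Tomcat set migration-threshold 2

# Check the result in the resource's meta attributes:
crm configure show My_Tomcat
```

Note that once the threshold is reached, the accumulated failcount still pins the resource away from the failed node until it is cleared or expires.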
Clearing the failcount automatically is done by using validate-with="pacemaker-1.1" in the CIB, and is only available from Pacemaker 1.1.3 onwards; it also requires setting cluster-recheck-interval to a timeout of your choice.

HTH,
Dan

> Issue 2)
>
> I have changed the error codes in the Apache & Tomcat RA scripts, & returned error code 2 if the monitor fails.
> Now, if I manually stop the service, then it is not restarted on VIP-1 but is started on VIP-2.
> The failcount of that service on VIP-1 shows as 1.
>
> Now, if I manually take the service down on VIP-2, then it does not get started on VIP-1 until I clean up the resource.
>
> So, is this known behavior, or have I missed some configuration?
>
> Let me know if you need more information.
>
> Thanks,
> Amit

--
Dan Frincu
CCNA, RHCE

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
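The manual cleanup step Amit mentions in Issue 2 can be sketched with the crm shell (resource and node names taken from the thread; check your crm shell's help for the exact syntax on your version):

```shell
# Sketch: inspect the failcount that keeps My_Tomcat from returning
# to VIP-1 after a monitor failure.
crm resource failcount My_Tomcat show VIP-1

# Clear the failure history so VIP-1 becomes eligible to run the
# resource again (this is the "clean up the resource" step).
crm resource cleanup My_Tomcat VIP-1
```

Until the failcount is cleared (or expires via a failure-timeout, where supported), the node that failed remains excluded, which matches the behavior described in Issue 2.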
