Hi,

On 02/24/2011 03:35 PM, Amit Jathar wrote:
> Hi,
>
> I am trying to use Pacemaker with corosync & am facing the following issues.
> I want to know whether these are due to misconfiguration or are known issues.
>
> I have two nodes in the cluster: VIP-1 & VIP-2
> The corosync version is:
> Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
>
> ==================================================================
> The crm_mon output is:
>
> ============
> Last updated: Thu Feb 24 17:44:33 2011
> Stack: openais
> Current DC: VIP-1 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ VIP-1 VIP-2 ]
>
> ClusterIP  (ocf::heartbeat:IPaddr2):  Started VIP-1
> WebSite    (ocf::heartbeat:apache):   Started VIP-1
> My_Tomcat  (ocf::heartbeat:tomcat):   Started VIP-
>
> ==================================================================
> My configuration is:
>
> [root@VIP-1 local]# crm configure show
> node VIP-1
> node VIP-2
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="172.16.201.23" cidr_netmask="32" \
>     op monitor interval="5s"
> primitive My_Tomcat ocf:heartbeat:tomcat \
>     params catalina_home="/root/Softwares/apache-tomcat-6.0.26" java_home="/root/Softwares/Java/linux/jdk1.6.0_21" \
>     op monitor interval="5s"
> primitive WebSite ocf:heartbeat:apache \
>     params configfile="/etc/httpd/conf/httpd.conf" \
>     op monitor interval="5s"
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1298547656"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="2"
> =======================================================================
>
> Issue 1)
>
> I observed that if any service is manually shut down on VIP-1, then corosync
> restarts it on the same node.
> In the logs, I can see this:
> =================================================================
> Feb 24 18:14:32 VIP-1 pengine: [28098]: info: get_failcount: My_Tomcat has failed 35 times on VIP-1
> Feb 24 18:14:32 VIP-1 pengine: [28098]: notice: common_apply_stickiness: My_Tomcat can fail 999965 more times on VIP-1 before being forced off
> =================================================================
>
> I have not configured the service to be restarted an INFINITY number of times on VIP-1, so is this the default behavior?

Yes.

> Is there any configuration to tell corosync to restart the service only twice on VIP-1 &, if it does not start, then start it on VIP-2?

The closest you'll get is to set migration-threshold=x on the resource (My_Tomcat); after it fails x times it will move to the other node. I'm not sure of the exact behavior for a given value of x, but IIRC if you set migration-threshold=3 it will restart the resource twice, and on the third failure it will move it to the other node without restarting it on the same one.
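As a concrete illustration of the above (a sketch using the crm shell syntax of the Pacemaker 1.0 era; verify the exact form against your crm shell's help, since syntax varies between versions), setting the threshold on My_Tomcat could look like:

```shell
# Sketch: cap the number of failures tolerated on the current node.
# With migration-threshold=2, after two failures of My_Tomcat on VIP-1
# the cluster should move the resource to VIP-2 instead of restarting
# it locally again.
crm resource meta My_Tomcat set migration-threshold 2

# Check the result in the resource's meta attributes:
crm configure show My_Tomcat
```

Note that once the threshold is reached, the accumulated failcount still pins the resource away from the failed node until it is cleared or expires.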
Clearing the failcount automatically is done by using validate-with="pacemaker-1.1" in the CIB, and is only available from Pacemaker 1.1.3 onwards; it also requires setting cluster-recheck-interval to a timeout of your choice.

HTH,
Dan

> Issue 2)
>
> I have changed the error codes in the Apache & Tomcat RA scripts, & returned error code 2 if the monitor fails.
> Now, if I manually stop the service, then it is not restarted on VIP-1 but is started on VIP-2.
> The failcount of that service on VIP-1 shows as 1.
>
> Now, if I manually take the service down on VIP-2, then it does not get started on VIP-1 until I clean up the resource.
>
> So, is this known behavior, or have I missed some configuration?
>
> Let me know if you need more information.
>
> Thanks,
> Amit

--
Dan Frincu
CCNA, RHCE

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
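The manual cleanup step Amit mentions in Issue 2 can be sketched with the crm shell (resource and node names taken from the thread; check your crm shell's help for the exact syntax on your version):

```shell
# Sketch: inspect the failcount that keeps My_Tomcat from returning
# to VIP-1 after a monitor failure.
crm resource failcount My_Tomcat show VIP-1

# Clear the failure history so VIP-1 becomes eligible to run the
# resource again (this is the "clean up the resource" step).
crm resource cleanup My_Tomcat VIP-1
```

Until the failcount is cleared (or expires via a failure-timeout, where supported), the node that failed remains excluded, which matches the behavior described in Issue 2.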
