Lars Ellenberg <lars.ellenberg <at> linbit.com> writes:

> 
> On Fri, Dec 10, 2010 at 03:36:05AM +0000, Preeti Jain wrote:
> > Hello list,
> > I am testing a network failure case by unplugging the NIC cable on one
> > node, and I am getting unwanted results: the whole cluster is disturbed,
> > and the resource moves between nodes until it settles on one node, which
> > also results in a failback.
> > For example, if I unplug the NIC cable from node 1, a failover happens,
> > and it takes some time for the resource to move to node 2. But when I plug
> > the cable back in on node 1, a kind of split brain happens, and the
> > resource takes some time to settle on node 1. That is a failback, which is
> > not desired, since the resource should stay on node 2.
> > Each node reports that the other cluster nodes are returning after a
> > partition.
> 
> > Part of the log file on node 1 after plugging the NIC cable back in:
> > heartbeat[2521]: 2010/12/08_16:50:02 CRIT: Cluster node Node2 returning after partition.
> > heartbeat[2521]: 2010/12/08_16:50:02 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
> > heartbeat[2521]: 2010/12/08_16:50:02 WARN: Deadtime value may be too small.
> > heartbeat[2521]: 2010/12/08_16:50:02 info: See FAQ for information on tuning deadtime.
> > heartbeat[2521]: 2010/12/08_16:50:02 info: URL: http://linux-ha.org/FAQ#heavy_load
> > heartbeat[2521]: 2010/12/08_16:50:02 info: Link Node2:eth0 up.
> > heartbeat[2521]: 2010/12/08_16:50:02 WARN: Late heartbeat: Node Node2: interval 781870 ms
> > heartbeat[2521]: 2010/12/08_16:50:02 info: Status update for node Node2: status active
> 
> > Is there any solution for this problem?
> 
> Good idea: follow the links given in the log messages above.
> 
> As stated there,
> 
> Simple solution:
>       multiple independent communication paths.
> Thorough solution:
>       multiple independent communication paths,
>       with redundancy within each path,
>       plus some _independent_ stonith method.
> 
Thanks for your reply, Lars.

I will try to implement what you suggested, but is there any other way, such
as a parameter in ha.cf or in the CIB, to avoid this problem?

For example, the pingd resource agent triggers a failover when external
network connectivity is lost. Can it be used to avoid this problem? I am
trying it right now, but the resources still move back to the previous node.
If it is possible with pingd, please tell me how to make the resources stick
to the node on which they are currently running, i.e. node 2.

Waiting for your reply.

Regards,
Preeti


