Lars Ellenberg <lars.ellenberg <at> linbit.com> writes:

> On Fri, Dec 10, 2010 at 03:36:05AM +0000, Preeti Jain wrote:
> > Hello list,
> > I am testing the network failure case by removing the NIC cable on one
> > node, and I am getting unwanted results: the whole cluster is disturbed
> > and the resource appears to move between nodes until it stabilizes on
> > one node. It also results in failback.
> > For example, if I remove the NIC cable from node 1, failover happens,
> > and it takes some time to move to node 2. But when I plug the cable back
> > in on node 1, a kind of split brain happens and the resource takes some
> > time to stabilize on node 1. This failback is not desired, as the
> > resource should stay on node 2...
> > Every node reports that the other cluster nodes are returning after a
> > partition.
> >
> > Part of the log file on node 1 after plugging the NIC cable back in:
> > heartbeat[2521]: 2010/12/08_16:50:02 CRIT: Cluster node Node2 returning after partition.
> > heartbeat[2521]: 2010/12/08_16:50:02 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
> > heartbeat[2521]: 2010/12/08_16:50:02 WARN: Deadtime value may be too small.
> > heartbeat[2521]: 2010/12/08_16:50:02 info: See FAQ for information on tuning deadtime.
> > heartbeat[2521]: 2010/12/08_16:50:02 info: URL: http://linux-ha.org/FAQ#heavy_load
> > heartbeat[2521]: 2010/12/08_16:50:02 info: Link Node2:eth0 up.
> > heartbeat[2521]: 2010/12/08_16:50:02 WARN: Late heartbeat: Node Node2: interval 781870 ms
> > heartbeat[2521]: 2010/12/08_16:50:02 info: Status update for node Node2: status active
> >
> > Any solution for this problem...
>
> Good idea: follow the links given in the log messages above.
>
> As stated there,
>
> Simple solution:
>     multiple independent communication paths.
> Thorough solution:
>     multiple independent communication paths,
>     with redundancy within each path,
>     plus some _independent_ stonith method.
>

Thanks for your reply, Lars.

I will try to implement what you suggested. But is there any other way, such as a parameter in ha.cf or in the CIB, to avoid this problem? For example, we have the pingd resource agent for failover when external network connectivity is lost. Can it be used to avoid this problem? I am trying it right now, but the resources still move back to the previous node. If it is possible with pingd, please tell me how to make the resources stick to the node on which they are currently running, i.e. node 2.

Waiting for your reply.

Regards,
Preeti
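
For illustration, the "simple solution" quoted above (multiple independent communication paths) might look roughly like the following ha.cf fragment. This is only a sketch under stated assumptions: the interface names eth0 and eth1, the serial device, and every timing value are hypothetical and would have to match the actual hardware and load.

```
# /etc/ha.d/ha.cf (fragment): illustrative sketch, not a tested configuration.
# Interface names, the serial device, and all timing values are assumptions.

# Seconds between heartbeats.
keepalive 2
# Log a "late heartbeat" warning after this many seconds.
warntime 10
# Declare a peer dead after this many seconds of silence.
deadtime 30
# Extra allowance while the network comes up at boot.
initdead 120

# Two independent heartbeat paths, so that pulling one cable
# does not by itself partition the cluster.
# Production network (assumed name):
bcast eth0
# Dedicated link or crossover cable (assumed name):
bcast eth1

# Optional third path over a null-modem cable (assumed device).
baud 19200
serial /dev/ttyS0
```

Keeping the resources on node 2 after a clean failover is a separate matter. In a CRM-managed cluster it is normally controlled by resource stickiness in the CIB rather than by ha.cf, and the exact property name depends on the Heartbeat or Pacemaker version in use. Stickiness alone probably cannot help while the only link is down, since each node then believes the other is dead, which is why the quoted advice starts with redundant paths and stonith.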
