Hello list,

I am testing a network failure case by unplugging the NIC cable on one node, and I am getting unwanted results: the whole cluster is disturbed, and the resource appears to move between different nodes until it stabilizes on one node. It also results in failback.

For example, if I remove the NIC cable from node 1, failover happens, although it takes some time for the resource to move to node 2. But when I plug the cable back in on node 1, a kind of split brain happens, and the resource takes some time to stabilize on node 1. That is a failback, which is not desired either, as the resource should stay on node 2. Every node reports the other cluster nodes as returning after a partition.
Part of the log file on node 1 after plugging the NIC cable back in:

heartbeat[2521]: 2010/12/08_16:50:02 CRIT: Cluster node Node2 returning after partition.
heartbeat[2521]: 2010/12/08_16:50:02 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[2521]: 2010/12/08_16:50:02 WARN: Deadtime value may be too small.
heartbeat[2521]: 2010/12/08_16:50:02 info: See FAQ for information on tuning deadtime.
heartbeat[2521]: 2010/12/08_16:50:02 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[2521]: 2010/12/08_16:50:02 info: Link Node2:eth0 up.
heartbeat[2521]: 2010/12/08_16:50:02 WARN: Late heartbeat: Node Node2: interval 781870 ms
heartbeat[2521]: 2010/12/08_16:50:02 info: Status update for node Node2: status active
heartbeat[2521]: 2010/12/08_16:50:03 info: Link Node3:eth0 up.
heartbeat[2521]: 2010/12/08_16:50:03 info: Link Node4:eth0 up.
heartbeat[2521]: 2010/12/08_16:50:03 CRIT: Cluster node Node4 returning after partition.
heartbeat[2521]: 2010/12/08_16:50:03 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[2521]: 2010/12/08_16:50:03 WARN: Deadtime value may be too small.
heartbeat[2521]: 2010/12/08_16:50:03 info: See FAQ for information on tuning deadtime.
heartbeat[2521]: 2010/12/08_16:50:03 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[2521]: 2010/12/08_16:50:03 WARN: Late heartbeat: node Node4: interval 782200 ms
heartbeat[2521]: 2010/12/08_16:50:03 info: Status update for node Node4: status active
heartbeat[2521]: 2010/12/08_16:50:03 info: Link Node5:eth0 up.
heartbeat[2521]: 2010/12/08_16:50:04 CRIT: Cluster node Node2 returning after partition.
heartbeat[2521]: 2010/12/08_16:50:04 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[2521]: 2010/12/08_16:50:04 WARN: Deadtime value may be too small.
heartbeat[2521]: 2010/12/08_16:50:04 info: See FAQ for information on tuning deadtime.
heartbeat[2521]: 2010/12/08_16:50:04 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[2521]: 2010/12/08_16:50:04 WARN: Late heartbeat: node Node2: interval 784380 ms
heartbeat[2521]: 2010/12/08_16:50:04 info: Status update for node Node2: status active
heartbeat[2521]: 2010/12/08_16:50:04 CRIT: Cluster node Node5 returning after partition.
heartbeat[2521]: 2010/12/08_16:50:04 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[2521]: 2010/12/08_16:50:04 WARN: Deadtime value may be too small.
heartbeat[2521]: 2010/12/08_16:50:04 info: See FAQ for information on tuning deadtime.
heartbeat[2521]: 2010/12/08_16:50:04 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[2521]: 2010/12/08_16:50:04 WARN: Late heartbeat: node Node5: interval 784390 ms
heartbeat[2521]: 2010/12/08_16:50:04 info: Status update for node Node5: status active

Is there any solution for this problem?

Regards,
Preeti
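P.S. In case it helps with reading the log: the "Late heartbeat" intervals (roughly 13 minutes each) appear to match the time the cable was unplugged. Below is a minimal sketch of how those intervals could be pulled out per node. It assumes Python is available and that the warnings have exactly the format shown in the log above; the function name late_heartbeats and the sample list are only illustrative and are not part of heartbeat.

```python
import re

# Matches the "Late heartbeat" warnings as they appear in the log above.
# The log uses both "Node" and "node" before the node name, so accept either.
LATE_RE = re.compile(
    r"WARN: Late heartbeat: [Nn]ode (?P<node>\S+): interval (?P<ms>\d+) ms"
)


def late_heartbeats(lines):
    """Return {node name: longest late-heartbeat interval in milliseconds}."""
    worst = {}
    for line in lines:
        match = LATE_RE.search(line)
        if match is None:
            continue  # not a late-heartbeat warning
        node = match.group("node")
        interval_ms = int(match.group("ms"))
        worst[node] = max(worst.get(node, 0), interval_ms)
    return worst


# Sample input: the four late-heartbeat warnings copied from the log above.
sample = [
    "heartbeat[2521]: 2010/12/08_16:50:02 WARN: Late heartbeat: Node Node2: interval 781870 ms",
    "heartbeat[2521]: 2010/12/08_16:50:03 WARN: Late heartbeat: node Node4: interval 782200 ms",
    "heartbeat[2521]: 2010/12/08_16:50:04 WARN: Late heartbeat: node Node2: interval 784380 ms",
    "heartbeat[2521]: 2010/12/08_16:50:04 WARN: Late heartbeat: node Node5: interval 784390 ms",
]

if __name__ == "__main__":
    for node, interval_ms in sorted(late_heartbeats(sample).items()):
        print(f"{node}: {interval_ms / 1000:.1f} s late")
```

The same function can be given the lines of the full log file on each node instead of the sample list, to compare how long each node considered the others to be gone.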
