On Mon, Feb 22, 2010 at 01:40:37PM -0800, Celso K. Webber wrote:
> I endorse Doug's opinion.
>
> Although my opinion is empirical, I can affirm that a crossover cable
> (which can be a straight cable in the case of Gigabit Ethernet) is "more
> stable" than many Ethernet switches out there. Not to mention that
> sometimes the customer has only 100 Mbps switch ports, while with a
> crossover cable you will have a GigE connection.
>
> So I'd also like to ask: are there officially any known issues with using
> crossover cables instead of Ethernet switches for the private / heartbeat
> network?
>
> Thanks, Celso.
>
> ----- Original Message ----
> From: Doug Tucker <[email protected]>
> To: linux clustering <[email protected]>
> Sent: Mon, February 22, 2010 4:53:29 PM
> Subject: Re: [Linux-cluster] Repeated fencing
>
> We did. It's problematic when you need to reboot the switch or it goes
> down: the nodes can't talk and try to fence each other. A crossover cable
> is a direct connection, and actually far more efficient for what you are
> trying to accomplish.
>
> On Mon, 2010-02-22 at 11:57 -0600, Paul M. Dyer wrote:
> > Crossover cable??????
> >
> > With all the $$ spent, try putting a switch between the nodes.
> >
> > Paul
> >
> > ----- Original Message -----
> > From: "Doug Tucker" <[email protected]>
> > To: [email protected]
> > Sent: Monday, February 22, 2010 10:15:49 AM (GMT-0600) America/Chicago
> > Subject: [Linux-cluster] Repeated fencing
> >
> > We have a 2-node 4.x cluster that has developed an issue we are unable
> > to resolve. Starting back in December, the nodes began fencing each
> > other randomly, as frequently as once a day. There is nothing at the
> > console prior to it happening, and nothing in the logs. We have not
> > been able to find any pattern so far: the two nodes appear to be
> > functioning fine, then suddenly a message appears in the logs about
> > "node x missed too many heartbeats", and the next thing you see is it
> > fencing the node.
> > Thinking we possibly had a hardware issue, we replaced both nodes from
> > scratch with new machines, but the problem persists. The cluster
> > communication is done via a crossover cable on eth1 on both machines,
> > with private IPs. We have a second cluster that is not having this
> > issue, and both of its nodes have been up for over 160 days. Its
> > configuration is basically identical to the problematic cluster's.
> > The only differences between the two now are the newer hardware on the
> > problematic cluster (prior to the replacement, that was identical too)
> > and the kernel: the non-problematic cluster is still running kernel
> > 89.0.9 while the problematic cluster is on 89.0.11. We are afraid at
> > this point to upgrade our non-problematic cluster to the latest
> > packages. Any insight or advice would be greatly appreciated; we have
> > exhausted our ideas here.
> >
> > Sincerely,
> >
> > Doug
> >
> > --
> > Linux-cluster mailing list
> > [email protected]
> > https://www.redhat.com/mailman/listinfo/linux-cluster
Hi Doug,

Maybe you can avoid this kind of problem by using a quorum disk partition.
A two-node cluster is prone to split brain, and with a quorum disk
partition you can avoid split-brain situations, which is probably what is
causing this behavior.

As for using a crossover (or straight) cable, I don't know of any issue
with it, but do check that the link is running in full-duplex mode.
Half-duplex mode on crossover-linked machines will probably cause
heartbeat problems.

cya..

--
---
Best Regards
Carlos Eduardo Maiolino
Support Engineer
Red Hat - Global Support Services

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
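[Editor's note: on Red Hat Cluster Suite, the quorum-disk suggestion above
maps to a small shared partition initialized with mkqdisk plus a
`<quorumd>` section in /etc/cluster/cluster.conf. A minimal sketch follows;
the label, scores, and the gateway address in the heuristic are
placeholders for illustration, not values from the thread.]

```xml
<quorumd interval="1" tko="10" votes="1" label="myqdisk">
    <!-- Heuristic: a node must be able to ping the gateway to remain
         quorate; 192.168.0.1 is a placeholder for your own gateway. -->
    <heuristic program="ping -c1 192.168.0.1" score="1" interval="2" tko="3"/>
</quorumd>
```

The quorum disk contributes an extra vote, so a node that loses only the
heartbeat link but still passes the heuristic can win the tiebreak instead
of both nodes racing to fence each other.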
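[Editor's note: the full-duplex check suggested above can be done with
ethtool. A minimal sketch follows; the interface name eth1 matches the
thread, but the helper function, its warning wording, and the sample input
are illustrative, not from the original messages.]

```shell
# Warn when a heartbeat link is not running full duplex.
# check_duplex takes the text output of `ethtool <iface>` as its argument,
# so it can be exercised without a live interface.
check_duplex() {
    if printf '%s\n' "$1" | grep -qi 'Duplex: Full'; then
        echo "full duplex: OK"
    else
        echo "WARNING: link is not full duplex - expect heartbeat trouble"
    fi
}

# On a real node you would feed it live output, e.g.:
#   check_duplex "$(ethtool eth1)"
# Here we use a captured sample instead:
check_duplex "Speed: 1000Mb/s
Duplex: Full"
```

With a crossover link, both NICs should autonegotiate to full duplex;
a half-duplex result usually points at a forced setting on one side.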
