On Mon, Feb 22, 2010 at 01:40:37PM -0800, Celso K. Webber wrote:
> I endorse Doug's opinion.
>
> Although my opinion is empirical, I can affirm that a crossover cable
> (which can be a straight cable in the case of Gigabit Ethernet) is "more
> stable" than many Ethernet switches out there. Not to mention that
> sometimes the customer has only 100 Mbps switch ports, while with a
> crossover cable you will have a GigE connection.
>
> So I'd also like to ask: are there officially any known issues with using
> crossover cables instead of Ethernet switches for the private / heartbeat
> network?
>
> Thanks, Celso.
>
> ----- Original Message ----
> From: Doug Tucker <[email protected]>
> To: linux clustering <[email protected]>
> Sent: Mon, February 22, 2010 4:53:29 PM
> Subject: Re: [Linux-cluster] Repeated fencing
>
> We did. It's problematic when you need to reboot the switch or it goes
> down: the nodes can't talk and try to fence each other. A crossover cable
> is a direct connection, and actually far more efficient for what you are
> trying to accomplish.
>
> On Mon, 2010-02-22 at 11:57 -0600, Paul M. Dyer wrote:
> > Crossover cable??????
> >
> > With all the $$ spent, try putting a switch between the nodes.
> >
> > Paul
> >
> > ----- Original Message -----
> > From: "Doug Tucker" <[email protected]>
> > To: [email protected]
> > Sent: Monday, February 22, 2010 10:15:49 AM (GMT-0600) America/Chicago
> > Subject: [Linux-cluster] Repeated fencing
> >
> > We have a 2-node 4.x cluster that has developed an issue we are unable
> > to resolve. Starting back in December, the nodes began fencing each
> > other randomly, as frequently as once a day. There is nothing at the
> > console prior to it happening, and nothing in the logs. We have not
> > been able to find any pattern so far: the two nodes appear to be
> > functioning fine, then suddenly a message appears in the logs about
> > "node x missed too many heartbeats", and the next thing you see is it
> > fencing the node.
> > Thinking we possibly had a hardware issue, we replaced both nodes from
> > scratch with new machines, but the problem persists. The cluster
> > communication is done via a crossover cable on eth1 on both machines,
> > with private IPs. We have a second cluster that is not having this
> > issue, and both of its nodes have been up for over 160 days. Its
> > configuration is basically identical to the problematic cluster's.
> > The only differences between the two now are the newer hardware on the
> > problematic cluster (prior to the replacement, that was identical too)
> > and the kernel: the non-problematic cluster is still running kernel
> > 89.0.9 while the problematic cluster is on 89.0.11. We are afraid at
> > this point to upgrade our non-problematic cluster to the latest
> > packages. Any insight or advice would be greatly appreciated; we have
> > exhausted our ideas here.
> >
> > Sincerely,
> >
> > Doug
> >
> > --
> > Linux-cluster mailing list
> > [email protected]
> > https://www.redhat.com/mailman/listinfo/linux-cluster
Hi Doug,

Maybe you can avoid this kind of problem by using a quorum disk partition.
A two-node cluster is prone to split brain, and with a quorum disk
partition you can avoid split-brain situations, which is probably what is
causing this behavior.

As for using a crossover (or straight) cable, I don't know of any issue
with it, but do check that the link is running in full-duplex mode.
Half-duplex mode on crossover-linked machines will probably cause
heartbeat problems.

cya..

--
---
Best Regards
Carlos Eduardo Maiolino
Support Engineer
Red Hat - Global Support Services

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
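[Editor's note: on Red Hat Cluster Suite, the quorum-disk suggestion above
maps to a small shared partition initialized with mkqdisk plus a
`<quorumd>` section in /etc/cluster/cluster.conf. A minimal sketch follows;
the label, scores, and the gateway address in the heuristic are
placeholders for illustration, not values from the thread.]

```xml
<quorumd interval="1" tko="10" votes="1" label="myqdisk">
    <!-- Heuristic: a node must be able to ping the gateway to remain
         quorate; 192.168.0.1 is a placeholder for your own gateway. -->
    <heuristic program="ping -c1 192.168.0.1" score="1" interval="2" tko="3"/>
</quorumd>
```

The quorum disk contributes an extra vote, so a node that loses only the
heartbeat link but still passes the heuristic can win the tiebreak instead
of both nodes racing to fence each other.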
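[Editor's note: the full-duplex check suggested above can be done with
ethtool. A minimal sketch follows; the interface name eth1 matches the
thread, but the helper function, its warning wording, and the sample input
are illustrative, not from the original messages.]

```shell
# Warn when a heartbeat link is not running full duplex.
# check_duplex takes the text output of `ethtool <iface>` as its argument,
# so it can be exercised without a live interface.
check_duplex() {
    if printf '%s\n' "$1" | grep -qi 'Duplex: Full'; then
        echo "full duplex: OK"
    else
        echo "WARNING: link is not full duplex - expect heartbeat trouble"
    fi
}

# On a real node you would feed it live output, e.g.:
#   check_duplex "$(ethtool eth1)"
# Here we use a captured sample instead:
check_duplex "Speed: 1000Mb/s
Duplex: Full"
```

With a crossover link, both NICs should autonegotiate to full duplex;
a half-duplex result usually points at a forced setting on one side.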
