That's not a question I'm qualified to answer. I do know we're now buying an Arista for a different cluster, but there are probably loads of others out there.
*forwarded to general@...*
________________________________________
From: Abhishek Mehta [[email protected]]
Sent: Thursday, June 30, 2011 11:38 PM
To: Evert Lammerts
Subject: Fwd: Hadoop Java Versions

What are the other switch options (other than Cisco, that is)?

cheers
Abhishek Mehta
(e) [email protected]
(v) 980.355.9855

Begin forwarded message:

From: Evert Lammerts <[email protected]>
Date: June 30, 2011 5:31:26 PM EDT
To: "[email protected]" <[email protected]>
Subject: RE: Hadoop Java Versions
Reply-To: [email protected]

You can get 12-24 TB in a server today, which means the loss of a server generates a lot of traffic - which argues for 10 GbE.

But
- big increase in switch cost, especially if you (CoI warning) go with Cisco
- there have been problems with things like BIOS PXE and lights out management on 10 GbE - probably due to the NICs being things the BIOS wasn't expecting and off the mainboard. This should improve.
- I don't know how well Linux works with ether that fast (field reports useful)
- the big threat is still ToR switch failure, as that will trigger a re-replication of every block in the rack.

Keeping the number of disks per node low and the number of nodes high should keep the impact of dead nodes under control. A ToR switch failing is different - missing 30 nodes (~120TB) at once cannot be fixed by adding more nodes; that actually increases the chance of a ToR switch failure. Although such failure is quite rare to begin with, I guess.

The back-of-the-envelope calculation I made suggests that ~150 (1U) nodes should be fine with 1Gb ethernet. (E.g., when 6 nodes fail in a cluster of 150 nodes with four 2TB disks each, with HDFS 60% full, it takes ~32 minutes to recover. 2 nodes failing should take around 640 seconds. Also see the attached spreadsheet.) This doesn't take ToR switch failure into account though.
On the other hand - 150 nodes is only ~5 racks - in such a scenario you might prefer to shut the system down completely rather than let it replicate 20% of all data.

Cheers,
Evert
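
For anyone who wants to replay the back-of-the-envelope numbers from the message above, here is a rough Python sketch. The attached spreadsheet is not reproduced here, so the model is an assumption rather than the original calculation: the lost data is spread evenly over the cluster, every one of the 150 nodes contributes to re-replication, and each node sustains about 100 MB/s on 1Gb ethernet. The function name and the 100 MB/s figure are illustrative choices, not something stated in the thread. Under those assumptions the result happens to match the quoted 640 seconds and ~32 minutes.

```python
def recovery_seconds(failed_nodes, total_nodes=150, disks_per_node=4,
                     disk_tb=2.0, fill=0.60, node_mb_per_s=100.0):
    """Rough time to re-replicate the data held by the failed nodes.

    Assumptions (not from the original spreadsheet): all nodes share the
    copy work evenly and each sustains node_mb_per_s on 1Gb ethernet.
    """
    # Bytes stored on the failed nodes (decimal TB, HDFS `fill` fraction used).
    lost_bytes = failed_nodes * disks_per_node * disk_tb * 1e12 * fill
    # Aggregate copy rate of the cluster in bytes per second.
    cluster_bytes_per_s = total_nodes * node_mb_per_s * 1e6
    return lost_bytes / cluster_bytes_per_s


if __name__ == "__main__":
    for n in (2, 6):
        t = recovery_seconds(n)
        print(f"{n} failed nodes: {t:.0f} s (~{t / 60:.0f} min)")
```

Changing `node_mb_per_s` or `disks_per_node` shows how quickly recovery time grows with denser nodes. Like the original estimate, this ignores ToR switch failure and any replication throttling in HDFS.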
