On 28/06/11 04:49, Segel, Mike wrote:
Hmmm. I could have sworn there was a background balancing bandwidth limiter.

There is, for the rebalancer. Node outages are taken more seriously, though there have been problems in past 0.20.x releases where there was a risk of a cascade failure on a big switch/rack failure. The risk has been reduced, though we all await field reports to confirm this :)

You can get 12-24 TB in a server today, which means the loss of a server generates a lot of re-replication traffic, which argues for 10 Gbe.
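To put rough numbers on that, here is a back-of-envelope sketch in Python. It only divides the data to be copied by the aggregate inbound bandwidth. The receiver count and the efficiency factor are assumptions picked for illustration, and it ignores disk limits, competing job traffic and any throttling HDFS applies, so treat the output as an order-of-magnitude guide rather than a prediction.

```python
def rereplication_hours(data_tb, link_gbps, receivers=1, efficiency=0.7):
    """Hours needed to copy `data_tb` terabytes (decimal TB) when `receivers`
    nodes each take in data over a `link_gbps` link that sustains
    `efficiency` of line rate. All parameters are illustrative assumptions."""
    bits_to_copy = data_tb * 1e12 * 8
    aggregate_bits_per_sec = link_gbps * 1e9 * efficiency * receivers
    return bits_to_copy / aggregate_bits_per_sec / 3600


if __name__ == "__main__":
    # Assumed: 20 surviving nodes share the work of absorbing the lost replicas.
    for tb in (12, 24):
        for gbps in (1, 10):
            hours = rereplication_hours(tb, gbps, receivers=20)
            print(f"{tb} TB, {gbps} Gbit/s links, 20 receivers: {hours:.2f} h")
```

The same function can be reused for a whole rack by passing the summed capacity of every server in it as `data_tb`.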

But
-big increase in switch cost, especially if you (CoI warning) go with Cisco
-there have been problems with things like BIOS PXE and lights out management on 10 Gbe -probably due to the NICs being things the BIOS wasn't expecting and off the mainboard. This should improve.
-I don't know how well linux works with ether that fast (field reports useful)
-the big threat is still ToR switch failure, as that will trigger a re-replication of every block in the rack.

2x1 Gbe lets you have redundant switches, albeit at the price of more wiring, more things to go wrong with the wiring, etc.

The other thing to consider is how well the "enterprise" switches work in this world. With a Hadoop cluster you can really test the vendors' claims about how well the switches handle every port lighting up at full rate. Indeed, I recommend that as part of your acceptance tests for the switch.
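One way to set up such an acceptance test, sketched in Python: pair the hosts in a ring so that every port sends and receives at the same time, then start a traffic generator on each pair. This sketch assumes iperf3 is installed and that `iperf3 -s` is already running on every host; the host names are hypothetical, and the script only prints the commands rather than launching them.

```python
def ring_pairs(hosts, shift=1):
    """Pair host i with host (i + shift) mod n, so each host is a sender
    exactly once and a receiver exactly once."""
    n = len(hosts)
    if n < 2 or shift % n == 0:
        raise ValueError("need at least two hosts and a shift that is not a multiple of n")
    return [(hosts[i], hosts[(i + shift) % n]) for i in range(n)]


def iperf3_commands(hosts, seconds=60, shift=1):
    """Return (sender, command) tuples; run each command on its sender."""
    return [(src, f"iperf3 -c {dst} -t {seconds}")
            for src, dst in ring_pairs(hosts, shift)]


if __name__ == "__main__":
    # Hypothetical host names, one per switch port under test.
    hosts = [f"node{i:02d}" for i in range(1, 9)]
    for sender, command in iperf3_commands(hosts):
        print(f"{sender}: {command}")
```

Start all the commands at roughly the same moment and compare the per-host throughput against line rate. Repeating the run with a different `shift` changes which ports talk to each other.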

