We at Yahoo are about to deploy code to ensure that a disk failure on a datanode is just that - a disk failure, not a node failure. This really helps avoid replication storms.
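The knob ends up looking something like this in hdfs-site.xml - a sketch only; the exact property name in the patch may differ from what later releases settled on:

  <property>
    <!-- number of local volumes that may fail before the datanode
         takes itself offline; 0, the default, keeps the old
         behaviour of failing the whole node on the first bad disk -->
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>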
It's in the 0.20.204 branch for the curious.

Arun

Sent from my iPhone

On Jun 28, 2011, at 3:01 AM, "Steve Loughran" <[email protected]> wrote:

> On 28/06/11 04:49, Segel, Mike wrote:
>> Hmmm. I could have sworn there was a background balancing bandwidth limiter.
>
> There is, for the rebalancer; node outages are taken more seriously,
> though there have been problems in past 0.20.x releases where there was
> a risk of a cascade failure after a big switch/rack outage. The risk has
> been reduced, though we all await field reports to confirm this :)
>
> You can get 12-24 TB in a server today, which means the loss of a server
> generates a lot of traffic - which argues for 10 GbE.
>
> But:
> - big increase in switch cost, especially if you (CoI warning) go with
>   Cisco
> - there have been problems with things like BIOS PXE and lights-out
>   management on 10 GbE - probably because the NICs are things the BIOS
>   wasn't expecting and sit off the mainboard. This should improve.
> - I don't know how well Linux works with Ethernet that fast (field
>   reports useful)
> - the big threat is still ToR switch failure, as that will trigger
>   re-replication of every block in the rack.
>
> 2x1 GbE lets you have redundant switches, albeit at the price of more
> wiring, and more things to go wrong with the wiring, etc.
>
> The other thing to consider is how well "enterprise" switches work in
> this world - with a Hadoop cluster you can really test the claims about
> how well a switch handles every port lighting up at full rate. Indeed,
> I recommend that as part of your acceptance tests for the switch.
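For reference, the limiter Mike remembered is the balancer throttle, set per datanode in hdfs-site.xml: dfs.balance.bandwidthPerSec, in bytes per second (default 1 MB/s in 0.20.x). It only throttles the rebalancer; re-replication after an outage is not governed by it.

And on lighting up every port at full rate: a crude acceptance-test sketch with iperf, run from a control host. The hostnames, the 40-node count and passwordless ssh are assumptions here, not part of anyone's actual setup:

  # start an iperf server, daemonized, on every node in the rack
  for h in node{01..40}; do ssh "$h" 'iperf -s -D'; done

  # then have each node blast its neighbour for ten minutes, all at
  # once, so every switch port carries traffic in both directions
  for i in $(seq 1 40); do
    j=$(( i % 40 + 1 ))
    ssh "$(printf 'node%02d' "$i")" \
      "iperf -c $(printf 'node%02d' "$j") -t 600 -P 4" &
  done
  wait

If the aggregate falls well short of N x line rate, the backplane isn't doing what the datasheet claims.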
