Hi Stephen,

The true answer depends on the types of jobs you're running. As a back-of-the-envelope calculation I might figure something like this:
60 nodes total = 30 nodes per rack
Each node might process about 100MB/sec of data
In the case of a sort job where the intermediate data is the same size as the input data, that means each node needs to shuffle 100MB/sec of data
In aggregate, each rack is then producing about 3GB/sec of data (30 nodes x 100MB/sec)
However, given an even spread of reducers across the racks, each rack will need to send half of that, 1.5GB/sec, to reducers running on the other rack.
Since the connection is full duplex, that means you need 1.5GB/sec of bisection bandwidth for this theoretical job. So that's 12Gbps.

However, the above calculations are probably somewhat of an upper bound. A large number of jobs have significant data reduction during the map phase, either through some kind of filtering/selection in the Mapper itself, or through good usage of Combiners. Additionally, intermediate data compression can cut the intermediate data transfer by a significant factor. Lastly, although your disks can probably provide 100MB/sec of sustained throughput, it's rare to see an MR job which can sustain disk-speed IO through the entire pipeline. So, I'd say my estimate is at least a factor of 2 too high.

So, the simple answer is that 4-6Gbps is most likely just fine for most practical jobs. If you want to be extra safe, many inexpensive switches can operate in a "stacked" configuration where the bandwidth between them is essentially backplane speed. That should scale you to 96 nodes with plenty of headroom.

-Todd

On Tue, May 26, 2009 at 3:10 AM, stephen mulcahy <stephen.mulc...@deri.org> wrote:

> Hi,
>
> Has anyone here investigated what level of bisection bandwidth is needed
> for a Hadoop cluster which spans more than one rack?
>
> I'm currently sizing and planning a new Hadoop cluster and I'm wondering
> what the performance implications will be if we end up with a cluster spread
> across two racks. I'd expect we'll have one 48-port gigabit switch in each
> 42u rack. If we end up with 60 systems spread across these two switches -
> how much bandwidth should I have between the racks?
>
> I'll have 6 gigabit ports available for links between racks - i.e. up to 6
> Gbps. Would this be sufficient bisection bandwidth for Hadoop or should I be
> considering increased bandwidth between racks (maybe using fibre links
> between the switches or introducing another switch)?
>
> Thanks for any thoughts on this.
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie http://webstar.deri.ie http://sindice.com
>
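
For anyone who wants to plug in their own numbers, the estimate in Todd's reply can be written out as a few lines of Python. This is only a rough sketch under stated assumptions: the function name and its parameters (`shuffle_ratio`, `remote_fraction`, `derate`) are made up for illustration and are not Hadoop settings. It assumes two equally sized racks, an even spread of reducers, and decimal units (1 GB = 1000 MB, 1 byte = 8 bits).

```python
def cross_rack_gbps(nodes_total=60, racks=2, node_mb_per_sec=100.0,
                    shuffle_ratio=1.0, remote_fraction=0.5, derate=1.0):
    """Rough cross-rack bandwidth estimate in Gbps (illustrative only).

    shuffle_ratio:   intermediate data size / input size (1.0 for a sort job)
    remote_fraction: share of a rack's shuffle output sent to the other rack
                     (0.5 assumes two racks with evenly spread reducers)
    derate:          fudge factor for combiners, compression, and pipelines
                     that cannot sustain disk speed (2.0 = "factor of 2 too high")
    """
    nodes_per_rack = nodes_total / racks
    # Shuffle data produced by one rack, in MB/sec.
    rack_mb_per_sec = nodes_per_rack * node_mb_per_sec * shuffle_ratio
    # Portion of that which has to leave the rack.
    cross_mb_per_sec = rack_mb_per_sec * remote_fraction
    # MB/sec -> Gbps (8 bits per byte, 1000 Mb per Gb), then apply the derating.
    return cross_mb_per_sec * 8 / 1000 / derate


upper_bound = cross_rack_gbps()          # the sort-job worst case from the reply
practical = cross_rack_gbps(derate=2.0)  # assuming the estimate is 2x too high
print(f"upper bound: {upper_bound:.1f} Gbps, practical: {practical:.1f} Gbps")
```

With the defaults this reproduces the 12Gbps upper bound, and a derating factor of 2 gives 6Gbps, which matches the six gigabit inter-rack links mentioned in the original question.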