Jeff, I've seen tests showing that 10 Gigabit Ethernet networking benefits
Hadoop clusters and the benefit is especially pronounced if Hadoop node use SSDs on the back-end. Also, as each node does bother storage I/O and processing, 10GbE NICs that offload protocol processing are especially beneficial e.g High-Performance Networking for Optimized Hadoop Deployments <http://www.chelsio.com/wp-content/uploads/2011/08/Hadoop-White-Paper-w-tuto rial-8.11.pdf> Saqib -----Original Message----- From: Jeff Kubina [mailto:[email protected]] Sent: Thursday, March 15, 2012 11:30 AM To: [email protected] Subject: Most Practical Bandwidth for a Hadoop Cluster? Suppose you have Hadoop jobs that are communication-bound (due to lots of data shuffling between maps and reduces), what is the most practical network bandwidth to strive for in such a cluster? I think it should be the sustained read bandwidth of the disks on the nodes times the number of nodes, since any more bandwidth than this could not be utilized. Agree or disagree? If you disagree, could you explain what you think it should be. Thanks.
