On Wed, Jun 29, 2011 at 01:02, Matei Zaharia <[email protected]> wrote:
> Ideally, to evaluate whether you want to go for 10GbE NICs, you would profile 
> your target Hadoop workload and see whether it's communication-bound. Hadoop 
> jobs can definitely be communication-bound if you shuffle a lot of data 
> between map and reduce, but I've also seen a lot of clusters that are 
> CPU-bound (due to decompression, running python, or just running expensive 
> user code) or disk-IO-bound. You might be surprised at what your bottleneck 
> is.

From my experience, jobs that shuffle lots of data are also very often
slowed down by the sort phase; compressing the mappers' output is a first
step to improve performance. Given the cost of a 10GbE infrastructure
with no oversubscription, I'd monitor bandwidth usage very closely prior
to investing in that kind of network gear.
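
As a rough illustration of that kind of monitoring, here is a minimal
sketch that estimates how much of a NIC's capacity is in use from two
samples of the Linux /proc/net/dev byte counters. It assumes Linux
worker nodes; the interface name "eth0", the 10-second interval, the
1 Gbit/s link speed, and all function names are placeholders for
illustration, not anything taken from Hadoop itself.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, seconds, link_bits_per_sec):
    """Fraction of link capacity used (rx, tx) between two counter samples."""
    rx = (after[0] - before[0]) * 8 / seconds / link_bits_per_sec
    tx = (after[1] - before[1]) * 8 / seconds / link_bits_per_sec
    return rx, tx


def sample_utilization(iface="eth0", interval=10, link_bits_per_sec=1e9):
    """Sample the counters twice and report utilization for one interface.

    The defaults are assumptions: adjust them to the node being measured.
    """
    with open("/proc/net/dev") as f:
        first = parse_net_dev(f.read())[iface]
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        second = parse_net_dev(f.read())[iface]
    return utilization(first, second, interval, link_bits_per_sec)
```

Calling sample_utilization() on a worker node while a shuffle-heavy job
runs gives a first idea of whether the existing links are anywhere near
saturated. Sustained values close to 1.0 would suggest the network is
the bottleneck; much lower values point back to CPU, disk, or the sort
phase.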
