Hey guys,

Just wanted to ask: are there any best practices to follow for improving
Hadoop shuffle performance?

I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24
cores/CPUs and 48 GB RAM.

I have set the following parameters:

fs.inmemory.size.mb=2000
io.sort.mb=2000
io.sort.factor=200
io.file.buffer.size=262544

mapred.map.tasks=200
mapred.reduce.tasks=40
mapred.reduce.parallel.copies=80
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m

mapred.job.tracker.handler.count=60
tasktracker.http.threads=50
mapred.job.reuse.jvm.num.tasks=-1
mapred.compress.map.output=true
mapred.reduce.slowstart.completed.maps=0.5

mapred.tasktracker.map.tasks.maximum=24
mapred.tasktracker.reduce.tasks.maximum=12
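
For context, this is roughly how the mapred.* entries above look in my
mapred-site.xml (excerpt; the fs.* and io.* ones are in core-site.xml):

```xml
<!-- mapred-site.xml excerpt: shuffle-related settings from the list above -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>80</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.5</value>
</property>
```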


Can anyone please validate the above tuning parameters and suggest any
further improvements? My mappers are running fine, but the shuffle and
reduce phases are slower than I would expect for normal jobs. I want to
know what I am doing wrong or missing.

Thanks,
Praveenesh
