Can anyone please eyeball the config parameters defined below and share their thoughts?
Thanks,
Praveenesh

On Mon, Jan 30, 2012 at 6:20 PM, praveenesh kumar <praveen...@gmail.com> wrote:
> Hey guys,
>
> Just wanted to ask, are there any best practices to follow for improving
> Hadoop shuffle performance?
>
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24
> cores/CPUs and 48 GB RAM.
>
> I have set the following parameters:
>
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
>
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts=-Xmx1024m
> mapred.reduce.child.java.opts=-Xmx1024m
>
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks=-1
> mapred.compress.map.output=true
> mapred.reduce.slowstart.completed.maps=0.5
>
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
>
> Can anyone please validate the above tuning parameters and suggest any
> further improvements?
> My mappers are running fine. The shuffle and reduce phases are slower than
> expected for normal jobs. I want to know what I am doing wrong or missing.
>
> Thanks,
> Praveenesh
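As a quick way to eyeball these numbers yourself, here is a minimal sanity check of the memory budget the settings above imply. It only uses values from the message (slot counts, 1024 MB child heap, io.sort.mb=2000, 48 GB nodes) and assumes the 0.20-era behaviour where the io.sort.mb buffer is allocated inside each map task's JVM heap:

```python
# Sanity-check the task-memory budget implied by the configuration above.
# All figures come from the quoted message; nothing here is measured.

node_ram_mb = 48 * 1024      # 48 GB RAM per node
map_slots = 24               # mapred.tasktracker.map.tasks.maximum
reduce_slots = 12            # mapred.tasktracker.reduce.tasks.maximum
child_heap_mb = 1024         # -Xmx1024m per map/reduce child JVM
io_sort_mb = 2000            # io.sort.mb

# Worst-case heap if every slot on a node is occupied at once.
total_task_heap_mb = (map_slots + reduce_slots) * child_heap_mb
print(total_task_heap_mb)    # -> 36864 (36 GB of 48 GB, before daemons)

# In Hadoop 0.20 the map-side sort buffer lives inside the child heap,
# so io.sort.mb must fit within the child JVM's -Xmx.
print(io_sort_mb <= child_heap_mb)   # -> False: 2000 MB buffer, 1024 MB heap
```

If all 36 slots fill, task heaps alone consume 36 GB of the 48 GB node, leaving headroom for the DataNode/TaskTracker daemons and OS cache; the io.sort.mb vs. child-heap mismatch, however, is worth a look when map-side spills or OOMs appear.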