[ https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544999 ]
Devaraj Das commented on HADOOP-1965:
-------------------------------------

Some indentation needs to be fixed (the patch has quite a few lines where the only change is the indentation of the second line of a code statement). Also, some documentation should be added explaining that there are two buffers, one that sort works on and another that collect works on, and how the buffers are switched.

The benchmark assumes RandomWriter is present in the job jar but, since the benchmark is part of the test jar, this is not true unless the user generates a new jar file containing the RandomWriter classes. Maybe you should implement the data-generation part of the benchmark within the benchmark itself.

> Handle map output buffers better
> --------------------------------
>
>                 Key: HADOOP-1965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1965
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>             Fix For: 0.16.0
>
>         Attachments: 1965_single_proc_150mb_gziped.jpeg, 1965_single_proc_150mb_gziped.pdf, 1965_single_proc_150mb_gziped_breakup.png, HADOOP-1965-1.patch, HADOOP-1965-Benchmark.patch
>
>
> Today, the map task stops calling the map method while sort/spill is using the (single instance of) map output buffer. One improvement that can be done to improve performance of the map task is to have another buffer for writing the map outputs to, while sort/spill is using the first buffer.
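
For readers unfamiliar with the two-buffer idea described in the issue, a minimal Java sketch follows. It is not taken from HADOOP-1965-1.patch and does not use any Hadoop API: the class DoubleBufferSketch, its methods, the String records, and the in-memory list standing in for spill files are all hypothetical, chosen only to show collect writing to one buffer while sort/spill works on the other, and the point at which the two are switched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from the Hadoop code base.
final class DoubleBufferSketch {
    private final int capacity;                               // records per buffer
    private List<String> collectBuffer = new ArrayList<>();   // collect() appends here
    private List<String> sortBuffer = new ArrayList<>();      // sort/spill works on this one
    private final ExecutorService spiller = Executors.newSingleThreadExecutor();
    private Future<?> pendingSpill;                           // spill in flight, if any
    private final List<List<String>> spills = new ArrayList<>(); // stand-in for spill files

    DoubleBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    // Called by the map side; blocks only if both buffers are busy.
    void collect(String record) {
        collectBuffer.add(record);
        if (collectBuffer.size() >= capacity) {
            switchBuffers();
        }
    }

    // Hand the full buffer to the spill thread and continue in the other one.
    private void switchBuffers() {
        waitForSpill(); // the other buffer must be drained before it is reused
        final List<String> full = collectBuffer;
        collectBuffer = sortBuffer; // empty, because the previous spill cleared it
        sortBuffer = full;
        pendingSpill = spiller.submit(() -> {
            Collections.sort(full);            // sort the records in place
            spills.add(new ArrayList<>(full)); // "spill" a sorted copy
            full.clear();                      // buffer is free for the next switch
        });
    }

    private void waitForSpill() {
        if (pendingSpill == null) {
            return;
        }
        try {
            pendingSpill.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pendingSpill = null;
        }
    }

    // Spill whatever is left, stop the spill thread, and return all spills.
    List<List<String>> close() {
        if (!collectBuffer.isEmpty()) {
            switchBuffers();
        }
        waitForSpill();
        spiller.shutdown();
        return spills;
    }
}
```

In this sketch the caller of collect stalls only when the second buffer fills before the previous spill has finished, whereas with a single buffer it would stall for the whole of every sort/spill.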