[ https://issues.apache.org/jira/browse/HADOOP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496097 ]
Devaraj Das commented on HADOOP-1201:
-------------------------------------

Makes sense. Maybe HADOOP-1235 should be merged with this issue.

> Progress reporting can be improved for both Map/Reduce tasks
> ------------------------------------------------------------
>
>                 Key: HADOOP-1201
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1201
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>
> Both the map and reduce tasks do progress reporting in separate threads.
> However, in the ReduceTask, after the sort phase, the progress reporting
> happens inline with the reducer invocations. This slows down the Reduce phase
> since RPC is involved for every progress report. The better thing to do would
> be to do progress reporting for all phases in separate threads and have the
> tasks just update the progress fields.
> One proposal is to extract out the reporting stuff that is there in
> MapTask/ReduceTask and put it in the Task superclass as a new class, and have
> methods in the new class that control what/when progress is reported.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
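
To make the proposal in the description concrete, here is a rough sketch of what such a class might look like. The names ProgressSink and ProgressReporter are hypothetical and are not existing Hadoop classes; ProgressSink only stands in for whatever RPC call the task uses to report to the TaskTracker. The idea is that the map/reduce code only writes a field, and a separate thread does the RPC at a fixed interval.

```java
// Hypothetical stand-in for the RPC channel to the TaskTracker (not a real Hadoop interface).
interface ProgressSink {
    void report(float progress);
}

// Hypothetical helper that could live in the Task superclass and be shared
// by MapTask and ReduceTask.
class ProgressReporter implements Runnable {
    private final ProgressSink sink;
    private final long intervalMillis;

    // Written by the task thread, read by the reporter thread.
    private volatile float progress;
    private volatile boolean dirty;   // true when there is an unsent update
    private volatile boolean done;

    ProgressReporter(ProgressSink sink, long intervalMillis) {
        this.sink = sink;
        this.intervalMillis = intervalMillis;
    }

    // Called inline by the map/reduce code: a field update only, no RPC.
    void setProgress(float p) {
        progress = p;
        dirty = true;
    }

    // Ask the reporter thread to exit after sending any pending update.
    void stop() {
        done = true;
    }

    @Override
    public void run() {
        while (!done) {
            flush();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        flush(); // send the final state, if it has not gone out yet
    }

    // Performs the (potentially slow) RPC only when something changed.
    private void flush() {
        if (dirty) {
            dirty = false;
            sink.report(progress);
        }
    }
}
```

With something like this, the reducer loop would call setProgress() after each invocation, and the number of RPCs would depend on the reporting interval rather than on the number of records. Whether the interval should be fixed or configurable per phase is left open here.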