[ https://issues.apache.org/jira/browse/HADOOP-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-1105:
--------------------------------

    Attachment: 1105.new1.patch

Thanks, Owen, for pointing out the flaw in the patch! I was concentrating too much on performance and lost track of the functionality. Anyway, here's the updated patch. Some main points about the patch:

1) Introduces a way for the progress reporting thread to block if there is no status to report. Calls to informReduceProgress wake the thread up if it was blocked.
2) Makes sure that the progress reporting thread is killed before the task exits. For this, a finally clause has been introduced in run(conf, umbilical). This caused quite a lot of indentation changes (making the patch look more complicated than it is, sigh).
3) Overrides the method getReporter from Task.java, because setStatus behaves slightly differently in ReduceTask. This ensures that MapTask's Reporter object behaves as it did before. (As an aside, MapTask's Reporter.setStatus also needs to be tweaked along the same lines as ReduceTask's, but I think that could be a separate issue.)

> Reducers don't make "progress" while iterating through values
> -------------------------------------------------------------
>
>                 Key: HADOOP-1105
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1105
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.12.3
>
>         Attachments: 1105.new1.patch, 1105.new1.patch, 1105.patch, 1105.patch
>
>
> Reduces make progress when they go to a new key, but not when they read the
> next value, which could cause reduces to time out when they have a lot of
> values for the same key.
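
For readers without the patch at hand, points 1 and 2 of the comment can be illustrated with a minimal, self-contained sketch. This is not the code from 1105.new1.patch: the names ProgressSketch, ProgressReporter, informProgress and runTask are hypothetical, and the "send" step only increments a counter where the real task would talk to its umbilical. It assumes a plain wait/notify monitor, which may differ from what the patch actually does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressSketch {

    // Hypothetical stand-in for the progress reporting thread.
    public static class ProgressReporter extends Thread {
        private final Object lock = new Object();
        private boolean pending = false;  // true when there is status to report
        private String status = "";
        public final AtomicInteger reportsSent = new AtomicInteger();

        // Called by the task for every value; wakes the reporter if it is blocked.
        public void informProgress(String newStatus) {
            synchronized (lock) {
                status = newStatus;
                pending = true;
                lock.notify();
            }
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String toSend;
                    synchronized (lock) {
                        // Block while there is nothing to report.
                        while (!pending) {
                            lock.wait();
                        }
                        toSend = status;
                        pending = false;
                    }
                    send(toSend);
                }
            } catch (InterruptedException e) {
                // Interrupted by the task's finally clause: exit the thread.
            }
        }

        // Placeholder for the real status report; here it only counts calls.
        private void send(String s) {
            reportsSent.incrementAndGet();
        }
    }

    // Hypothetical task body: the finally clause stops the reporter
    // before the task returns, whether or not the loop throws.
    public static ProgressReporter runTask(int values) {
        ProgressReporter reporter = new ProgressReporter();
        reporter.start();
        try {
            for (int i = 0; i < values; i++) {
                reporter.informProgress("value " + i);
            }
        } finally {
            reporter.interrupt();
            try {
                reporter.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return reporter;
    }

    public static void main(String[] args) {
        ProgressReporter r = runTask(1000);
        System.out.println("alive=" + r.isAlive() + " reports=" + r.reportsSent.get());
    }
}
```

Because the reporter coalesces updates, the number of reports sent is usually much smaller than the number of values processed; the task only pays for setting a flag and a notify per value.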