[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-434:
------------------------------------

    Attachment: MAPREDUCE-434.4.patch

Attaching a new patch that fixes TestJobCounters. TestJobCounters tracks the
number of spilled records; for jobs "A", "B", and "C", the previously expected
values were off by 16K, 32K, and 24K respectively from the current ones.

I believe the reason is that the spilled records counter increases whenever
the reducer reads records from a disk file. Previously, the LocalJobRunner
copied map output files to the reducer and then ran the merge, reading all
those records in on the "reduce side." The new logic uses the LocalFetcher,
which fetches all records from the "map side" into memory on the reduce side.
In jobs A and B, the difference in counter values is exactly the number of
records emitted by the combiner, suggesting that those records were previously
double-counted but are now counted only once (correctly). Job C is harder for
me to understand because it involves 5 map tasks and thus has a multi-level
merge (io.sort.factor=2), but I think the difference is benign. If someone
more familiar with the merge counters would take a look at this, I'd
appreciate it.
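
The double-counting hypothesis for jobs A and B can be pictured with a toy
model in plain Java. This is only a sketch of the reasoning, not Hadoop's
counter code: the class name, method names, and the record count are all
hypothetical, and it assumes one map task with a single spill and no
multi-level merge (so it says nothing about job C).

```java
public class SpillCounterSketch {

    // Old path (assumed): map output is spilled to disk once, then the
    // reduce-side merge reads the copied file from disk and counts again.
    static long oldPathSpilled(long combinerOutputRecords) {
        long spilled = 0;
        spilled += combinerOutputRecords; // map-side spill to disk
        spilled += combinerOutputRecords; // reduce-side read from disk file
        return spilled;
    }

    // New path (assumed): records are fetched into memory on the reduce
    // side, so only the map-side spill is counted.
    static long newPathSpilled(long combinerOutputRecords) {
        long spilled = 0;
        spilled += combinerOutputRecords; // map-side spill to disk
        return spilled;
    }

    public static void main(String[] args) {
        long combinerOutput = 1000; // hypothetical record count
        long diff = oldPathSpilled(combinerOutput) - newPathSpilled(combinerOutput);
        System.out.println("old - new = " + diff);
    }
}
```

Under these assumptions the gap between the old and new counter values equals
the combiner output count, which is the pattern reported for jobs A and B.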



> local map-reduce job limited to single reducer
> ----------------------------------------------
>
>                 Key: MAPREDUCE-434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: local job tracker
>            Reporter: Yoram Arnon
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
> MAPREDUCE-434.4.patch, MAPREDUCE-434.patch
>
>
> When mapred.job.tracker is set to 'local', my setNumReduceTasks call is 
> ignored, and the number of reduce tasks is set to 1.
> This prevents me from locally debugging my partition function, which 
> partitions based on the number of reduce tasks.
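
To see why a single reducer hides partitioning bugs, consider a minimal
plain-Java sketch of a hash-modulo partition function. It does not use the
Hadoop Partitioner interface; the class name, method name, and sample keys are
hypothetical and only mirror the common "hash mod number of reducers" pattern.

```java
public class PartitionSketch {

    // Hypothetical partition function: masks off the sign bit so the
    // result is non-negative, then takes the remainder by reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry", "date"};
        for (int reducers : new int[] {1, 4}) {
            for (String key : keys) {
                System.out.println("reducers=" + reducers
                        + " key=" + key
                        + " -> partition " + partitionFor(key, reducers));
            }
        }
    }
}
```

With one reduce task every key lands in partition 0, so any mistake in how
keys are distributed across reducers cannot show up in a local run.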

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
