[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924925#action_12924925
 ] 

Ravi Gummadi commented on MAPREDUCE-2154:
-----------------------------------------

The reported problem does not occur with the attached json file.
However, it does occur when map tasks perform multiple spills.
Because the MapReduce counter "Map output bytes" represents the bytes 
before the combiner is applied, gridmix has no way to get the actual 
map-output-file size of the original job's map task. This leads to larger 
map output files in the simulated jobs. See MR-2135 for more details on 
the MapReduce framework issue.
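
As a rough illustration of the counter mismatch described above, here is a small standalone sketch. It is not Gridmix or Hadoop code: the record encoding (key bytes plus a fixed 4-byte value), the sample data, and the helper names serialized_size and combine are all assumptions made for the example. It only shows why a byte count taken before the combiner can overstate what ends up in the map output file.

```python
from collections import Counter

def serialized_size(records):
    """Bytes for a list of (key, value) records.

    Assumed, simplified encoding: UTF-8 key bytes plus a fixed 4-byte value.
    Real Hadoop serialization differs; this is only for illustration.
    """
    return sum(len(key.encode("utf-8")) + 4 for key, _ in records)

def combine(records):
    """Sum the values per key, as a wordcount-style combiner would."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())

# Hypothetical map output for a wordcount-style task: one (word, 1) per token.
map_output = [(word, 1) for word in "a b a c b a".split()]

# What a counter taken before the combiner would reflect.
bytes_before_combiner = serialized_size(map_output)

# Closer to what is actually written once the combiner has run.
bytes_after_combiner = serialized_size(combine(map_output))

print(bytes_before_combiner, bytes_after_combiner)
```

A simulator that sizes its map output from the first number would write more data than the original task did, which is the effect reported here.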

> Gridmix mapper doesn't emit the correct map output records while comparing 
> with json file.
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2154
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2154
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/gridmix
>            Reporter: Vinay Kumar Thota
>            Assignee: Ranjit Mathew
>         Attachments: wordcount.json
>
>
> I ran Gridmix with a trace file and, after the job completed, compared the 
> job history information against the trace. The map output records in the 
> job history did not match the map output records in the trace file. To 
> reproduce the issue, please download the attached trace file and run 
> gridmix, then compare the map output records in the job history with those 
> in the trace file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
