[
https://issues.apache.org/jira/browse/MAPREDUCE-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923762#action_12923762
]
Ravi Gummadi commented on MAPREDUCE-2135:
-----------------------------------------
In a map task, if multiple spills occur and those spills are merged more than
once, FILE_BYTES_WRITTEN will count the same records again and again, won't
it?
So the actual size of the map task's output file (i.e. the intermediate file)
could be (FILE_BYTES_WRITTEN - SpilledRecords) bytes?
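
To make the question concrete, here is a toy model of how a bytes-written
counter could grow when spills are merged. It is only a sketch: the function
name, the spill sizes, and the assumption that every merge pass rewrites all
spilled bytes exactly once are hypothetical and do not describe Hadoop's
actual accounting.

```python
def bytes_written(spill_sizes, merge_passes=1):
    """Toy estimate of local bytes written by a map task.

    Assumption (hypothetical): each spill is written once, and every
    merge pass rewrites the full spilled data one more time.
    """
    spilled = sum(spill_sizes)
    if len(spill_sizes) <= 1:
        # Assumed: a single spill needs no merge, so nothing is rewritten.
        return spilled
    return spilled * (1 + merge_passes)


# Three hypothetical spills of 100 bytes each, merged in one pass.
spills = [100, 100, 100]
total = bytes_written(spills)
final_output = sum(spills)
print(total, final_output)  # 600 300
```

Under these assumptions the counter (600) is twice the size of the final
intermediate file (300), which is the kind of double counting asked about
above.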
> FILE_BYTES_WRITTEN counter in map task seems incorrect
> ------------------------------------------------------
>
> Key: MAPREDUCE-2135
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2135
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Reporter: Ravi Gummadi
>
> With MapReduce trunk, the FileSystem counter FILE_BYTES_WRITTEN is far
> smaller than the "Map output bytes" counter, even when map output
> compression is OFF. I think FILE_BYTES_WRITTEN signifies the bytes written
> to the local file system, so it should be larger than map output bytes (in
> the counters shown below, 210 vs. 19,200,000). Right?
> Here are some counters from a map task of the wordcount example:
> Counters for attempt_201010141448_0001_m_000000_0
>   FileInputFormatCounters
>     BYTES_READ               9,600,000
>   FileSystemCounters
>     FILE_BYTES_READ          92
>     FILE_BYTES_WRITTEN       210
>     HDFS_BYTES_READ          9,600,107
>   Map-Reduce Framework
>     Combine input records    2,400,000
>     Combine output records   8
>     CPU_MILLISECONDS         4,810
>     Failed Shuffles          0
>     GC time elapsed (ms)     73
>     Map input records        600,000
>     Map output bytes         19,200,000
>     Map output records       2,400,000
>     Merged Map outputs       0
>     PHYSICAL_MEMORY_BYTES    131,518,464
>     Spilled Records          16
>     SPLIT_RAW_BYTES          107
>     VIRTUAL_MEMORY_BYTES     581,021,696
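
The figures quoted in the report can be compared directly. The short Python
sketch below only does arithmetic on the counter values listed there; the
dictionary and variable names are illustrative, and it makes no claim about
how Hadoop computes these counters internally.

```python
# Counter values copied from the report for attempt_201010141448_0001_m_000000_0.
counters = {
    "FILE_BYTES_WRITTEN": 210,
    "Map output bytes": 19_200_000,
    "Map output records": 2_400_000,
    "Combine input records": 2_400_000,
    "Combine output records": 8,
    "Spilled Records": 16,
}

# Average serialized size of one map output record.
avg_bytes_per_map_record = (
    counters["Map output bytes"] / counters["Map output records"]
)

# How many input records the combiner folds into each output record.
combine_reduction = (
    counters["Combine input records"] / counters["Combine output records"]
)

# Local bytes written per spilled record.
bytes_per_spilled_record = (
    counters["FILE_BYTES_WRITTEN"] / counters["Spilled Records"]
)

print(avg_bytes_per_map_record)   # 8.0
print(combine_reduction)          # 300000.0
print(bytes_per_spilled_record)   # 13.125
```

The numbers show that the combiner reduces 2,400,000 records to 8 and that
only 16 records are spilled. Whether that fully accounts for the small
FILE_BYTES_WRITTEN value is not established here.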