[
https://issues.apache.org/jira/browse/MAPREDUCE-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924880#action_12924880
]
Ravi Gummadi commented on MAPREDUCE-2135:
-----------------------------------------
Unfortunately, the spill file is created using the FileSystemObject.create()
method, so spills also contribute to the FILE_BYTES_WRITTEN counter.
Here are the counters of the map task of the wordcount example for the same
input data (1 map task and 1 reduce task in both jobs) with 2 different values
of io.sort.mb:
(1) io.sort.mb=50
FileSystemCounters
  FILE_BYTES_READ 39,268,167
  FILE_BYTES_WRITTEN 78,536,360
  HDFS_BYTES_READ 20,971,627
Map-Reduce Framework
  Combine input records 402,994
  Combine output records 383,997
  CPU_MILLISECONDS 7,190
  Failed Shuffles 0
  GC time elapsed (ms) 62
  Map input records 163,676
  Map output bytes 38,559,489
  Map output records 402,994
  Merged Map outputs 0
  PHYSICAL_MEMORY_BYTES 123,650,048
  Spilled Records 767,994
  SPLIT_RAW_BYTES 107
  VIRTUAL_MEMORY_BYTES 582,451,200
(2) io.sort.mb=1
FileSystemCounters
  FILE_BYTES_READ 75,090,212
  FILE_BYTES_WRITTEN 114,350,720
  HDFS_BYTES_READ 20,971,627
Map-Reduce Framework
  Combine input records 796,792
  Combine output records 777,203
  CPU_MILLISECONDS 9,990
  Failed Shuffles 0
  GC time elapsed (ms) 72
  Map input records 163,676
  Map output bytes 38,559,489
  Map output records 402,994
  Merged Map outputs 0
  PHYSICAL_MEMORY_BYTES 76,664,832
  Spilled Records 1,134,831
  SPLIT_RAW_BYTES 107
  VIRTUAL_MEMORY_BYTES 595,406,848
It seems the combiner also gets called multiple times when multiple merges
happen, so "Combine output records" is also not the correct value for the
number of map task output records.
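To make the comparison concrete, here is a small sketch that derives a few ratios from the counter values posted above. The dictionary layout and the helper name `summarize` are made up for illustration and are not part of Hadoop; only the numbers come from the two runs.

```python
# Counter values copied from the two map-task runs above, keyed by io.sort.mb.
# The dict keys and the summarize() helper are illustrative names only.
runs = {
    50: {
        "file_bytes_written": 78_536_360,
        "map_output_bytes": 38_559_489,
        "map_output_records": 402_994,
        "combine_output_records": 383_997,
        "spilled_records": 767_994,
    },
    1: {
        "file_bytes_written": 114_350_720,
        "map_output_bytes": 38_559_489,
        "map_output_records": 402_994,
        "combine_output_records": 777_203,
        "spilled_records": 1_134_831,
    },
}


def summarize(c):
    """Derive simple ratios showing how spills and merges inflate the counters."""
    return {
        # Local bytes written per byte of map output.
        "written_per_output_byte": c["file_bytes_written"] / c["map_output_bytes"],
        # Records spilled per record emitted by the mapper.
        "spilled_per_output_record": c["spilled_records"] / c["map_output_records"],
        # True when the combiner reports more output records than the mapper
        # emitted, i.e. the combiner ran again during merges.
        "combine_out_exceeds_map_out": c["combine_output_records"] > c["map_output_records"],
    }


for sort_mb, counters in runs.items():
    s = summarize(counters)
    print(
        f"io.sort.mb={sort_mb}: "
        f"written/output bytes={s['written_per_output_byte']:.2f}, "
        f"spilled/output records={s['spilled_per_output_record']:.2f}, "
        f"combine out > map out: {s['combine_out_exceeds_map_out']}"
    )
```

With these inputs, FILE_BYTES_WRITTEN is roughly 2.04 times "Map output bytes" at io.sort.mb=50 and roughly 2.97 times at io.sort.mb=1, and at io.sort.mb=1 "Combine output records" is larger than "Map output records", so neither counter can stand in for the final map output file size or record count.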
> Need a counter for map task output file size
> --------------------------------------------
>
> Key: MAPREDUCE-2135
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2135
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Reporter: Ravi Gummadi
>
> With MapReduce trunk, the FileSystem counter FILE_BYTES_WRITTEN is much
> lower than the "Map output bytes" counter even when map output compression
> is off. I think FILE_BYTES_WRITTEN represents the bytes written to the local
> file system, so it should be more than map output bytes (in the counters
> shown below, 210 vs. 19,200,000). Right?
> Here are some counters from map task of wordcount example:
> Counters for attempt_201010141448_0001_m_000000_0
> FileInputFormatCounters
>   BYTES_READ 9,600,000
> FileSystemCounters
>   FILE_BYTES_READ 92
>   FILE_BYTES_WRITTEN 210
>   HDFS_BYTES_READ 9,600,107
> Map-Reduce Framework
>   Combine input records 2,400,000
>   Combine output records 8
>   CPU_MILLISECONDS 4,810
>   Failed Shuffles 0
>   GC time elapsed (ms) 73
>   Map input records 600,000
>   Map output bytes 19,200,000
>   Map output records 2,400,000
>   Merged Map outputs 0
>   PHYSICAL_MEMORY_BYTES 131,518,464
>   Spilled Records 16
>   SPLIT_RAW_BYTES 107
>   VIRTUAL_MEMORY_BYTES 581,021,696