[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924880#action_12924880
 ] 

Ravi Gummadi commented on MAPREDUCE-2135:
-----------------------------------------

Unfortunately, the spill file is created using the FileSystemObject.create() 
method, so spills also contribute to the FILE_BYTES_WRITTEN counter.
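
To illustrate why this happens, here is a minimal sketch, not Hadoop's actual 
code. It assumes a hypothetical CountingStream wrapper and a stand-in create() 
function; both names are made up for illustration, and an in-memory buffer 
replaces the real local file. Every stream obtained through the counting 
create() adds to the same counter, whether it backs an intermediate spill or 
the final map output file.

```python
import io


class CountingStream:
    """Hypothetical wrapper that counts every byte written through it."""

    def __init__(self, raw, counters):
        self.raw = raw
        self.counters = counters

    def write(self, data):
        written = self.raw.write(data)
        # Every write is charged to the same counter, regardless of purpose.
        self.counters["FILE_BYTES_WRITTEN"] = (
            self.counters.get("FILE_BYTES_WRITTEN", 0) + written
        )
        return written


def create(counters):
    # Stand-in for a file-system create() call (assumption: in-memory buffer
    # instead of a real local file).
    return CountingStream(io.BytesIO(), counters)


counters = {}

spill = create(counters)
spill.write(b"x" * 100)        # intermediate spill file

final_output = create(counters)
final_output.write(b"x" * 100)  # final map output file

print(counters["FILE_BYTES_WRITTEN"])  # 200, not 100
```

In this sketch the counter cannot tell the 100 bytes of final output apart from 
the 100 bytes of spill, which is the same ambiguity seen in the real counters.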

Here are the counters of the map task of the wordcount example for the same 
input data (1 map task and 1 reduce task in both jobs) with 2 different values 
of io.sort.mb:

(1) io.sort.mb=50

FileSystemCounters
        FILE_BYTES_READ         39,268,167
        FILE_BYTES_WRITTEN      78,536,360
        HDFS_BYTES_READ         20,971,627

Map-Reduce Framework
        Combine input records   402,994
        Combine output records  383,997
        CPU_MILLISECONDS        7,190
        Failed Shuffles         0
        GC time elapsed (ms)    62
        Map input records       163,676
        Map output bytes        38,559,489
        Map output records      402,994
        Merged Map outputs      0
        PHYSICAL_MEMORY_BYTES   123,650,048
        Spilled Records         767,994
        SPLIT_RAW_BYTES         107
        VIRTUAL_MEMORY_BYTES    582,451,200

(2) io.sort.mb=1

FileSystemCounters
        FILE_BYTES_READ         75,090,212
        FILE_BYTES_WRITTEN      114,350,720
        HDFS_BYTES_READ         20,971,627

Map-Reduce Framework
        Combine input records   796,792
        Combine output records  777,203
        CPU_MILLISECONDS        9,990
        Failed Shuffles         0
        GC time elapsed (ms)    72
        Map input records       163,676
        Map output bytes        38,559,489
        Map output records      402,994
        Merged Map outputs      0
        PHYSICAL_MEMORY_BYTES   76,664,832
        Spilled Records         1,134,831
        SPLIT_RAW_BYTES         107
        VIRTUAL_MEMORY_BYTES    595,406,848

It seems the combiner also gets called many times when multiple merges happen: 
with io.sort.mb=1, Combine input records (796,792) exceeds Map output records 
(402,994). So Combine output records is also not the correct value for 
map-task-output-records.
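
The comparison can be made explicit with a small sketch. The counter values are 
copied from the two runs above; the summarize() helper and its field names are 
hypothetical and only serve this illustration.

```python
# Counter values copied from the two map task runs, keyed by io.sort.mb.
runs = {
    50: {
        "FILE_BYTES_WRITTEN": 78_536_360,
        "Map output bytes": 38_559_489,
        "Map output records": 402_994,
        "Combine input records": 402_994,
        "Combine output records": 383_997,
        "Spilled Records": 767_994,
    },
    1: {
        "FILE_BYTES_WRITTEN": 114_350_720,
        "Map output bytes": 38_559_489,
        "Map output records": 402_994,
        "Combine input records": 796_792,
        "Combine output records": 777_203,
        "Spilled Records": 1_134_831,
    },
}


def summarize(c):
    """Derive a few ratios that show how spills and merges inflate counters."""
    return {
        # Local bytes written per byte of map output.
        "written_per_output_byte": c["FILE_BYTES_WRITTEN"] / c["Map output bytes"],
        # Records fed to the combiner beyond the records the mapper emitted.
        "extra_combine_inputs": c["Combine input records"] - c["Map output records"],
        # Spilled records per record the mapper emitted.
        "spilled_per_output_record": c["Spilled Records"] / c["Map output records"],
    }


for sort_mb, counters_for_run in runs.items():
    print(f"io.sort.mb={sort_mb}", summarize(counters_for_run))
```

For io.sort.mb=50 the combiner sees exactly the map output records (0 extra), 
while for io.sort.mb=1 it sees 393,798 additional records. FILE_BYTES_WRITTEN 
also grows from roughly 2x to roughly 3x the map output bytes, even though the 
map output itself is identical in both runs.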

> Need a counter for map task output file size
> --------------------------------------------
>
>                 Key: MAPREDUCE-2135
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2135
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Ravi Gummadi
>
> With MapReduce trunk, the FileSystem counter FILE_BYTES_WRITTEN is a lot 
> less than the "Map output bytes" counter even when map output compression 
> is OFF. I think FILE_BYTES_WRITTEN signifies the bytes written to the local 
> file system, so it should be more than Map output bytes (in the counters 
> shown below, 210 vs. 19,200,000). Right?
> Here are some counters from map task of wordcount example:
> Counters for attempt_201010141448_0001_m_000000_0
> FileInputFormatCounters
>       BYTES_READ      9,600,000
> FileSystemCounters
>       FILE_BYTES_READ         92
>       FILE_BYTES_WRITTEN      210
>       HDFS_BYTES_READ         9,600,107
> Map-Reduce Framework
>       Combine input records   2,400,000
>       Combine output records  8
>       CPU_MILLISECONDS        4,810
>       Failed Shuffles         0
>       GC time elapsed (ms)    73
>       Map input records       600,000
>       Map output bytes        19,200,000
>       Map output records      2,400,000
>       Merged Map outputs      0
>       PHYSICAL_MEMORY_BYTES   131,518,464
>       Spilled Records         16
>       SPLIT_RAW_BYTES         107
>       VIRTUAL_MEMORY_BYTES    581,021,696

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
