Semantics of map.input.bytes is not consistent
----------------------------------------------

                 Key: MAPREDUCE-1917
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1917
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: task
         Environment: All
            Reporter: Milind Bhandarkar
            Assignee: Arun C Murthy


The map.input.bytes counter is updated by the RecordReader, so its meaning 
depends on the input format. For sequence files, it is the size of the raw 
data, which may be compressed. For text files, it is the size of the 
uncompressed data. For PigStorage, it is always 0. This request is to give the 
counter consistent semantics. Since HDFS_BYTES_READ already shows the raw split 
size read by the mapper, MAP_INPUT_BYTES should be the size of the uncompressed 
data.
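
To illustrate the distinction between the two sizes, here is a minimal, 
standalone sketch. It does not use Hadoop's RecordReader or counter APIs; it 
relies only on java.util.zip, and the names InputByteCounters, 
CountingInputStream, countBytes and gzip are hypothetical, chosen for this 
example. The raw count stands in for what HDFS_BYTES_READ reports, and the 
uncompressed count stands in for what MAP_INPUT_BYTES would report under the 
proposal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not part of Hadoop.
public class InputByteCounters {

    /** Counts the bytes pulled from the wrapped (raw, possibly compressed) stream. */
    static final class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        long getCount() {
            return count;
        }
    }

    /** Returns {rawBytesRead, uncompressedBytesRead} for gzip-compressed input. */
    public static long[] countBytes(byte[] gzipped) throws IOException {
        CountingInputStream raw =
                new CountingInputStream(new ByteArrayInputStream(gzipped));
        long uncompressed = 0;
        try (InputStream in = new GZIPInputStream(raw)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                uncompressed += n; // bytes handed to the record-parsing layer
            }
        }
        return new long[] { raw.getCount(), uncompressed };
    }

    /** Helper that gzip-compresses a byte array in memory. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("key\tvalue\n");
        }
        byte[] text = sb.toString().getBytes(StandardCharsets.UTF_8);
        long[] counts = countBytes(gzip(text));
        System.out.println("raw bytes read (like HDFS_BYTES_READ): " + counts[0]);
        System.out.println("uncompressed bytes (proposed MAP_INPUT_BYTES): " + counts[1]);
    }
}
```

Counting after decompression, as countBytes does, gives the same kind of number 
for compressed and uncompressed input alike, which is the consistency the 
request asks for.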

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
