Semantics of map.input.bytes is not consistent
----------------------------------------------
Key: MAPREDUCE-1917
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1917
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Environment: All
Reporter: Milind Bhandarkar
Assignee: Arun C Murthy
The map.input.bytes counter is updated by the RecordReader, so its value
depends on the RecordReader in use. For sequence files, it is the size of the
raw data, which may be compressed. For text files, it is the size of the
uncompressed data. For PigStorage, it is always 0. This request is to give the
counter consistent semantics. Since HDFS_BYTES_READ already shows the raw split
size read by the mapper, MAP_INPUT_BYTES should be the size of the uncompressed
data.
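
To illustrate the two quantities being distinguished, here is a minimal
sketch in plain Java. It is not Hadoop code and does not use any Hadoop API:
the class InputByteCounts, the Counts holder, and the helper methods are all
hypothetical names introduced for this example, and gzip stands in for
whatever codec the input actually uses. It simply shows that the bytes stored
(what a raw-read counter would see) and the bytes delivered after
decompression (what this issue proposes MAP_INPUT_BYTES should report) can
differ for the same input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; none of these names exist in Hadoop.
class InputByteCounts {

    /** Holds the two sizes the issue distinguishes. */
    static final class Counts {
        final long rawBytes;          // bytes as stored, possibly compressed
        final long uncompressedBytes; // bytes seen after decompression

        Counts(long rawBytes, long uncompressedBytes) {
            this.rawBytes = rawBytes;
            this.uncompressedBytes = uncompressedBytes;
        }
    }

    /** Builds `lines` copies of the 7-byte line "hadoop\n". */
    static byte[] sampleText(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("hadoop\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses data with gzip (a stand-in for the real input codec). */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads gzip-compressed input and reports both sizes. */
    static Counts measure(byte[] stored) {
        long uncompressed = 0;
        byte[] buf = new byte[4096];
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(stored))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                uncompressed += n; // count bytes after decompression
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Counts(stored.length, uncompressed);
    }

    public static void main(String[] args) {
        Counts c = measure(gzip(sampleText(1000)));
        System.out.println("raw (stored) bytes:  " + c.rawBytes);
        System.out.println("uncompressed bytes:  " + c.uncompressedBytes);
    }
}
```

A counter that reports rawBytes for one input format and uncompressedBytes
for another cannot be compared across jobs, which is the inconsistency
described above.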