Ted Yu wrote:
From my limited experiment, I think "Map input bytes" reflects the number of
bytes of local data file(s) when LocalJobRunner is used.

Correct me if I am wrong.
This is correct only if there is a single spill rather than multiple spills, i.e. all the map output fits within io.sort.mb.

-Ravi
On Tue, Feb 1, 2011 at 7:52 PM, Harsh J <[email protected]> wrote:

Each task counts independently of its attempt/other tasks, thereby
making the aggregates easier to control. Final counters are aggregated
only from successfully committed tasks. During the job's run, however,
counters are shown aggregated from the most successful attempts of a
task thus far.
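
To make the aggregation rule above concrete, here is a minimal sketch in Python. It is not Hadoop code: the attempt tuples, the state strings and the `final_counters` helper are all hypothetical names made up for illustration, and it assumes at most one attempt per task ends up committed.

```python
from collections import Counter

# Hypothetical attempt records: (task id, attempt id, final state, counters).
# None of these names come from Hadoop itself.
attempts = [
    ("task_0", "attempt_0", "FAILED", {"Map input bytes": 700}),
    ("task_0", "attempt_1", "SUCCEEDED", {"Map input bytes": 1000}),
    ("task_1", "attempt_0", "SUCCEEDED", {"Map input bytes": 500}),
]

def final_counters(attempts):
    """Sum counters from successfully committed attempts only."""
    totals = Counter()
    for _task_id, _attempt_id, state, counters in attempts:
        if state == "SUCCEEDED":
            totals.update(counters)  # adds each counter value to the running total
    return dict(totals)

print(final_counters(attempts))
```

The 700 bytes counted by the failed attempt of task_0 do not appear in the final total; only the retried, successful attempt contributes.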

On Wed, Feb 2, 2011 at 9:09 AM, Ted Yu <[email protected]> wrote:
If map task(s) were retried (mapred.map.max.attempts times), how would
these two counters be affected?

Thanks

On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <[email protected]> wrote:

HDFS_BYTES_READ is a FileSystem interface counter. It directly deals
with the FS read (lower level). Map input bytes is what the
RecordReader has processed in number of bytes for records being read
from the input stream.

For plain text files, I believe both counters should report about the
same value, provided entire records are read and no other operation is
performed on each line. But when you throw in a compressed file, you'll
notice that HDFS_BYTES_READ is far less than Map input bytes: the disk
read was small, but the total content in record terms is still the same
as it would be for an uncompressed file.
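
A rough way to see the compressed-file effect outside Hadoop is the sketch below. It uses Python's gzip module on made-up sample records; the variable names are only stand-ins for the two counters, and nothing here is Hadoop's actual accounting.

```python
import gzip

# Made-up, highly repetitive text records (an assumption for illustration).
lines = [f"record {i % 10}\n".encode("ascii") for i in range(10000)]
raw = b"".join(lines)

# What sits on disk: the compressed stream.
stored = gzip.compress(raw)

# Stand-in for HDFS_BYTES_READ: bytes pulled from storage.
bytes_read_from_storage = len(stored)

# Stand-in for Map input bytes: bytes of the records after decompression.
records = gzip.decompress(stored).splitlines(keepends=True)
record_bytes = sum(len(r) for r in records)

print(bytes_read_from_storage, record_bytes)
```

The second number matches the size of the uncompressed data, while the first is only the size of the compressed stream, so it comes out much smaller.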

Hope this clears it.

On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <[email protected]> wrote:
In hadoop 0.20.2, what's the relationship between "Map input bytes"
and HDFS_BYTES_READ?

<counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
<counter group="FileSystemCounters" name="HDFS_BYTES_WRITTEN">23413127561</counter>
<counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
<counter group="Map-Reduce Framework" name="Spilled Records">0</counter>
<counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
<counter group="Map-Reduce Framework" name="Map output records">296754600</counter>
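
For anyone wanting to compare the two counters numerically, a small Python sketch along these lines should work. The `<counters>` wrapper element is an assumption added only so the fragment parses as one XML document; the values are copied from the dump above.

```python
import xml.etree.ElementTree as ET

# Counter values copied from the job output above. The <counters> root
# element is an assumption, added so the fragment is well-formed XML.
COUNTERS_XML = """<counters>
<counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
<counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
<counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
</counters>"""

root = ET.fromstring(COUNTERS_XML)
values = {c.get("name"): int(c.text) for c in root.iter("counter")}

# How many record bytes were processed per byte read from HDFS.
ratio = values["Map input bytes"] / values["HDFS_BYTES_READ"]
# Average record size as seen by the RecordReader.
bytes_per_record = values["Map input bytes"] / values["Map input records"]

print(f"Map input bytes / HDFS_BYTES_READ = {ratio:.2f}")
print(f"average bytes per input record = {bytes_per_record:.1f}")
```

For this job, Map input bytes comes out at roughly 4.7 times HDFS_BYTES_READ.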

Thanks


--
Harsh J
www.harshj.com



