Ravi: Can you illustrate the situation where the map output doesn't fit in io.sort.mb?

Thanks
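For background on the spill scenario being asked about: map outputs are collected in an in-memory sort buffer of io.sort.mb megabytes; when the output doesn't fit, the buffer is sorted and flushed ("spilled") to disk, possibly several times, and the spill files are merged into one sorted file at the end of the task. Below is a minimal simulation of that idea only — not Hadoop's actual spill code. The class name and the byte-only accounting are invented for illustration, and the real buffer also spills early at io.sort.spill.percent of capacity and tracks record metadata separately.

```java
// Illustrative only: simulates how map output exceeding the in-memory
// sort buffer (io.sort.mb) forces multiple spills. Sizes are plain byte
// counts here; real Hadoop also reserves space for record metadata.
public class SpillSim {
    static int countSpills(long[] recordSizes, long bufferBytes) {
        int spills = 0;
        long used = 0;
        for (long size : recordSizes) {
            if (used + size > bufferBytes) {
                spills++;        // buffer full: sort contents, spill to disk
                used = 0;
            }
            used += size;
        }
        if (used > 0) {
            spills++;            // final spill at the end of the map task
        }
        return spills;
    }

    public static void main(String[] args) {
        long[] records = new long[100];
        java.util.Arrays.fill(records, 1024);   // 100 records of 1 KB each
        // 100 KB of map output against a 32 KB buffer -> 4 spills, later
        // merged into a single sorted map output file.
        System.out.println(countSpills(records, 32 * 1024));
    }
}
```

With a single spill the on-disk map output corresponds one-to-one to the buffered records, which is why Ravi's single-spill condition matters for reasoning about byte counters.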
On Thu, Feb 3, 2011 at 8:14 PM, Ravi Gummadi <[email protected]> wrote:

> Ted Yu wrote:
>> From my limited experiment, I think "Map input bytes" reflects the
>> number of bytes of local data file(s) when LocalJobRunner is used.
>> Correct me if I am wrong.
>
> This is correct only if there is a single spill (and not multiple
> spills), i.e. all the map output fits in io.sort.mb.
>
> -Ravi

On Tue, Feb 1, 2011 at 7:52 PM, Harsh J <[email protected]> wrote:

> Each task counts independently of its attempt/other tasks, thereby
> making the aggregates easier to control. Final counters are aggregated
> only from successfully committed tasks. During the job's run, however,
> counters are shown aggregated from the most successful attempts of a
> task thus far.
>
> On Wed, Feb 2, 2011 at 9:09 AM, Ted Yu <[email protected]> wrote:
>> If map task(s) were retried (mapred.map.max.attempts times), how would
>> these two counters be affected?
>>
>> Thanks

On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <[email protected]> wrote:

> HDFS_BYTES_READ is a FileSystem interface counter. It directly deals
> with the FS read (lower level). Map input bytes is the number of bytes
> of record data the RecordReader has processed from the input stream.
>
> For plain text files, I believe both counters should report about the
> same value, since entire records are read with no transformation
> performed on each line. But when you throw in a compressed file, you'll
> notice that HDFS_BYTES_READ is far lower than Map input bytes: the disk
> read was small, but the total content measured in record terms is still
> the same as it would be for an uncompressed file.
>
> Hope this clears it up.
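Harsh's compressed-input point can be reproduced outside Hadoop. The sketch below is illustrative only — the class name and sizes are invented, and it uses plain java.util.zip rather than Hadoop's compression codecs. It compresses repetitive records with GZIP and compares the stored byte count (the analog of HDFS_BYTES_READ) with the decompressed record bytes a RecordReader would deliver (the analog of "Map input bytes"):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Standalone illustration: for compressed input, the bytes read from the
// filesystem are far fewer than the record bytes handed to the mapper.
public class CompressedInput {
    // Returns {compressedBytes, decompressedBytes} for the given record
    // data: rough analogs of HDFS_BYTES_READ and "Map input bytes".
    static long[] measure(byte[] records) throws Exception {
        ByteArrayOutputStream store = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(store)) {
            gz.write(records);               // the file as stored on (HD)FS
        }
        long fsBytes = store.size();         // cost of the filesystem read
        long recordBytes = 0;                // what the RecordReader yields
        try (GZIPInputStream in = new GZIPInputStream(
                 new ByteArrayInputStream(store.toByteArray()))) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                recordBytes += n;            // decompressed record bytes
            }
        }
        return new long[] { fsBytes, recordBytes };
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("key\tvalue\n");       // repetitive, compresses well
        }
        long[] r = measure(sb.toString().getBytes("UTF-8"));
        System.out.println("fs bytes read (compressed):    " + r[0]);
        System.out.println("map input bytes (record data): " + r[1]);
    }
}
```

For this repetitive input the compressed size is a small fraction of the 100,000 record bytes, mirroring the gap Harsh describes between the two counters.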
On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <[email protected]> wrote:

> In hadoop 0.20.2, what's the relationship between "Map input bytes"
> and HDFS_BYTES_READ?
>
> <counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
> <counter group="FileSystemCounters" name="HDFS_BYTES_WRITTEN">23413127561</counter>
> <counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
> <counter group="Map-Reduce Framework" name="Spilled Records">0</counter>
> <counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
> <counter group="Map-Reduce Framework" name="Map output records">296754600</counter>
>
> Thanks

--
Harsh J
www.harshj.com
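As a quick sanity check on the counters quoted above, the ratios can be computed directly (plain arithmetic on the posted values; the class name is invented, and the interpretation is only a rough signal, since HDFS_BYTES_READ can include reads other than map input):

```java
// Arithmetic on the counters posted in the thread. "Map input bytes"
// being several times HDFS_BYTES_READ is consistent with Harsh's
// compressed-input explanation; this is a hint, not a proof.
public class CounterMath {
    public static void main(String[] args) {
        long hdfsBytesRead   = 203446204073L;
        long mapInputBytes   = 965922136488L;
        long mapInputRecords = 163502600L;

        System.out.printf("input-to-read ratio: %.2f%n",
                (double) mapInputBytes / hdfsBytesRead);   // ~4.75
        System.out.printf("avg record size:     %d bytes%n",
                mapInputBytes / mapInputRecords);          // 5907 (integer division)
    }
}
```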
