Ravi: Can you illustrate the situation where the map output doesn't fit in io.sort.mb?

Thanks
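For background on the spill scenario being asked about: map outputs are collected in an in-memory sort buffer of io.sort.mb megabytes; when the output doesn't fit, the buffer is sorted and flushed ("spilled") to disk, possibly several times, and the spill files are merged into one sorted file at the end of the task. Below is a minimal simulation of that idea only — not Hadoop's actual spill code. The class name and the byte-only accounting are invented for illustration, and the real buffer also spills early at io.sort.spill.percent of capacity and tracks record metadata separately.

```java
// Illustrative only: simulates how map output exceeding the in-memory
// sort buffer (io.sort.mb) forces multiple spills. Sizes are plain byte
// counts here; real Hadoop also reserves space for record metadata.
public class SpillSim {
    static int countSpills(long[] recordSizes, long bufferBytes) {
        int spills = 0;
        long used = 0;
        for (long size : recordSizes) {
            if (used + size > bufferBytes) {
                spills++;        // buffer full: sort contents, spill to disk
                used = 0;
            }
            used += size;
        }
        if (used > 0) {
            spills++;            // final spill at the end of the map task
        }
        return spills;
    }

    public static void main(String[] args) {
        long[] records = new long[100];
        java.util.Arrays.fill(records, 1024);   // 100 records of 1 KB each
        // 100 KB of map output against a 32 KB buffer -> 4 spills, later
        // merged into a single sorted map output file.
        System.out.println(countSpills(records, 32 * 1024));
    }
}
```

With a single spill the on-disk map output corresponds one-to-one to the buffered records, which is why Ravi's single-spill condition matters for reasoning about byte counters.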
On Thu, Feb 3, 2011 at 8:14 PM, Ravi Gummadi <[email protected]> wrote:

> Ted Yu wrote:
>> From my limited experiment, I think "Map input bytes" reflects the
>> number of bytes of local data file(s) when LocalJobRunner is used.
>> Correct me if I am wrong.
>
> This is correct only if there is a single spill (and not multiple
> spills), i.e. all the map output fits in io.sort.mb.
>
> -Ravi

On Tue, Feb 1, 2011 at 7:52 PM, Harsh J <[email protected]> wrote:

> Each task counts independently of its attempt/other tasks, thereby
> making the aggregates easier to control. Final counters are aggregated
> only from successfully committed tasks. During the job's run, however,
> counters are shown aggregated from the most successful attempts of a
> task thus far.
>
> On Wed, Feb 2, 2011 at 9:09 AM, Ted Yu <[email protected]> wrote:
>> If map task(s) were retried (mapred.map.max.attempts times), how would
>> these two counters be affected?
>>
>> Thanks

On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <[email protected]> wrote:

> HDFS_BYTES_READ is a FileSystem interface counter. It directly deals
> with the FS read (lower level). Map input bytes is the number of bytes
> of record data the RecordReader has processed from the input stream.
>
> For plain text files, I believe both counters should report about the
> same value, since entire records are read with no transformation
> performed on each line. But when you throw in a compressed file, you'll
> notice that HDFS_BYTES_READ is far lower than Map input bytes: the disk
> read was small, but the total content measured in record terms is still
> the same as it would be for an uncompressed file.
>
> Hope this clears it up.
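Harsh's compressed-input point can be reproduced outside Hadoop. The sketch below is illustrative only — the class name and sizes are invented, and it uses plain java.util.zip rather than Hadoop's compression codecs. It compresses repetitive records with GZIP and compares the stored byte count (the analog of HDFS_BYTES_READ) with the decompressed record bytes a RecordReader would deliver (the analog of "Map input bytes"):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Standalone illustration: for compressed input, the bytes read from the
// filesystem are far fewer than the record bytes handed to the mapper.
public class CompressedInput {
    // Returns {compressedBytes, decompressedBytes} for the given record
    // data: rough analogs of HDFS_BYTES_READ and "Map input bytes".
    static long[] measure(byte[] records) throws Exception {
        ByteArrayOutputStream store = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(store)) {
            gz.write(records);               // the file as stored on (HD)FS
        }
        long fsBytes = store.size();         // cost of the filesystem read
        long recordBytes = 0;                // what the RecordReader yields
        try (GZIPInputStream in = new GZIPInputStream(
                 new ByteArrayInputStream(store.toByteArray()))) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                recordBytes += n;            // decompressed record bytes
            }
        }
        return new long[] { fsBytes, recordBytes };
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("key\tvalue\n");       // repetitive, compresses well
        }
        long[] r = measure(sb.toString().getBytes("UTF-8"));
        System.out.println("fs bytes read (compressed):    " + r[0]);
        System.out.println("map input bytes (record data): " + r[1]);
    }
}
```

For this repetitive input the compressed size is a small fraction of the 100,000 record bytes, mirroring the gap Harsh describes between the two counters.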
On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <[email protected]> wrote:

> In hadoop 0.20.2, what's the relationship between "Map input bytes"
> and HDFS_BYTES_READ?
>
> <counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
> <counter group="FileSystemCounters" name="HDFS_BYTES_WRITTEN">23413127561</counter>
> <counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
> <counter group="Map-Reduce Framework" name="Spilled Records">0</counter>
> <counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
> <counter group="Map-Reduce Framework" name="Map output records">296754600</counter>
>
> Thanks

--
Harsh J
www.harshj.com
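As a quick sanity check on the counters quoted above, the ratios can be computed directly (plain arithmetic on the posted values; the class name is invented, and the interpretation is only a rough signal, since HDFS_BYTES_READ can include reads other than map input):

```java
// Arithmetic on the counters posted in the thread. "Map input bytes"
// being several times HDFS_BYTES_READ is consistent with Harsh's
// compressed-input explanation; this is a hint, not a proof.
public class CounterMath {
    public static void main(String[] args) {
        long hdfsBytesRead   = 203446204073L;
        long mapInputBytes   = 965922136488L;
        long mapInputRecords = 163502600L;

        System.out.printf("input-to-read ratio: %.2f%n",
                (double) mapInputBytes / hdfsBytesRead);   // ~4.75
        System.out.printf("avg record size:     %d bytes%n",
                mapInputBytes / mapInputRecords);          // 5907 (integer division)
    }
}
```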
