If map task(s) were retried (up to mapred.map.max.attempts times), how would these two counters be affected?
Thanks

On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <[email protected]> wrote:
> HDFS_BYTES_READ is a FileSystem interface counter. It directly deals
> with the FS read (lower level). "Map input bytes" is the number of
> bytes the RecordReader has processed for records read from the input
> stream.
>
> For plain text files, I believe both counters should report about the
> same value, since entire records are read with no transformation
> applied to each line. But when you throw in a compressed file, you'll
> notice that HDFS_BYTES_READ is far less than "Map input bytes", since
> the disk read was small but the total content, measured in record
> terms, is the same as it would be for an uncompressed file.
>
> Hope this clears it up.
>
> On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <[email protected]> wrote:
> > In Hadoop 0.20.2, what's the relationship between "Map input bytes"
> > and HDFS_BYTES_READ?
> >
> > <counter group="FileSystemCounters"
> > name="HDFS_BYTES_READ">203446204073</counter>
> > <counter group="FileSystemCounters"
> > name="HDFS_BYTES_WRITTEN">23413127561</counter>
> > <counter group="Map-Reduce Framework"
> > name="Map input records">163502600</counter>
> > <counter group="Map-Reduce Framework"
> > name="Spilled Records">0</counter>
> > <counter group="Map-Reduce Framework"
> > name="Map input bytes">965922136488</counter>
> > <counter group="Map-Reduce Framework"
> > name="Map output records">296754600</counter>
> >
> > Thanks
>
> --
> Harsh J
> www.harshj.com
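The compression effect Harsh describes can be demonstrated outside Hadoop with a minimal local sketch (plain Python, no HDFS involved): the bytes physically read from disk for a gzipped file play the role of HDFS_BYTES_READ, while the decompressed bytes handed out record by record play the role of "Map input bytes". The file contents here are made up for illustration.

```python
import gzip
import os
import tempfile

# Hypothetical repetitive log records; real input data would vary.
line = b"some,repetitive,log,record,data\n"
records = line * 100_000

# Write the records into a compressed file on local disk.
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wb") as f:
    f.write(records)

# Bytes read from disk -- the analogue of HDFS_BYTES_READ.
disk_bytes = os.path.getsize(path)

# Bytes seen by a RecordReader-style line iterator over the
# decompressed stream -- the analogue of "Map input bytes".
record_bytes = 0
with gzip.open(path, "rb") as f:
    for rec in f:
        record_bytes += len(rec)

print(disk_bytes, record_bytes)
os.unlink(path)
```

For highly repetitive input like this, `disk_bytes` comes out orders of magnitude smaller than `record_bytes`, mirroring the job above where HDFS_BYTES_READ (~203 GB) is far below Map input bytes (~965 GB).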
