From what I could gather, every FileSystem instance puts an entry into a static 'statistics' map, and this map is used to update the counters for each task. Hence, all operations on the same HDFS URI, whether done by the task or by your application code, would be counted as one. In fact, even if you are reading off another HDFS, only the scheme is matched, so those reads would aggregate into the same counter as well.
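
To make the scheme-only aggregation concrete, here is a toy Python model of the behaviour as I understand it. It is not Hadoop code: `StatisticsRegistry`, `SchemeStatistics`, the host names, and the paths are all made up for illustration, and one registry object stands in for the static map.

```python
from collections import defaultdict
from urllib.parse import urlparse


class SchemeStatistics:
    """Byte counters shared by everything that uses one URI scheme."""

    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0


class StatisticsRegistry:
    """Hypothetical stand-in for the static 'statistics' map.

    Entries are keyed by URI scheme only, so host, port, and path
    play no part in choosing which counter gets updated.
    """

    def __init__(self):
        self._by_scheme = defaultdict(SchemeStatistics)

    def for_uri(self, uri):
        # Only the scheme ("hdfs", "file", ...) selects the entry.
        return self._by_scheme[urlparse(uri).scheme]

    def record_read(self, uri, nbytes):
        self.for_uri(uri).bytes_read += nbytes

    def record_write(self, uri, nbytes):
        self.for_uri(uri).bytes_written += nbytes


registry = StatisticsRegistry()
# Input read by the task itself.
registry.record_read("hdfs://namenode-a:8020/input/part-00000", 100)
# Side data read by application code on the same cluster.
registry.record_read("hdfs://namenode-a:8020/side/lookup.dat", 40)
# A read from a different HDFS cluster: same scheme, same counter.
registry.record_read("hdfs://namenode-b:8020/other/data", 10)
# A different scheme lands in a separate entry.
registry.record_read("file:///tmp/scratch", 500)

print("hdfs bytes read:", registry.for_uri("hdfs://any-host/").bytes_read)
print("file bytes read:", registry.for_uri("file:///").bytes_read)
```

If the real map works this way, a counter like HDFS_BYTES_READ cannot tell apart reads issued by the framework and reads issued by your own code against any hdfs:// URI.
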
I'm not very sure of this though. Perhaps writing a simple test should be adequate to learn the truth.

On Sat, Feb 26, 2011 at 1:04 AM, maha <[email protected]> wrote:
> Hello, please help me clear me ideas!
>
> When a reducer reads map-output data remotely ... Is that reflected in the
> HDFS_BYTES_READ?
>
> Or is HDFS_BYTES_READ/WRITTEN is only for the start and end of a job ? ie.
> first data read for maps as input and last data written from reducer as
> output for user to see.
>
>
> Thank you in advance,
>
> Maha

--
Harsh J
www.harshj.com
