Harsh,

If this is the case, I don't understand something. If I see FILE_BYTES_READ to be non-zero for a map, the only thing I can assume is that it came from a spill during the sort phase.

I have a 10-node cluster, and I ran TeraSort with an input size of 100,000 bytes (1,000 records). My io.sort.mb is 300 and io.sort.factor is 10. My mapred.child.java.opts is set to -Xmx512m.

Given that everything fits into memory, I expected no FILE_BYTES_READ on the map side and no FILE_BYTES_WRITTEN on the reduce side. But I find that my FILE_BYTES_READ on the map side is 188,604 (HDFS_BYTES_READ is 149,686) and, inexplicably, SPILLED_RECORDS is 1000 for both map and reduce.

So my questions have become two:

1. Why is my spill count 1000, given that io.sort.factor and io.sort.mb are 10 and 300 MB and I have 512 MB for each task?
2. Where are the numbers for FILE_BYTES_READ/WRITTEN coming from?

TIA
Raj

From: Harsh J <[email protected]>
To: [email protected]; R V <[email protected]>
Sent: Thursday, July 28, 2011 12:03 AM
Subject: Re: File System Counters.
Raj,

There is no overlap. Data read from HDFS FileSystem instances goes to HDFS_BYTES_READ, and data read from Local FileSystem instances goes to FILE_BYTES_READ. These are two different FileSystems, and have no overlap at all.

On Thu, Jul 28, 2011 at 5:56 AM, R V <[email protected]> wrote:
> Hello
>
> I don't know if the question has been answered. I am trying to understand
> the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various
> components that provide value to this counter? For example, when I see
> FILE_BYTES_READ for a specific task (Map or Reduce), is it purely due to
> the spill during the sort phase? If an HDFS read happens on a non-local node, does
> the counter increase on the node where the data block resides? What happens
> when the data is local? Does the counter increase for both HDFS_BYTES_READ
> and FILE_BYTES_READ? From the values I am seeing, this looks to be the case,
> but I am not sure.
>
> I am not very fluent in Java, and hence I don't fully understand the source.
> :-(
>
> Raj

--
Harsh J
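
As an illustration of the separation Harsh describes, here is a toy model in Python (not Hadoop code). It assumes, for the sake of the sketch, that each filesystem type keeps its own byte counter and that the counter name is the scheme in upper case followed by _BYTES_READ. The class SchemeCounters and its methods are hypothetical names invented for this example; the two byte values are the ones reported earlier in this thread.

```python
from collections import defaultdict


class SchemeCounters:
    """Toy model: one independent byte counter per filesystem scheme.

    Hypothetical illustration only; this is not how Hadoop is implemented.
    """

    SUFFIX = "_BYTES_READ"

    def __init__(self):
        # scheme name (e.g. "hdfs", "file") -> total bytes read
        self._bytes_read = defaultdict(int)

    def record_read(self, scheme, nbytes):
        # A read is charged only to the filesystem it went through.
        self._bytes_read[scheme.lower()] += nbytes

    def counter(self, name):
        # Map a counter name such as "HDFS_BYTES_READ" back to its scheme.
        if not name.endswith(self.SUFFIX):
            raise ValueError("unknown counter: " + name)
        scheme = name[: -len(self.SUFFIX)].lower()
        return self._bytes_read.get(scheme, 0)


counters = SchemeCounters()
counters.record_read("hdfs", 149_686)  # input read from HDFS
counters.record_read("file", 188_604)  # data read back from local disk

print("HDFS_BYTES_READ =", counters.counter("HDFS_BYTES_READ"))
print("FILE_BYTES_READ =", counters.counter("FILE_BYTES_READ"))
```

In this model a read through one filesystem never touches the other counter, so the two values are simply independent totals rather than overlapping views of the same bytes.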
