I have a problem where certain Hadoop jobs take prohibitively long to run. My hypothesis is that I am generating more I/O than my cluster can handle, and I need to substantiate this. I am looking closely at the MapReduce framework counters because I think they contain the information I need, but I don't understand what the various File System Counters are telling me. Is there a pointer to a list of exactly what all these counters mean? (So far my online research has only turned up other people asking the same question.)
In particular, I suspect that my mapper--which may write multiple <key, value> pairs for each one it receives--is writing too many pairs with values that are too large, but I'm not sure how to test this quantitatively (see the instrumentation sketch below). Specific questions:

1. I assume "Map input records" is the total number of <key, value> pairs coming into the mappers and "Map output records" is the total number of <key, value> pairs written by the mappers. Is this correct?
2. What is "Map output bytes"? Is it the total number of bytes in all the <key, value> pairs written by the mappers?
3. How would I calculate a corresponding "Map input bytes"? Why doesn't that counter exist?
4. What is the relationship between the FILE_BYTES_READ/WRITTEN and HDFS_BYTES_READ/WRITTEN counters? What exactly do they mean, and how do they relate to the "Map output bytes" counter?
5. Sometimes the FILE bytes read and written values are an order of magnitude larger than the corresponding HDFS values, and sometimes it's the other way around. How should I interpret this?
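For context, here is roughly how I'm planning to instrument the mapper with custom counters so I can compare my own byte counts against the framework's "Map output records" / "Map output bytes". This is only a minimal sketch, not my real job; the `InstrumentedMapper` class and the `OutputStats` enum are made-up names, and I'm assuming the `org.apache.hadoop.mapreduce` API:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InstrumentedMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Custom counter group; these should show up alongside the framework
    // counters in the job history / web UI.
    public enum OutputStats { RECORDS_OUT, KEY_BYTES_OUT, VALUE_BYTES_OUT }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... my real logic emits multiple <key, value> pairs per input record;
        // this stand-in just echoes the input once ...
        Text outKey = new Text("example-key");
        Text outValue = new Text(value);

        context.write(outKey, outValue);

        // Count what was written so it can be compared against the built-in
        // "Map output records" and "Map output bytes" counters.
        context.getCounter(OutputStats.RECORDS_OUT).increment(1);
        context.getCounter(OutputStats.KEY_BYTES_OUT).increment(outKey.getLength());
        context.getCounter(OutputStats.VALUE_BYTES_OUT).increment(outValue.getLength());
    }
}
```

If I understand the API correctly, the driver can also pull the built-in values after the job finishes, e.g. `job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue()`, so I could line those up against my custom counters. But I'd still like to know what the framework counters themselves actually measure.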
