I have a problem where certain Hadoop jobs take prohibitively long to run. My hypothesis is that I am generating more I/O than my cluster can handle, and I need to substantiate this. I am looking closely at the MapReduce framework counters because I think they contain the information I need, but I don't understand what the various File System Counters are telling me. Is there a pointer to a list of exactly what all these counters mean? (So far my online research has only turned up other people asking the same question.)
In particular, I suspect that my mapper--which may write multiple <key, value> pairs for each one it receives--is writing too many pairs with values that are too large, but I'm not sure how to test this quantitatively (see the instrumentation sketch below). Specific questions:

1. I assume "Map input records" is the total number of <key, value> pairs coming into the mappers and "Map output records" is the total number of <key, value> pairs written by the mappers. Is this correct?
2. What is "Map output bytes"? Is it the total number of bytes in all the <key, value> pairs written by the mappers?
3. How would I calculate a corresponding "Map input bytes"? Why doesn't that counter exist?
4. What is the relationship between the FILE_BYTES_READ/WRITTEN and HDFS_BYTES_READ/WRITTEN counters? What exactly do they mean, and how do they relate to the "Map output bytes" counter?
5. Sometimes the FILE bytes read and written values are an order of magnitude larger than the corresponding HDFS values, and sometimes it's the other way around. How should I interpret this?
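For context, here is roughly how I'm planning to instrument the mapper with custom counters so I can compare my own byte counts against the framework's "Map output records" / "Map output bytes". This is only a minimal sketch, not my real job; the `InstrumentedMapper` class and the `OutputStats` enum are made-up names, and I'm assuming the `org.apache.hadoop.mapreduce` API:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InstrumentedMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Custom counter group; these should show up alongside the framework
    // counters in the job history / web UI.
    public enum OutputStats { RECORDS_OUT, KEY_BYTES_OUT, VALUE_BYTES_OUT }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... my real logic emits multiple <key, value> pairs per input record;
        // this stand-in just echoes the input once ...
        Text outKey = new Text("example-key");
        Text outValue = new Text(value);

        context.write(outKey, outValue);

        // Count what was written so it can be compared against the built-in
        // "Map output records" and "Map output bytes" counters.
        context.getCounter(OutputStats.RECORDS_OUT).increment(1);
        context.getCounter(OutputStats.KEY_BYTES_OUT).increment(outKey.getLength());
        context.getCounter(OutputStats.VALUE_BYTES_OUT).increment(outValue.getLength());
    }
}
```

If I understand the API correctly, the driver can also pull the built-in values after the job finishes, e.g. `job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue()`, so I could line those up against my custom counters. But I'd still like to know what the framework counters themselves actually measure.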
