When in doubt, go straight to the owner of a fact. The operating system is what really knows disk I/O, so measure it on the nodes themselves with OS-level tools (iostat, for example) rather than inferring it from job counters.

"my mapper job--which may write multiple <key,value> pairs for each one it receives--is writing too many" - ah, a map-increase job :) This is what Combiners are for: they keep explosions of data from hitting the network by combining records on the mapper machine before the shuffle.
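To make the combiner point concrete, here is a minimal sketch in plain Python (not Hadoop code) of what map-side combining does to the record count. The mapper, the input split, and the byte estimate are all hypothetical, made up for illustration. In a real job the combiner is a Reducer class registered with Job.setCombinerClass, it only gives correct results when the operation is associative and commutative (a sum, say), and the framework may run it zero or more times.

```python
from collections import defaultdict

def mapper(line):
    """Toy 'map-increase' mapper: emits one (word, 1) pair per word in the line."""
    for word in line.split():
        yield word, 1

def combine(pairs):
    """Combiner: sum values per key on the mapper side, before the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def approx_bytes(pairs):
    """Rough size: UTF-8 key length plus 4 bytes per value (an assumption)."""
    return sum(len(key.encode("utf-8")) + 4 for key, _ in pairs)

# Hypothetical input split handled by a single mapper.
split = ["to be or not to be", "to do is to be"]

raw = [pair for line in split for pair in mapper(line)]
combined = combine(raw)

print("map output records:", len(raw), "approx bytes:", approx_bytes(raw))
print("after combining:   ", len(combined), "approx bytes:", approx_bytes(combined))
```

The gap between the two lines is the data that never reaches the network. In the job counters, the same effect should show up as "Combine input records" versus "Combine output records".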
On Thu, Sep 29, 2011 at 4:15 PM, W.P. McNeill <[email protected]> wrote:
> I have a problem where certain Hadoop jobs take prohibitively long to run.
> My hypothesis is that I am generating more I/O than my cluster can handle
> and I need to substantiate this. I am looking closely at the Map Reduce
> framework counters because I think they contain the information I need, but
> I don't understand what the various File System Counters are telling me. Is
> there a pointer to an list of exactly what all these counters mean? (So far
> my online research has only turned up other people asking the same
> question.)
>
> In particular, I suspect that my mapper job--which may write multiple
> <key, value> pairs for each one it receives--is writing too many and the
> values are too large, but I'm not sure how to test this quantitatively.
>
> Specific questions:
>
> 1. I assume "Map input records" is the total of all <key, value> pairs
>    coming into the mappers and "Map output records" is the total of all
>    <key, value> pairs written by the mapper. Is this correct?
> 2. What is "Map output bytes"? Is this the total number of bytes in all
>    the <key, value> pairs written by the mapper?
> 3. How would I calculate a corresponding "Map input bytes"? Why doesn't
>    that counter exist?
> 4. What is the relationship between the FILE|HDFS_BYTES_READ|WRITTEN
>    counters? What exactly do they mean, and how do they relate to the
>    "Map output bytes" counter?
> 5. Sometimes the FILE bytes read and written values are an order of
>    magnitude larger than the corresponding HDFS values, and sometimes
>    it's the other way around. How do I go about interpreting this?
>

--
Lance Norskog
[email protected]
