Hey all,

Has anyone seen a job where the number of reduce input records is significantly larger than the number of map output records? There's no combiner involved in the job at hand, and the job isn't particularly large (250GB in, about the same out). The counters on one example job are:

  map input records:    2,202,290,092
  map output records:   2,198,215,987
  reduce input records: 2,200,081,377

So the reducers saw 1,865,390 more records than the maps emitted. The job had no task failures and no killed speculative task attempts. This is Hadoop 0.18.3 on JVM 1.6.0u14.
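
To put numbers on the gap, here is a quick back-of-the-envelope calculation from the counters above. It's plain Python, nothing Hadoop-specific, and the variable names are just mine for illustration:

```python
# Counter values copied from the example job described above.
map_input_records = 2_202_290_092
map_output_records = 2_198_215_987
reduce_input_records = 2_200_081_377

# With no combiner, I'd expect every map output record to reach a reducer
# exactly once, so this difference should be zero.
extra_reduce_records = reduce_input_records - map_output_records

# Net number of records the map phase dropped (input minus output).
net_map_reduction = map_input_records - map_output_records

# Size of the discrepancy relative to the map output.
extra_fraction = extra_reduce_records / map_output_records

print(f"extra reduce input records: {extra_reduce_records:,}")
print(f"net records dropped by map: {net_map_reduction:,}")
print(f"relative excess:            {extra_fraction:.4%}")
```

The excess is small in relative terms (well under 0.1% of the map output), but with no combiner, failures, or killed attempts I can't see where any extra records should come from.
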
Does anyone have any thoughts? Could a broken comparator trip up the merge in such a way that it would invent records? I searched JIRA and the svn logs, but nothing caught my eye. If no one has seen this before, I'll keep digging and will certainly open a JIRA if I can find some more useful data.

Thanks,
-Todd
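
P.S. On the comparator question, this is roughly the kind of sanity check I have in mind. It's a sketch in Python rather than the job's actual Java comparator; `find_contract_violation`, `good_compare`, and `broken_compare` are hypothetical names, and the "broken" comparator is a made-up example (a subtraction that wraps like a 32-bit int), not the one from this job. It only tells you whether a comparator is inconsistent on some sample keys; it says nothing about whether the merge would then produce extra records.

```python
from itertools import product


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def find_contract_violation(compare, keys):
    """Return a description of the first contract violation found, or None."""
    # Reflexivity: a key must compare equal to itself.
    for a in keys:
        if compare(a, a) != 0:
            return f"not reflexive for {a!r}"
    # Antisymmetry: swapping the arguments must flip the sign.
    for a, b in product(keys, repeat=2):
        if sign(compare(a, b)) != -sign(compare(b, a)):
            return f"not antisymmetric for {a!r}, {b!r}"
    # Transitivity: a < b and b < c must imply a < c.
    for a, b, c in product(keys, repeat=3):
        if compare(a, b) < 0 and compare(b, c) < 0 and not compare(a, c) < 0:
            return f"not transitive for {a!r}, {b!r}, {c!r}"
    return None


def good_compare(a, b):
    """A consistent three-way comparison."""
    return (a > b) - (a < b)


def broken_compare(a, b):
    """Hypothetical bug: subtract, then wrap the result to a signed 32-bit int."""
    diff = (a - b) & 0xFFFFFFFF
    return diff - 0x100000000 if diff >= 0x80000000 else diff


# Sample keys far enough apart that the wrapped subtraction overflows.
sample_keys = [-2_000_000_000, 0, 2_000_000_000]

print(find_contract_violation(good_compare, sample_keys))
print(find_contract_violation(broken_compare, sample_keys))
```

Checking all triples is cubic in the number of sample keys, so this is only practical on a small, hand-picked set of keys that stresses the edge cases.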
