Hi folks,
I am a little puzzled by what looks to me like records that I am emitting from my combiner but that are not showing up under 'Combine output records' (they seem to be disappearing). Here's some evidence.

The MapReduce job counters say:

  Combine input records     230,803,567
  Combine output records    112,533,683

I am maintaining three counters and bump one of them each time I emit a record from the combiner (i.e. the combiner emits three types of key-value pairs):

  COMBINERJOIN     28,264,088
  COMBINERPASS    199,193,336
  COMBINERKEYS      3,346,143

As can be seen, the total number of combiner outputs (the sum of the three counters above, 230,803,567) is the same as the combine input records, and that is exactly what I expect from my program.

However, something is going wrong somewhere, and not all of the emitted records show up in the combine output records. There are no exceptions in the logs, and the output.collect() interface does not return an error code.

Any ideas what's going on? Is this a pathological case (a combiner emitting the same number of output records as input records)?

Thanks,
Joydeep
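
P.S. In case it helps anyone check the arithmetic, here is a minimal sketch of the comparison I am making. It only uses the counter values quoted above; the variable names are made up for illustration and nothing here touches Hadoop itself.

```python
# Counter values copied from the job counters quoted in this message.
combine_input_records = 230_803_567
combine_output_records = 112_533_683

# Custom counters, each bumped once per record emitted by the combiner.
custom_counters = {
    "COMBINERJOIN": 28_264_088,
    "COMBINERPASS": 199_193_336,
    "COMBINERKEYS": 3_346_143,
}

# Total records the combiner believes it emitted.
emitted = sum(custom_counters.values())

# Records emitted according to my counters but absent from the framework's count.
missing = emitted - combine_output_records

print(f"emitted per custom counters:   {emitted:,}")
print(f"equals combine input records:  {emitted == combine_input_records}")
print(f"not counted as combine output: {missing:,}")
```

So by my own counters roughly 118 million emitted records are not reflected in 'Combine output records'.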