All, I have come across a situation that I don't understand.
*First Reducer:*

Behold the first of two reducers. A fragment of its output follows. Simple, no? It doesn't do anything beyond writing each value back out. I've highlighted two records from the output. Keep them in mind.

    protected void reduce(Text key, Iterable<Text> visitors, Context ctx)
            throws IOException, InterruptedException {
        for (Text visitor : visitors) {
            ctx.write(key, visitor);
        }
    }

    2005-09-16=33614 42340108 *more==>*
    2005-09-16=33614 42340106 *more==>*
    *2005-09-16=33614 42340113 more==>*
    2005-09-16=44135 42324490 *more==>*
    2005-09-16=44135 42339700 *more==>*
    ...
    *2005-09-16=44135 42324489 more==>*

Now let's look at the second reducer.

*Second Reducer:*

This is a variation on the reducer above. A fragment of its output follows. The difference is that I add all visitors to a list, then iterate through the list to produce my output.

    protected void reduce(Text key, Iterable<Text> visitors, Context ctx)
            throws IOException, InterruptedException {
        List<Text> list = new ArrayList<Text>();
        for (Text visitor : visitors) {
            list.add(visitor);
        }
        for (Text visitor : list) {
            ctx.write(key, visitor);
        }
    }

    2005-09-16=33614 42340113 *more==>*
    2005-09-16=33614 42340113 *more==>*
    2005-09-16=33614 42340113 *more==>*
    2005-09-16=44135 42324489 *more==>*
    2005-09-16=44135 42324489 *more==>*

Remember the two highlighted records from above? They now show up in the output as duplicates, and the other records appear to be missing. Why? I have never seen an ArrayList behave like this. It must have something to do with Hadoop.

I have my reasons for using the list. One is that I must have a full count of all visitors before I can produce my output, but I'll spare you the rest. To my mind, this second reducer should output the same as the first.

Thanks in advance

-- Geoffry Roberts
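
For what it's worth, the symptom above (every list entry showing the last value seen for its key) is what you would expect if the values iterator hands back one and the same mutable Text instance on every iteration and merely refills it. The sketch below reproduces that outside Hadoop. Everything in it is a stand-in and an assumption, not Hadoop code: `MutableText` plays the role of Text, and `reusing(...)` plays the role of the framework's value iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {

    // Hypothetical stand-in for Hadoop's Text: a mutable holder for a string.
    static final class MutableText {
        private String value = "";
        MutableText() {}
        MutableText(MutableText other) { this.value = other.value; } // copy constructor
        void set(String v) { this.value = v; }
        @Override public String toString() { return value; }
    }

    // Hypothetical iterable that returns the SAME object from every next(),
    // refilled with the next value (the assumed framework behavior).
    static Iterable<MutableText> reusing(final List<String> data) {
        final MutableText shared = new MutableText();
        return new Iterable<MutableText>() {
            @Override public Iterator<MutableText> iterator() {
                final Iterator<String> it = data.iterator();
                return new Iterator<MutableText>() {
                    @Override public boolean hasNext() { return it.hasNext(); }
                    @Override public MutableText next() { shared.set(it.next()); return shared; }
                    @Override public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    // Mirrors the second reducer: stores references, so every entry is the shared object.
    static List<String> collectByReference(List<String> data) {
        List<MutableText> list = new ArrayList<MutableText>();
        for (MutableText t : reusing(data)) list.add(t);
        return render(list);
    }

    // Stores a fresh copy per iteration, so each entry keeps its own value.
    static List<String> collectByCopy(List<String> data) {
        List<MutableText> list = new ArrayList<MutableText>();
        for (MutableText t : reusing(data)) list.add(new MutableText(t));
        return render(list);
    }

    private static List<String> render(List<MutableText> list) {
        List<String> out = new ArrayList<String>();
        for (MutableText t : list) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        List<String> visitors = Arrays.asList("42340108", "42340106", "42340113");
        System.out.println(collectByReference(visitors)); // [42340113, 42340113, 42340113]
        System.out.println(collectByCopy(visitors));      // [42340108, 42340106, 42340113]
    }
}
```

If that assumption holds for the reducer above, copying each value before storing it, for example `list.add(new Text(visitor))`, should make the second reducer emit the same records as the first, at the cost of one extra object per value.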