Naturally, after I send that email I find that I am wrong. I was also using an enum field in my key, which was the culprit.
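For the archives, in case anyone hits the same symptom: Java enums inherit the identity-based Object.hashCode(), so the value changes from one JVM to the next. If a composite key's hashCode() folds in the enum's hashCode(), HashPartitioner can route the same key to different reducers on different runs, which matches what I was seeing. A rough sketch of the fix follows; the class and field names are only illustrative, not my actual key:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableUtils;

    // Illustrative composite key with long, int, String and enum fields.
    public class EventKey implements WritableComparable<EventKey> {
        public enum Type { CLICK, VIEW }

        private long userId;
        private int bucket;
        private String label;
        private Type type;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(userId);
            out.writeInt(bucket);
            WritableUtils.writeString(out, label);
            WritableUtils.writeEnum(out, type);   // serializes the enum by name
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            userId = in.readLong();
            bucket = in.readInt();
            label = WritableUtils.readString(in);
            type = WritableUtils.readEnum(in, Type.class);
        }

        @Override
        public int compareTo(EventKey o) {
            if (userId != o.userId) return userId < o.userId ? -1 : 1;
            if (bucket != o.bucket) return bucket < o.bucket ? -1 : 1;
            int c = label.compareTo(o.label);
            if (c != 0) return c;
            return type.compareTo(o.type);
        }

        @Override
        public int hashCode() {
            // HashPartitioner uses this, so every term must be stable across JVMs.
            int result = (int) (userId ^ (userId >>> 32));
            result = 31 * result + bucket;
            result = 31 * result + label.hashCode();
            result = 31 * result + type.ordinal();   // NOT type.hashCode(), which is identity-based
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof EventKey)) return false;
            EventKey other = (EventKey) obj;
            return userId == other.userId && bucket == other.bucket
                && label.equals(other.label) && type == other.type;
        }
    }

With hashCode() built from ordinal() (or name().hashCode()) the partitioning is deterministic, which also explains why the single-reducer job was correct all along.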
On Tue, Jan 10, 2012 at 6:13 PM, William Kinney <[email protected]> wrote:

> I'm (unfortunately) aware of this and this isn't the issue. My key object
> contains only long, int and String values.
>
> The job map output is consistent, but the reduce input groups and values
> for the key vary from one job to the next on the same input. It's like it
> isn't properly comparing and partitioning the keys.
>
> I have properly implemented a hashCode(), equals() and the
> WritableComparable methods.
>
> Also not surprisingly when I use 1 reduce task, the output is correct.
>
>
> On Tue, Jan 10, 2012 at 5:58 PM, W.P. McNeill <[email protected]> wrote:
>
>> The Hadoop framework reuses Writable objects for key and value arguments,
>> so if your code stores a pointer to that object instead of copying it you
>> can find yourself with mysterious duplicate objects. This has tripped me
>> up a number of times. Details on what exactly I encountered and how I
>> fixed it are here
>> http://cornercases.wordpress.com/2011/03/14/serializing-complex-mapreduce-keys/
>> and here
>> http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/
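And for later readers, the reuse pitfall W.P. describes typically shows up when you hold on to the reducer's value objects past the iteration. A minimal sketch of the safe pattern, with made-up types (any reducer that collects its values would do):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hadoop hands the reducer the SAME Text instance on every iteration,
    // so storing the reference keeps only the last value N times over.
    // Copy the object if you need it after the loop advances.
    public class CollectingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<Text> copies = new ArrayList<Text>();
            for (Text value : values) {
                // copies.add(value);        // BUG: every element ends up identical
                copies.add(new Text(value)); // OK: defensive copy of the reused object
            }
            for (Text v : copies) {
                context.write(key, v);
            }
        }
    }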
