Naturally, after I send that email I find that I am wrong. I was also using an enum field in my key, which was the culprit.
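For the archives, in case anyone hits the same symptom: Java enums inherit the identity-based Object.hashCode(), so the value changes from one JVM to the next. If a composite key's hashCode() folds in the enum's hashCode(), HashPartitioner can route the same key to different reducers on different runs, which matches what I was seeing. A rough sketch of the fix follows; the class and field names are only illustrative, not my actual key:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableUtils;

    // Illustrative composite key with long, int, String and enum fields.
    public class EventKey implements WritableComparable<EventKey> {
        public enum Type { CLICK, VIEW }

        private long userId;
        private int bucket;
        private String label;
        private Type type;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(userId);
            out.writeInt(bucket);
            WritableUtils.writeString(out, label);
            WritableUtils.writeEnum(out, type);   // serializes the enum by name
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            userId = in.readLong();
            bucket = in.readInt();
            label = WritableUtils.readString(in);
            type = WritableUtils.readEnum(in, Type.class);
        }

        @Override
        public int compareTo(EventKey o) {
            if (userId != o.userId) return userId < o.userId ? -1 : 1;
            if (bucket != o.bucket) return bucket < o.bucket ? -1 : 1;
            int c = label.compareTo(o.label);
            if (c != 0) return c;
            return type.compareTo(o.type);
        }

        @Override
        public int hashCode() {
            // HashPartitioner uses this, so every term must be stable across JVMs.
            int result = (int) (userId ^ (userId >>> 32));
            result = 31 * result + bucket;
            result = 31 * result + label.hashCode();
            result = 31 * result + type.ordinal();   // NOT type.hashCode(), which is identity-based
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof EventKey)) return false;
            EventKey other = (EventKey) obj;
            return userId == other.userId && bucket == other.bucket
                && label.equals(other.label) && type == other.type;
        }
    }

With hashCode() built from ordinal() (or name().hashCode()) the partitioning is deterministic, which also explains why the single-reducer job was correct all along.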
On Tue, Jan 10, 2012 at 6:13 PM, William Kinney <[email protected]> wrote:

> I'm (unfortunately) aware of this and this isn't the issue. My key object
> contains only long, int and String values.
>
> The job map output is consistent, but the reduce input groups and values
> for the key vary from one job to the next on the same input. It's like it
> isn't properly comparing and partitioning the keys.
>
> I have properly implemented a hashCode(), equals() and the
> WritableComparable methods.
>
> Also not surprisingly when I use 1 reduce task, the output is correct.
>
>
> On Tue, Jan 10, 2012 at 5:58 PM, W.P. McNeill <[email protected]> wrote:
>
>> The Hadoop framework reuses Writable objects for key and value arguments,
>> so if your code stores a pointer to that object instead of copying it you
>> can find yourself with mysterious duplicate objects. This has tripped me
>> up a number of times. Details on what exactly I encountered and how I
>> fixed it are here
>> http://cornercases.wordpress.com/2011/03/14/serializing-complex-mapreduce-keys/
>> and here
>> http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/
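And for later readers, the reuse pitfall W.P. describes typically shows up when you hold on to the reducer's value objects past the iteration. A minimal sketch of the safe pattern, with made-up types (any reducer that collects its values would do):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hadoop hands the reducer the SAME Text instance on every iteration,
    // so storing the reference keeps only the last value N times over.
    // Copy the object if you need it after the loop advances.
    public class CollectingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<Text> copies = new ArrayList<Text>();
            for (Text value : values) {
                // copies.add(value);        // BUG: every element ends up identical
                copies.add(new Text(value)); // OK: defensive copy of the reused object
            }
            for (Text v : copies) {
                context.write(key, v);
            }
        }
    }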
