Thanks for your response, Eric.
I am using Hadoop 0.20.2.
Here is what the hashCode() implementation looks like (I actually had the IDE
generate it for me).
Main key (for mapper & reducer):
public int hashCode() {
    int result = kVersion;
    result = 31 * result + (aKey != null ? aKey.hashCode() : 0);
    result = 31 * result + (gKey != null ? gKey.hashCode() : 0);
    result = 31 * result + (int) (date ^ (date >>> 32)); // fold the long into an int
    result = 31 * result + (ma != null ? ma.hashCode() : 0);
    result = 31 * result + (cl != null ? cl.hashCode() : 0);
    return result;
}
aKey (the AKey class):
public int hashCode() {
    int result = kVersion;
    result = 31 * result + (v != null ? v.hashCode() : 0);
    result = 31 * result + (s != null ? s.hashCode() : 0);
    result = 31 * result + (o != null ? o.hashCode() : 0);
    result = 31 * result + (l != null ? l.hashCode() : 0);
    result = 31 * result + (e ? 1 : 0);   // boolean
    result = 31 * result + (li ? 1 : 0);  // boolean
    result = 31 * result + (aut ? 1 : 0); // boolean
    return result;
}
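For what it's worth, an IDE-generated hashCode of this shape is deterministic as long as every field's own hashCode is value-based. A quick sanity check, using a simplified stand-in for AKey (the field names follow the snippet above, but the String/boolean types are my assumption since the real declarations aren't shown):

```java
// Simplified, hypothetical stand-in for AKey; types are assumed for illustration.
class AKeySketch {
    final String v, s, o, l;
    final boolean e, li, aut;
    final int kVersion = 1;

    AKeySketch(String v, String s, String o, String l,
               boolean e, boolean li, boolean aut) {
        this.v = v; this.s = s; this.o = o; this.l = l;
        this.e = e; this.li = li; this.aut = aut;
    }

    @Override
    public int hashCode() {
        // Same structure as the generated code above: a 31-based rolling hash
        // over value-typed fields, so equal field values give equal results.
        int result = kVersion;
        result = 31 * result + (v != null ? v.hashCode() : 0);
        result = 31 * result + (s != null ? s.hashCode() : 0);
        result = 31 * result + (o != null ? o.hashCode() : 0);
        result = 31 * result + (l != null ? l.hashCode() : 0);
        result = 31 * result + (e ? 1 : 0);
        result = 31 * result + (li ? 1 : 0);
        result = 31 * result + (aut ? 1 : 0);
        return result;
    }
}
```

Two instances built from the same field values always hash identically, no matter which JVM computes it.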
When this happens, I do see the same values for the key. Also, I am not using
a grouping comparator.
I was wondering: since HashPartitioner.getPartition() is called from the map
tasks, several of which run on different machines, is it possible that they
compute different hash codes for the same key and therefore assign it to
different reducers?
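(That can only happen if some field's hashCode falls back to Object's identity hash, which differs per object and per JVM, e.g. an array field, or a member class that never overrides hashCode. String and primitives are safe. A small illustration of the difference:)

```java
import java.util.Arrays;

public class HashStability {
    public static void main(String[] args) {
        // String.hashCode() is computed from the characters, so it is
        // identical in every JVM on every machine.
        System.out.println("abc".hashCode()); // 96354

        int[] ids1 = {1, 2, 3};
        int[] ids2 = {1, 2, 3};
        // Arrays inherit Object.hashCode(), which is identity-based:
        // equal contents do NOT give equal hash codes (almost certainly false).
        System.out.println(ids1.hashCode() == ids2.hashCode());
        // If an array is part of a key, hash its contents instead:
        System.out.println(Arrays.hashCode(ids1) == Arrays.hashCode(ids2)); // true
    }
}
```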
Thanks,
Deepika
-----Original Message-----
From: Eric Sammer [mailto:[email protected]]
Sent: Monday, May 24, 2010 3:07 PM
To: [email protected]
Subject: Re: Hash Partitioner
Deepika:
That sounds very strange. Can you let us know what version of Hadoop
(e.g. Apache 0.20.x, CDH2, etc.) you're running and a bit more about
your hashCode() implementation? When this happens, do you see the same
values for the duplicate key? Did you also implement a grouping
comparator?
The hash partitioner is extremely simple: the partition a key is assigned to
is essentially key.hashCode() modulo the number of reduces. If one
incorrectly implements a grouping comparator, though, it's possible you
could see odd behavior.
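Concretely, the logic amounts to something like this (a sketch of what the 0.20 HashPartitioner does, not a verbatim copy of the Hadoop source):

```java
// Sketch of Hadoop's HashPartitioner logic (the real class in 0.20 is
// org.apache.hadoop.mapred.lib.HashPartitioner); illustrative only.
public class HashPartitionerSketch<K, V> {
    public int getPartition(K key, V value, int numReduceTasks) {
        // The sign bit is masked off so a negative hashCode still maps to a
        // valid, non-negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

As long as key.hashCode() is deterministic, every map task, on any machine, computes the same partition for the same key.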
On Mon, May 24, 2010 at 5:35 PM, Deepika Khera <[email protected]> wrote:
> Hi,
>
> I am using a HashPartitioner on my key for a MapReduce job. I am wondering
> how two reducers sometimes end up getting the same key. I have the hashCode()
> method defined for my key.
>
> Also, I have speculative execution turned off for my jobs.
>
> Would appreciate any help.
>
> Thanks,
> Deepika
>
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com