Hi all--

I had a subtle bug in a MapReduce job I was working on, related to
the fact that my custom key type's hash function (used by the
default HashPartitioner) was putting keys that belong together
according to a custom output value grouping comparator into
different partitions.  For now I've found a single hash function
that partitions everything correctly for all the comparators I'm
using, without too many collisions.  But this won't be possible in
general for all applications, so I wanted to ask what the best
practice here should be.  The only real possibility I currently see
with the API as it is (I may be mistaken) is to write a custom
Partitioner that hashes only the fields the grouping comparator
actually compares.  Did I miss something?
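
For concreteness, here is a minimal sketch of such a partitioner.
It assumes, hypothetically, composite Text keys of the form
"natural<TAB>secondary", where the grouping comparator only looks at
the part before the tab -- substitute your own key type and field
extraction:

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.Partitioner;

  public class NaturalKeyPartitioner implements Partitioner<Text, Text> {

      @Override
      public void configure(JobConf job) {
          // nothing to configure
      }

      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
          // Hash only the portion of the key that the grouping
          // comparator inspects, so keys the comparator treats as
          // equal always go to the same reducer.  The mask keeps the
          // result non-negative.
          String natural = key.toString().split("\t", 2)[0];
          return (natural.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
  }

which then gets wired in with conf.setPartitionerClass(
NaturalKeyPartitioner.class) alongside the call to
conf.setOutputValueGroupingComparator(...).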

I'd also like to ask whether this might be a design bug.  For the
default partitioner (HashPartitioner), there is a dependency between
the hashCode() of the key type and the compare function that is
being used on the keys.  The problem is that it is possible to
override the compare function by specifying a custom comparator, but
it is not possible to override the hashCode function.  This
basically means that any time you specify a custom comparator, you
also need to change your partitioner so that you can effectively
override the hashCode, albeit indirectly.  This is irritating, and
if true, the API gives no indication that you should do this, e.g.
by providing a setOutputValueComparatorAndPartitioner function
instead of the two separate functions.
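
To make the dependency concrete, here is a plain-Java sketch of how
a custom comparator and the default hashCode can disagree (no Hadoop
needed to run it; the key fields are hypothetical):

  import java.util.Comparator;

  public class HashMismatchDemo {
      // A composite key: 'group' drives the grouping comparator,
      // 'order' only drives the within-group sort.
      static final class CompositeKey {
          final String group;
          final int order;
          CompositeKey(String group, int order) {
              this.group = group;
              this.order = order;
          }

          // hashCode() mixes in 'order', which the grouping
          // comparator below ignores.
          @Override public int hashCode() {
              return 31 * group.hashCode() + order;
          }
      }

      // Groups by 'group' only, mirroring a custom grouping comparator.
      static final Comparator<CompositeKey> GROUPING =
          Comparator.comparing(k -> k.group);

      public static void main(String[] args) {
          CompositeKey k1 = new CompositeKey("A", 1);
          CompositeKey k2 = new CompositeKey("A", 2);
          int numPartitions = 4;

          // Same group according to the comparator...
          System.out.println("compare = " + GROUPING.compare(k1, k2)); // 0

          // ...but HashPartitioner-style partitioning splits them:
          // prints "partitions: 0 vs 1" with these values.
          int p1 = (k1.hashCode() & Integer.MAX_VALUE) % numPartitions;
          int p2 = (k2.hashCode() & Integer.MAX_VALUE) % numPartitions;
          System.out.println("partitions: " + p1 + " vs " + p2);
      }
  }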

Has anyone else encountered this?

Thanks,
Chris
