Hi all -- I had a subtle bug in a MapReduce job I was working on, caused by the fact that my custom key type's hashCode() (used by the default HashPartitioner) was putting keys that belonged together, according to a custom output value grouping comparator, into different partitions. For now, I've found a single hash function that partitions everything correctly for all the comparators I'm using, without too many collisions. But this won't be possible in general for all applications, so I wanted to ask what the best practice here should be. The only real option I currently see with the API as it stands (I may be mistaken) is to write a custom Partitioner. Did I miss something?
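
For concreteness, here is a minimal sketch of the custom-Partitioner workaround I have in mind, using the old org.apache.hadoop.mapred API. MyKey, MyValue, and getGroupField() are hypothetical stand-ins for whatever fields your grouping comparator actually compares:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Partitions on exactly the fields the grouping comparator compares,
    // so keys the comparator considers equal land in the same partition.
    // MyKey, MyValue, and getGroupField() are placeholders for your types.
    public class GroupFieldPartitioner implements Partitioner<MyKey, MyValue> {
        public void configure(JobConf conf) {
            // no configuration needed
        }

        public int getPartition(MyKey key, MyValue value, int numPartitions) {
            // Hash only the grouping field, not the whole key, and mask off
            // the sign bit so the modulus is always non-negative.
            return (key.getGroupField().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

This then gets wired in with conf.setPartitionerClass(GroupFieldPartitioner.class), right next to the conf.setOutputValueGroupingComparator(...) call it has to stay consistent with -- which is exactly the coupling I'm complaining about below.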
I'd also like to ask whether this might be a design bug. For the default partitioner (HashPartitioner), there is a dependency between the hashCode() of the key type and the compare function being used on the keys. The problem is that it is possible to override the compare function by specifying a custom comparator, but it is not possible to override the hashCode() function. This basically means that any time you specify a custom comparator, you also need to change your partitioner, so that you can effectively override hashCode(), albeit indirectly. This is irritating, and, if I'm right about this, the API gives no indication that you need to do it; it could, e.g., provide a single setOutputValueComparatorAndPartitioner function instead of the two separate functions. Has anyone else encountered this? Thanks, Chris
