That kind of makes sense, but why would it not group the other values then? Each set contains many keys that are exactly the same (there is only 1 primary record per set, so only 1 key differs), and my understanding is that those identical keys would be grouped together (apart from the primary key) if I didn't do anything different.
-Trevor

On Wed, Jun 29, 2011 at 2:07 PM, Aaron Baff <aaron.b...@telescope.tv> wrote:
> You probably need to implement a custom comparator that you use as the
> grouping comparator that compares the primary key, and then if they are the
> same compares the int part of the key.
>
> --Aaron
>
> -----------------------------------------------------------------------------
> From: Trevor Adams [mailto:trevorad...@gmail.com]
> Sent: Wednesday, June 29, 2011 10:00 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Reduce method called same key twice
>
> So I have a custom Key which is used for a join. It contains two fields, a
> boolean (is primary key) and an int (key). Hashcode only looks at the key
> field, so that it gets sent to the same reducer. Compare places the pkey at
> the top of the list (if sorted using compare). This works nicely, except
> that the reduce method is called with Key: 1 -> a single value, Key: 1 ->
> another value etc. One for each value, so instead of bucketing the values to
> a key (and some of the keys are identical, in every way) it sends 1 key and
> 1 value to the reducer at a time. How do I get it to bucket or why isn't it
> bucketing?
>
> -Trevor
>
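
For reference, here is a minimal plain-Java sketch of how a sort comparator and a separate grouping comparator interact for a key like the one described in this thread. It does not use any Hadoop classes: GroupingSketch, JoinKey, SORT, GROUP and reduceCalls are all hypothetical names made up for illustration, and the loop only imitates the rule that consecutive sorted keys go into the same reduce call when the grouping comparator returns 0. The actual key class in the thread is not shown, so its compare logic here is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {
    // Hypothetical stand-in for the custom join key: a primary-record flag plus an int key.
    static final class JoinKey {
        final boolean primary;
        final int key;
        JoinKey(boolean primary, int key) { this.primary = primary; this.key = key; }
        @Override public String toString() { return (primary ? "P" : "S") + key; }
    }

    // Sort order (assumed): by int key, and within one key the primary record first.
    static final Comparator<JoinKey> SORT = (a, b) -> {
        int c = Integer.compare(a.key, b.key);
        if (c != 0) return c;
        return Boolean.compare(b.primary, a.primary); // primary == true sorts first
    };

    // Grouping order: only the int key matters, the primary flag is ignored.
    static final Comparator<JoinKey> GROUP = (a, b) -> Integer.compare(a.key, b.key);

    // Imitates the reduce side: sort the keys, then start a new "reduce call"
    // whenever the grouping comparator says the next key differs from the previous one.
    static List<List<JoinKey>> reduceCalls(List<JoinKey> keys,
                                           Comparator<JoinKey> sort,
                                           Comparator<JoinKey> group) {
        List<JoinKey> sorted = new ArrayList<>(keys);
        sorted.sort(sort);
        List<List<JoinKey>> calls = new ArrayList<>();
        JoinKey previous = null;
        for (JoinKey k : sorted) {
            if (previous == null || group.compare(previous, k) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(k);
            previous = k;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<JoinKey> keys = Arrays.asList(
            new JoinKey(false, 1), new JoinKey(true, 2), new JoinKey(false, 1),
            new JoinKey(true, 1), new JoinKey(false, 2));

        // Grouping with the sort comparator: the primary record is split from the rest.
        System.out.println(reduceCalls(keys, SORT, SORT));  // [[P1], [S1, S1], [P2], [S2]]
        // Grouping on the int key only: one call per key, primary record first.
        System.out.println(reduceCalls(keys, SORT, GROUP)); // [[P1, S1, S1], [P2, S2]]
    }
}
```

In a real job, the rough equivalent of GROUP would be a RawComparator registered with Job.setGroupingComparatorClass (or JobConf.setOutputValueGroupingComparator in the old API). Note that in the first printout the two identical S1 keys still land in one call, because the comparator returns 0 for them. If identical keys end up in separate calls, one thing worth checking (an assumption, since the key class is not shown here) is whether the key's compare ever returns 0 for two equal keys.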