Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as soon as we upgrade our hardware (long story).

From comparing the 0.15 and 0.19 MapReduce tutorials, and looking at the HADOOP-4545 patch, I don't see anything majorly different about the MapReduce API. Two things I'm unsure of:
- There's a Partitioner that's used, but that seems optional?
- I see that 0.19 still provides setOutputValueGroupingComparator; is the setGroupingComparatorClass in the patch from the 0.20 API?
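To make the roles of the three pieces concrete, here is a minimal plain-Java simulation of what the framework does with them during the shuffle. It uses no Hadoop classes; CompositeKey, partition(), SORT_ORDER, GROUPING and runJob() are hypothetical stand-ins for the Partitioner, the output key comparator and the output value grouping comparator a real job would register. On the Partitioner question: it only becomes unnecessary when the job runs a single reduce task. With several reducers, the default hash partitioner hashes the whole composite key (assuming its hashCode() covers both fields), so records sharing a natural key could land on different reducers and never meet in one group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort; no Hadoop classes are used.
// CompositeKey, partition(), SORT_ORDER, GROUPING and runJob() are hypothetical stand-ins
// for the Partitioner, output key comparator and grouping comparator of a real job.
class SecondarySortSketch {

    // Composite map-output key: the natural key plus the field the values should be sorted by.
    static final class CompositeKey {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
    }

    // Partitioner: hash only the natural key, so all records for one natural key reach the same reducer.
    static int partition(CompositeKey key, int numReducers) {
        return (key.natural.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort comparator: natural key first, then the secondary field.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.natural)
                      .thenComparingInt(k -> k.secondary);

    // Grouping comparator: natural key only, so keys that differ only in the
    // secondary field are fed to the same reduce() call.
    static final Comparator<CompositeKey> GROUPING =
            Comparator.comparing((CompositeKey k) -> k.natural);

    // One reduce task: sort its keys, then start a new group whenever GROUPING says the key changed.
    // Returns natural key -> secondary values in the order the reducer would see them.
    static Map<String, List<Integer>> reduceTask(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey groupStart = null;
        List<Integer> values = null;
        for (CompositeKey k : sorted) {
            if (groupStart == null || GROUPING.compare(groupStart, k) != 0) {
                groupStart = k;
                values = new ArrayList<>();
                groups.put(k.natural, values);
            }
            values.add(k.secondary);
        }
        return groups;
    }

    // Routes every key to a reduce task with partition(), then runs reduceTask() on each partition.
    static List<Map<String, List<Integer>>> runJob(List<CompositeKey> mapOutput, int numReducers) {
        List<List<CompositeKey>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (CompositeKey k : mapOutput) {
            partitions.get(partition(k, numReducers)).add(k);
        }
        List<Map<String, List<Integer>>> results = new ArrayList<>();
        for (List<CompositeKey> p : partitions) {
            results.add(reduceTask(p));
        }
        return results;
    }

    static List<CompositeKey> sampleMapOutput() {
        return List.of(new CompositeKey("b", 7), new CompositeKey("a", 3),
                       new CompositeKey("a", 1), new CompositeKey("b", 2),
                       new CompositeKey("a", 2));
    }

    public static void main(String[] args) {
        // Single reduce task: prints {a=[1, 2, 3], b=[2, 7]}
        System.out.println(runJob(sampleMapOutput(), 1).get(0));
    }
}
```

Each natural key becomes one group whose values arrive already sorted by the secondary field. With runJob(..., 2), "a" and "b" go to different reduce tasks, but each group still arrives whole because partition() looks only at the natural key.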
I have a related question: is it possible to use this GroupingComparator technique to perform what is essentially a one-to-many mapping? Let's say I have records like so:

id_1 - metadata
id_2 - metadata
id_1 A numbers
id_2 B numbers
id_1 C numbers

Would it be possible for the key/value pair <"id_1, -", metadata> to map to both the groups for the keys "id_1, A" and "id_1, C"? The comparator seems easy to write, but I don't see a way to send multiple copies of one record to multiple groups. I know it's a bit unusual, but this kind of wildcard behavior would be useful for us.

Meng

On Mon, Jan 5, 2009 at 6:58 PM, Owen O'Malley wrote:
> This is exactly what the setOutputValueGroupingComparator is for. Take a
> look at HADOOP-4545, for an example using the secondary sort. If you are
> using trunk or 0.20, look at
> src/examples/org/apache/hadoop/examples/SecondarySort.java. The checked in
> example uses the new map/reduce api that was introduced in 0.20.
>
> -- Owen
>
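One way to get the wildcard behavior without copying records, sketched under assumptions rather than taken from HADOOP-4545: tag the metadata so it sorts first within its id, and have the grouping comparator compare the id alone. Every record for id_1 then arrives in a single reduce() call with the metadata first, and the reducer pairs it with each data record that follows. This is plain Java with no Hadoop classes; Rec, METADATA_TAG, reduce() and joinAll() are hypothetical names, and a real job would also need a Partitioner that hashes only the id, plus some way for the reducer to tell metadata from data (for example, carrying the tag in the value as well as the key).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation (no Hadoop classes) of pairing one metadata record with many
// data records by grouping on the id alone. Rec, METADATA_TAG, reduce() and joinAll()
// are hypothetical names used only for illustration.
class OneToManySketch {

    static final String METADATA_TAG = "-";

    // One map-output record: key = (id, tag), value = payload.
    static final class Rec {
        final String id;
        final String tag;
        final String payload;

        Rec(String id, String tag, String payload) {
            this.id = id;
            this.tag = tag;
            this.payload = payload;
        }

        boolean isMetadata() {
            return METADATA_TAG.equals(tag);
        }
    }

    // Sort comparator: by id, then metadata before data records, then by tag.
    static final Comparator<Rec> SORT_ORDER =
            Comparator.comparing((Rec r) -> r.id)
                      .thenComparingInt(r -> r.isMetadata() ? 0 : 1)
                      .thenComparing(r -> r.tag);

    // Grouping comparator: id only, so (id_1,-), (id_1,A) and (id_1,C) share one reduce() call.
    static final Comparator<Rec> GROUPING = Comparator.comparing((Rec r) -> r.id);

    // Reducer for one group: the metadata record (if any) arrives first;
    // pair it with every data record that follows.
    static List<String> reduce(List<Rec> group) {
        List<String> out = new ArrayList<>();
        String metadata = null;
        for (Rec r : group) {
            if (r.isMetadata()) {
                metadata = r.payload;
            } else {
                out.add(r.id + " " + r.tag + " " + metadata + " " + r.payload);
            }
        }
        return out;
    }

    // Single reduce task: sort, split into groups with GROUPING, reduce each group.
    static List<String> joinAll(List<Rec> mapOutput) {
        List<Rec> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        List<String> out = new ArrayList<>();
        List<Rec> group = new ArrayList<>();
        for (Rec r : sorted) {
            if (!group.isEmpty() && GROUPING.compare(group.get(0), r) != 0) {
                out.addAll(reduce(group));
                group = new ArrayList<>();
            }
            group.add(r);
        }
        if (!group.isEmpty()) {
            out.addAll(reduce(group));
        }
        return out;
    }

    static List<Rec> sampleMapOutput() {
        return List.of(new Rec("id_1", "-", "meta1"), new Rec("id_2", "-", "meta2"),
                       new Rec("id_1", "A", "numsA"), new Rec("id_2", "B", "numsB"),
                       new Rec("id_1", "C", "numsC"));
    }

    public static void main(String[] args) {
        // Prints:
        // id_1 A meta1 numsA
        // id_1 C meta1 numsC
        // id_2 B meta2 numsB
        joinAll(sampleMapOutput()).forEach(System.out::println);
    }
}
```

In this sketch the fan-out happens in the reducer rather than the shuffle, so the metadata only has to be held for the duration of one group. The explicit metadata flag in SORT_ORDER avoids relying on "-" happening to sort before letters.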
