Re: Some new requests about mapreduce

Doug Cutting Tue, 07 Nov 2006 09:59:37 -0800

Feng Jiang wrote:

I think my concern is different from request 485. I mean, if the input to the Reduce phase is:
K2, V3
K2, V2
K1, V5
K1, V3
K1, V4

In current Hadoop, the reduce output could be:
K1, (V5, V3, V4)
K2, (V3, V2)
But I hope Hadoop will support job.setOutputValueComparatorClass(theClass), so that I can have the values sorted, and the output could be:
K1, (V3, V4, V5)
K2, (V2, V3)

Yes, that is different. One can currently achieve what you're after by including values in keys. The only real difference between keys and values is that values are not used for sorting, and some optimizations are made because of that. But if you need to sort by value as well as key, then you can use a compound key that includes both, and a null value. Note that with block compression, repeated keys should not use too much space. Does that suffice?
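The compound-key suggestion can be sketched in plain Java, without any Hadoop classes. This is only an illustration under assumptions: CompoundKey and sortAndGroup are hypothetical names, string comparison stands in for whatever comparator a real job would use, and the grouping loop merely imitates what the framework's sort would hand to reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompoundKeySort {

    /** Hypothetical compound key: the original key plus the value to sort by. */
    static final class CompoundKey implements Comparable<CompoundKey> {
        final String key;
        final String value;

        CompoundKey(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public int compareTo(CompoundKey other) {
            // Order by the original key first, then by the embedded value.
            int byKey = key.compareTo(other.key);
            return byKey != 0 ? byKey : value.compareTo(other.value);
        }
    }

    /** Sorts the compound keys, then collects the values seen for each original key. */
    static Map<String, List<String>> sortAndGroup(List<CompoundKey> records) {
        List<CompoundKey> sorted = new ArrayList<>(records);
        Collections.sort(sorted);

        // LinkedHashMap keeps the keys in the order the sorted records produce them.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (CompoundKey record : sorted) {
            grouped.computeIfAbsent(record.key, k -> new ArrayList<>()).add(record.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Same input as the example in the question; the value slot would be null in a real job.
        List<CompoundKey> input = Arrays.asList(
            new CompoundKey("K2", "V3"),
            new CompoundKey("K2", "V2"),
            new CompoundKey("K1", "V5"),
            new CompoundKey("K1", "V3"),
            new CompoundKey("K1", "V4"));

        System.out.println(sortAndGroup(input)); // {K1=[V3, V4, V5], K2=[V2, V3]}
    }
}
```

Because the value is part of the sort key, the ordering within each original key falls out of the ordinary key sort; no separate value comparator is needed.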


Another related issue is http://issues.apache.org/jira/browse/HADOOP-475.

But I have written GenericWritable, an abstract class that helps users wrap different Writable instances at a cost of only one byte. GenericObject is a demo showing how to use GenericWritable. Both are attached to this email.

The attachment did not make it. Can you please attach these to a Jira issue, as a patch file?
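Since the attached code is not available here, the following is only a guess at how a one-byte wrapper of this kind might work, not the GenericWritable that was written. Every name in it (Payload, TaggedWrapper, IntOrText, types) is hypothetical, and Payload stands in for Hadoop's Writable so the sketch has no dependencies. The idea shown is that a subclass lists the types it can wrap, and the index into that list is written as a single tag byte ahead of the wrapped instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class TaggedWrapperSketch {

    /** Stand-in for a Writable-style interface (an assumption of this sketch). */
    interface Payload {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    static final class IntPayload implements Payload {
        int value;
        public void write(DataOutput out) throws IOException { out.writeInt(value); }
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    static final class TextPayload implements Payload {
        String value = "";
        public void write(DataOutput out) throws IOException { out.writeUTF(value); }
        public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    }

    /** Hypothetical wrapper: one tag byte selects the concrete type. */
    abstract static class TaggedWrapper implements Payload {
        private Payload instance;

        /** Subclasses list the types they can wrap; the array index is the tag. */
        protected abstract Class<?>[] types();

        void set(Payload payload) { instance = payload; }
        Payload get() { return instance; }

        public void write(DataOutput out) throws IOException {
            Class<?>[] known = types();
            for (int i = 0; i < known.length; i++) {
                if (known[i] == instance.getClass()) {
                    out.writeByte(i);     // the one-byte overhead
                    instance.write(out);  // then the wrapped payload itself
                    return;
                }
            }
            throw new IOException("unregistered type: " + instance.getClass());
        }

        public void readFields(DataInput in) throws IOException {
            int tag = in.readUnsignedByte();
            try {
                // Create an empty instance of the tagged type, then let it read itself.
                instance = (Payload) types()[tag].getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
            instance.readFields(in);
        }
    }

    /** Example subclass that can carry either of the two payload types. */
    static final class IntOrText extends TaggedWrapper {
        protected Class<?>[] types() {
            return new Class<?>[] { IntPayload.class, TextPayload.class };
        }
    }

    static byte[] serialize(Payload payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        payload.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        IntPayload number = new IntPayload();
        number.value = 42;
        IntOrText wrapper = new IntOrText();
        wrapper.set(number);

        byte[] wire = serialize(wrapper);  // 1 tag byte + 4 bytes for the int
        IntOrText copy = new IntOrText();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(wire.length + " bytes, value " + ((IntPayload) copy.get()).value);
    }
}
```

A single tag byte limits such a wrapper to 256 registered types, which is the trade-off for avoiding a full class name in every record.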


http://wiki.apache.org/lucene-hadoop/HowToContribute

Thanks!

Doug
