Feng Jiang wrote:
I think my concern is different from request 485. I mean,
if the input to the Reduce phase is:
K2, V3
K2, V2
K1, V5
K1, V3
K1, V4
then in current Hadoop the reduce output could be:
K1, (V5, V3, V4)
K2, (V3, V2)
But I hope Hadoop will support job.setOutputValueComparatorClass(theClass),
so that I can keep the values in order, and the output could be:
K1, (V3, V4, V5)
K2, (V2, V3)
Yes, that is different. One can currently achieve what you're after by
including values in keys. The only real difference between keys and
values is that values are not used for sorting, and some optimizations
are made because of that. But if you need to sort by value as well as
key, then you can use a compound key that includes both, and a null value.
Note that with block compression, repeated keys should not use too
much space. Does that suffice?
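
To make the compound-key idea concrete, here is a rough sketch in plain Java. It does not use the Hadoop API: KeyValue, sortAndGroup and the integer values are hypothetical names chosen for illustration, and the sort and the grouping are simulated in memory. In a real job the compound key would be a WritableComparable, and the reducer would see one compound key at a time, in sorted order, rather than a pre-grouped list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompoundKeySketch {
    /** Hypothetical compound key: the original key plus the value to sort by. */
    static final class KeyValue implements Comparable<KeyValue> {
        final String key;
        final int value;

        KeyValue(String key, int value) {
            this.key = key;
            this.value = value;
        }

        /** Order by key first, then by value, so values arrive sorted within a key. */
        @Override
        public int compareTo(KeyValue other) {
            int byKey = key.compareTo(other.key);
            return byKey != 0 ? byKey : Integer.compare(value, other.value);
        }
    }

    /** Simulates the sort on compound keys, then collects values per original key. */
    static Map<String, List<Integer>> sortAndGroup(List<KeyValue> input) {
        List<KeyValue> sorted = new ArrayList<>(input);
        Collections.sort(sorted);
        // LinkedHashMap keeps the keys in the order the sort produced them.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (KeyValue kv : sorted) {
            grouped.computeIfAbsent(kv.key, k -> new ArrayList<>()).add(kv.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Same shape as the example input: (K2,V3) (K2,V2) (K1,V5) (K1,V3) (K1,V4).
        List<KeyValue> input = Arrays.asList(
                new KeyValue("K2", 3), new KeyValue("K2", 2),
                new KeyValue("K1", 5), new KeyValue("K1", 3), new KeyValue("K1", 4));
        System.out.println(sortAndGroup(input)); // prints {K1=[3, 4, 5], K2=[2, 3]}
    }
}
```

Because the value is part of the key, the ordinary key sort does all the work, and no separate value comparator is needed.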
Another related issue is http://issues.apache.org/jira/browse/HADOOP-475.
but I have written GenericWritable, an abstract class that helps
users wrap different Writable instances at a cost of only one byte.
GenericObject is a demo showing how to use GenericWritable. Both of them
are attached to this email.
The attachment did not make it. Can you please attach these to a Jira
issue, as a patch file?
http://wiki.apache.org/lucene-hadoop/HowToContribute
Thanks!
Doug