Re: Some new requests about mapreduce

Feng Jiang Tue, 07 Nov 2006 19:39:22 -0800

Thanks. I have attached the patch:
http://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12354993


Best regards,
Feng

On 11/8/06, Doug Cutting <[EMAIL PROTECTED]> wrote:

Feng Jiang wrote:
> I think what I am concerned about is different from request 485. I mean,
> if the input of the Reduce phase is:
>
> K2, V3
> K2, V2
> K1, V5
> K1, V3
> K1, V4
>
> in the current Hadoop, the reduce output could be:
> K1, (V5, V3, V4)
> K2, (V3, V2)
>
> But I hope Hadoop will support job.setOutputValueComparatorClass(theClass),
> so that I can keep the values in order, and the output could be:
> K1, (V3, V4, V5)
> K2, (V2, V3)

Yes, that is different.  One can currently achieve what you're after by
including values in keys.  The only real difference between keys and
values is that values are not used for sorting, and some optimizations
are made because of that.  But if you need to sort by value as well as
key, then you can use a compound key that includes both, and a null value.
Note that with block compression, repeated keys should not use too
much space.  Does that suffice?
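A minimal sketch of the compound-key idea in plain Java, assuming `String` keys and `int` values. `KeyValuePair` and `CompoundKeyDemo` are hypothetical names. The `write`/`readFields`/`compareTo` methods only mirror the shape of Hadoop's `WritableComparable` and do not implement the Hadoop interface. `sortAndGroup` simulates the framework's sort followed by a reducer that groups pairs by natural key. It is not Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical compound key: the natural key plus the value to sort on.
// write/readFields/compareTo mirror the shape of Hadoop's WritableComparable,
// but the Hadoop interface itself is not used, so this compiles on its own.
class KeyValuePair implements Comparable<KeyValuePair> {
    String key;
    int value;

    KeyValuePair() {}
    KeyValuePair(String key, int value) { this.key = key; this.value = value; }

    void write(DataOutput out) throws IOException {
        out.writeUTF(key);
        out.writeInt(value);
    }

    void readFields(DataInput in) throws IOException {
        key = in.readUTF();
        value = in.readInt();
    }

    // Order by the natural key first, then by the value.
    @Override
    public int compareTo(KeyValuePair other) {
        int c = key.compareTo(other.key);
        return c != 0 ? c : Integer.compare(value, other.value);
    }
}

class CompoundKeyDemo {
    // Simulates the framework sort, then groups the sorted pairs by natural key,
    // as a reducer could by watching for changes in the natural key.
    static Map<String, List<Integer>> sortAndGroup(List<KeyValuePair> pairs) {
        List<KeyValuePair> sorted = new ArrayList<>(pairs);
        Collections.sort(sorted);
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (KeyValuePair p : sorted) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Serializes a key and reads it back, as a framework would between phases.
    static KeyValuePair roundTrip(KeyValuePair p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            p.write(new DataOutputStream(bytes));
            KeyValuePair copy = new KeyValuePair();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The reduce input from the question, with V2..V5 written as 2..5.
        List<KeyValuePair> input = Arrays.asList(
            new KeyValuePair("K2", 3), new KeyValuePair("K2", 2),
            new KeyValuePair("K1", 5), new KeyValuePair("K1", 3),
            new KeyValuePair("K1", 4));
        System.out.println(sortAndGroup(input)); // {K1=[3, 4, 5], K2=[2, 3]}
    }
}
```

With the sample input from the question, the result is `{K1=[3, 4, 5], K2=[2, 3]}`, which matches the requested output. One caveat: with more than one reduce task, compound keys would also need a partitioner that looks only at the natural key. Otherwise pairs for K1 could land on different reducers.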

Another related issue is http://issues.apache.org/jira/browse/HADOOP-475.

> But I have written GenericWritable, an abstract class that helps
> users wrap different Writable instances at a cost of only one byte. The
> GenericObject class is a demo showing how to use GenericWritable. Both of
> them are attached to this email.
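The GenericWritable code is not reproduced in this thread, so the following is only a hypothetical sketch of the idea described above, not the code from the patch. Each wrapped instance is written as a single type-index byte followed by its own serialized fields. `SimpleWritable` stands in for Hadoop's `Writable` so the block compiles on its own. `IntBox`, `TextBox`, `GenericWritableSketch`, `MyGeneric`, and `GenericDemo` are invented names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's Writable, so the sketch compiles on its own.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Two example payload types (hypothetical).
class IntBox implements SimpleWritable {
    int v;
    IntBox() {}
    IntBox(int v) { this.v = v; }
    public void write(DataOutput out) throws IOException { out.writeInt(v); }
    public void readFields(DataInput in) throws IOException { v = in.readInt(); }
}

class TextBox implements SimpleWritable {
    String s;
    TextBox() {}
    TextBox(String s) { this.s = s; }
    public void write(DataOutput out) throws IOException { out.writeUTF(s); }
    public void readFields(DataInput in) throws IOException { s = in.readUTF(); }
}

// Wraps one instance of any registered type; the type costs a single byte.
abstract class GenericWritableSketch implements SimpleWritable {
    private SimpleWritable instance;

    // Subclasses list the types they may wrap; an entry's index is its tag byte.
    protected abstract Class<? extends SimpleWritable>[] getTypes();

    void set(SimpleWritable obj) { instance = obj; }
    SimpleWritable get() { return instance; }

    public void write(DataOutput out) throws IOException {
        Class<? extends SimpleWritable>[] types = getTypes();
        for (int i = 0; i < types.length; i++) {
            if (types[i] == instance.getClass()) {
                out.writeByte(i);     // one-byte type tag
                instance.write(out);  // followed by the instance's own fields
                return;
            }
        }
        throw new IOException("unregistered type: " + instance.getClass().getName());
    }

    public void readFields(DataInput in) throws IOException {
        int tag = in.readByte() & 0xFF;  // read as unsigned, so up to 256 types
        try {
            instance = getTypes()[tag].getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        instance.readFields(in);
    }
}

// A concrete wrapper that may hold an IntBox (tag 0) or a TextBox (tag 1).
class MyGeneric extends GenericWritableSketch {
    @SuppressWarnings("unchecked")
    private static final Class<? extends SimpleWritable>[] TYPES =
        new Class[] { IntBox.class, TextBox.class };

    @Override
    protected Class<? extends SimpleWritable>[] getTypes() { return TYPES; }
}

class GenericDemo {
    static byte[] serialize(SimpleWritable value) {
        try {
            MyGeneric wrapper = new MyGeneric();
            wrapper.set(value);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            wrapper.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static SimpleWritable deserialize(byte[] data) {
        try {
            MyGeneric wrapper = new MyGeneric();
            wrapper.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return wrapper.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `GenericDemo.serialize(new IntBox(42))` produces 5 bytes: the tag 0 plus the 4-byte int. Because the tag is only an index into `getTypes()`, the order of that array must stay the same wherever the data is read back.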

The attachment did not make it.  Can you please attach these to a Jira
issue, as a patch file?

http://wiki.apache.org/lucene-hadoop/HowToContribute

Thanks!

Doug
