Re: key/value after reduce

Fernando Padilla Tue, 12 Feb 2008 13:34:26 -0800

Well.. I'm no hadoop expert, but let me brainstorm for a little bit:

Aren't there Output classes that take a key-value pair as input, and then get to decide how/what to actually output? That's how you can direct the output directly to HBase, etc..

You could create (and hadoop should include by default) a ValueOutputEncoder that does nothing but output the values, ignoring the key part.. Thus you get what you want.. not necessarily requiring a key/value pair output.
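To make the ValueOutputEncoder idea concrete, here is a rough sketch in plain Java. It does not use the real Hadoop classes: `PairWriter`, `ValueOnlyWriter` and `ValueOnlyDemo` are names made up for illustration, and the interface is only a stand-in for whatever the actual output classes look like.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a record writer. This is NOT Hadoop's API.
interface PairWriter<K, V> {
    void write(K key, V value);
}

// Accepts key/value pairs but writes only the raw value bytes; the key is ignored.
final class ValueOnlyWriter<K> implements PairWriter<K, byte[]> {
    private final OutputStream out;

    ValueOnlyWriter(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(K key, byte[] value) {
        try {
            out.write(value); // no key, no separator, no newline
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class ValueOnlyDemo {
    // Writes two pairs and returns exactly the bytes that reached the sink.
    static byte[] run() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PairWriter<String, byte[]> writer = new ValueOnlyWriter<>(sink);
        writer.write("k1", new byte[] {0x01, 0x02});
        writer.write("k2", new byte[] {(byte) 0xFF});
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(run().length + " bytes written");
    }
}
```

The reducer still hands over a pair, so nothing upstream has to change; only the last step decides that the key is thrown away.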

You could even have an outputter that took an InputStream as the Value part.. so that it could stream the output..?? possibly?
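A sketch of what such a streaming outputter might look like, again in plain Java and not against the real Hadoop API. `StreamingValueWriter` and `StreamingValueDemo` are hypothetical names; the point is only that the value is copied through a small buffer instead of being held in memory as one object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical outputter whose value is an InputStream. NOT a Hadoop class.
final class StreamingValueWriter<K> {
    private final OutputStream out;

    StreamingValueWriter(OutputStream out) {
        this.out = out;
    }

    // Copies the value stream to the output in chunks, ignoring the key.
    // Returns the number of bytes copied.
    long write(K key, InputStream value) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = value.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}

final class StreamingValueDemo {
    // Streams a small payload through the writer and returns what was written.
    static byte[] run(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamingValueWriter<String> writer = new StreamingValueWriter<>(sink);
        writer.write("ignored-key", new ByteArrayInputStream(payload));
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(run(new byte[] {1, 2, 3, 4, 5}).length + " bytes streamed");
    }
}
```

The returned byte count could also serve as the small debug value a reducer reports when the real data goes somewhere else.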


How far off is this idea?

There is also nothing holding you back from having your Reducer output directly to another data/store system. Then the "output" of the reducer job would be empty, or for debug maybe the content-length of what it put in a different file.. :)

But keep in mind, I think the BIG idea behind Hadoop is divide and conquer. That means arbitrarily cut up the input, transform it once, sort, transform it once more, output. But the idea is that this should hopefully support N different output files. I am guessing the key/value pair arrangement gives those output files context and meaning, or you wouldn't be able to conceptually put them back together into a coherent collection of data.
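Here is a toy, in-memory version of that cut/transform/sort/transform/output flow, just to show why the keys matter once there are N outputs. It is plain Java, not Hadoop code: `MiniMapReduce` is a made-up name, word counting is only an example job, and hashing the key to pick a partition is an assumption about how records get assigned to reducers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy simulation of map -> sort -> reduce with N outputs. NOT Hadoop code.
final class MiniMapReduce {

    // Returns one list of "word<TAB>count" lines per reducer partition.
    static List<List<String>> wordCount(List<String> inputSplits, int numReducers) {
        // One sorted map per reducer; the TreeMap stands in for the sort step.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }

        // Map phase: each split emits (word, 1); the key's hash picks the reducer.
        for (String split : inputSplits) {
            for (String word : split.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                int p = Math.floorMod(word.hashCode(), numReducers);
                partitions.get(p).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce phase: each partition sums its values and produces its own output.
        List<List<String>> outputs = new ArrayList<>();
        for (TreeMap<String, List<Integer>> partition : partitions) {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> entry : partition.entrySet()) {
                int sum = 0;
                for (int count : entry.getValue()) {
                    sum += count;
                }
                lines.add(entry.getKey() + "\t" + sum);
            }
            outputs.add(lines);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<List<String>> outputs = wordCount(Arrays.asList("b a", "a c"), 3);
        for (int i = 0; i < outputs.size(); i++) {
            System.out.println("output " + i + ": " + outputs.get(i));
        }
    }
}
```

Because every line in every output still carries its key, the N outputs can be merged back into one collection in any order. Drop the keys and each file is just a pile of counts with nothing to say which word they belong to. Calling `wordCount` with `numReducers` set to 1 gives the single-output case.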

I just remembered, you can force it to use only 1 Reduce job, and thus only one output file, but that won't scale perfectly.. :) But for your purposes, you could have M map jobs, 1 Reduce job, and use a ValueOutputEncoder that ignores the key part and only spits out a binary file.. :)

Yuri Pradkin wrote:

But OTOH, if I wanted my reducer to write binary output, I'd be screwed, especially so in the streaming world (where I'd like to stay for the moment).
Actually, I don't think I understand your point: if the reducer's output is in a key/value format, you still can run another map over it or another reduce, can't you? If the output isn't, you can't; it's up to the user who coded up the Reducer. What am I missing?
Thanks,

  -Yuri

On Tue 12 2008, Miles Osborne wrote:
You may well have another Map operation operate over the Reducer
output, in which case you'd want key-value pairs.

Miles

On 12/02/2008, Yuri Pradkin <[EMAIL PROTECTED]> wrote:
Hi,

I'm relatively new to Hadoop and I have what I hope is a simple
question:

I don't understand why the key/value assumption is preserved AFTER
the reduce operation, in other words why the output of a reducer
is expected as <key,value> instead of arbitrary, possibly binary
bytes? Why can't OutputCollector just give those raw bytes to the
RecordWriter and have it make sense of them as it pleases, or just
dump them to a file?

This seems like an unnecessary restriction to me, at least at the
first glance.

Thanks,

  -Yuri
