Tom,

What I meant to say was that this is already well supported by the existing API/libraries:
- The class MultipleOutputs supports providing a filename for an
  output. See MultipleOutputs.addNamedOutput usage [1].
- The type 'NullWritable' is a special writable that doesn't do
  anything. So if it's configured into the above filename addition as a
  key-type, and you pass NullWritable.get() as the key in every write
  operation, you will end up just writing the value part of (key,
  value).
- This way you do not have to write a custom OutputFormat for your
  use-case.

[1] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
(Also available for the new API, depending on which
version/distribution of Hadoop you are on)

On Tue, Jul 26, 2011 at 3:36 AM, Tom Melendez <[email protected]> wrote:
> Hi Harsh,
>
> Thanks for the response. Unfortunately, I'm not following your response. :-)
>
> Could you elaborate a bit?
>
> Thanks,
>
> Tom
>
> On Mon, Jul 25, 2011 at 2:10 PM, Harsh J <[email protected]> wrote:
>> You can use MultipleOutputs (or MultiTextOutputFormat for direct
>> key-file mapping, but I'd still prefer the stable MultipleOutputs).
>> Your sinking Key can be of NullWritable type, and you can keep passing
>> an instance of NullWritable.get() to it in every cycle. This would
>> write just the value, while the filenames are added/sourced from the
>> key inside the mapper code.
>>
>> This, if you are not comfortable writing your own code and maintaining
>> it, I s'pose. Your approach is correct as well, if the question was
>> specifically that.
>>
>> On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez <[email protected]> wrote:
>>> Hi Folks,
>>>
>>> Just doing a sanity check here.
>>>
>>> I have a map-only job, which produces a filename for a key and data as
>>> a value. I want to write the value (data) into the key (filename) in
>>> the path specified when I run the job.
>>>
>>> The value (data) doesn't need any formatting, I can just write it to
>>> HDFS without modification.
>>>
>>> So, looking at this link (the Output Formats section):
>>>
>>> http://developer.yahoo.com/hadoop/tutorial/module5.html
>>>
>>> Looks like I want to:
>>> - create a new output format
>>> - override write, tell it not to call writekey as I don't want that written
>>> - new getRecordWriter method that use the key as the filename and
>>> calls my outputformat
>>>
>>> Sound reasonable?
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>> --
>>> ===================
>>> Skybox is hiring.
>>> http://www.skyboximaging.com/careers/jobs
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> ===================
> Skybox is hiring.
> http://www.skyboximaging.com/careers/jobs
>

--
Harsh J
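
For reference, here is a minimal sketch of the MultipleOutputs + NullWritable approach described at the top of this message, written against the old (org.apache.hadoop.mapred) API that [1] documents. The class name ValueOnlyMapper, the configureJob helper, and the named output "data" are hypothetical, and the LongWritable/Text input types assume plain text input; none of these come from Tom's actual job. Treat it as an illustration under those assumptions, not a tested drop-in.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Hypothetical map-only mapper: writes only the value to a named output.
public class ValueOnlyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, NullWritable, Text> {

  // Hypothetical named output; it must be registered before the job runs.
  public static final String NAMED_OUTPUT = "data";

  private MultipleOutputs mos;

  // Hypothetical helper, meant to be called from the job driver.
  public static void configureJob(JobConf conf) {
    conf.setNumReduceTasks(0); // map-only job, as in the original question
    // NullWritable as the key type: TextOutputFormat then emits just the value.
    MultipleOutputs.addNamedOutput(conf, NAMED_OUTPUT,
        TextOutputFormat.class, NullWritable.class, Text.class);
  }

  @Override
  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @Override
  @SuppressWarnings("unchecked")
  public void map(LongWritable offset, Text line,
      OutputCollector<NullWritable, Text> output, Reporter reporter)
      throws IOException {
    // Pass NullWritable.get() as the key on every write, so only the
    // value part of (key, value) ends up in the output file.
    mos.getCollector(NAMED_OUTPUT, reporter).collect(NullWritable.get(), line);
  }

  @Override
  public void close() throws IOException {
    mos.close(); // flushes and closes the named-output writers
  }
}
```

Note that named outputs are declared at job-setup time in this API. If the filename is only known per record, MultipleOutputs.addMultiNamedOutput together with the three-argument getCollector(namedOutput, multiName, reporter) may be closer to what is needed, but check the Javadoc in [1] for your version before relying on that.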
