You can use MultipleOutputs (or MultipleTextOutputFormat for a direct
key-to-file mapping, though I'd still prefer the stable MultipleOutputs).
Your output key can be of type NullWritable, and you can pass
NullWritable.get() as the key on every write. This writes only the
value, while the file names are derived from your key inside the
mapper code and handed to MultipleOutputs.
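A minimal sketch of the MultipleOutputs approach, written against the newer org.apache.hadoop.mapreduce API (as found in Hadoop 2.x). The class name KeyedFileMapper, the helper fileNameOf, and the tab-separated "filename<TAB>data" input format are all hypothetical; they stand in for however your mapper actually computes the file name:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/** Map-only sketch: each value goes to a file named after its (hypothetical) key. */
public class KeyedFileMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> out;

    /** Hypothetical helper: file name is everything before the first tab, else null. */
    static String fileNameOf(String record) {
        int tab = record.indexOf('\t');
        return tab < 0 ? null : record.substring(0, tab);
    }

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String record = line.toString();
        String fileName = fileNameOf(record);
        if (fileName == null || fileName.isEmpty()) {
            return; // skip records without a usable file name
        }
        Text data = new Text(record.substring(fileName.length() + 1));
        // NullWritable key: TextOutputFormat writes only the value.
        // The third argument is a base path relative to the job output directory.
        out.write(NullWritable.get(), data, fileName);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close(); // flush and close every per-file writer
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "value-per-key-file");
        job.setJarByClass(KeyedFileMapper.class);
        job.setMapperClass(KeyedFileMapper.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // All output goes through MultipleOutputs, so avoid empty default part-m-* files.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Files land under the job output directory as <fileName>-m-NNNNN. Note that TextOutputFormat still appends a newline after each value; if the data must be written byte-for-byte, that is where a custom OutputFormat earns its keep.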

That is, if you'd rather not write and maintain your own code. If your
question was specifically about the custom OutputFormat approach, that
is correct as well.
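For comparison, the custom OutputFormat route from the quoted question might look roughly like this. It is a sketch against the org.apache.hadoop.mapreduce API; the class name is hypothetical, and it assumes Text keys holding plain file names and BytesWritable values holding the raw data. Rather than subclassing the tutorial's line writer, it writes the value bytes (no key, no newline) to a file named after the key:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Hypothetical output format: the key names the file, the value is written verbatim. */
public class KeyAsFileNameOutputFormat extends FileOutputFormat<Text, BytesWritable> {

    @Override
    public RecordWriter<Text, BytesWritable> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Task-attempt work directory; FileOutputCommitter moves its contents
        // into the job output directory when the task commits.
        Path workDir = ((FileOutputCommitter) getOutputCommitter(context)).getWorkPath();
        FileSystem fs = workDir.getFileSystem(context.getConfiguration());
        return new KeyAsFileNameWriter(fs, workDir);
    }

    private static class KeyAsFileNameWriter extends RecordWriter<Text, BytesWritable> {
        private final FileSystem fs;
        private final Path workDir;
        // One open stream per file name, so repeated keys append to the same file.
        private final Map<String, FSDataOutputStream> streams = new HashMap<>();

        KeyAsFileNameWriter(FileSystem fs, Path workDir) {
            this.fs = fs;
            this.workDir = workDir;
        }

        @Override
        public void write(Text key, BytesWritable value) throws IOException {
            String name = key.toString();
            FSDataOutputStream out = streams.get(name);
            if (out == null) {
                out = fs.create(new Path(workDir, name), false); // fail rather than overwrite
                streams.put(name, out);
            }
            // Write only the valid bytes; getBytes() may return a larger backing array.
            out.write(value.getBytes(), 0, value.getLength());
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException {
            for (FSDataOutputStream out : streams.values()) {
                out.close();
            }
        }
    }
}
```

The job would use job.setOutputFormatClass(KeyAsFileNameOutputFormat.class) with zero reducers and Text/BytesWritable as the map output types. Unlike MultipleOutputs, no -m-NNNNN suffix is added, so this assumes each file name comes from only one map task; it also keeps one stream open per distinct key, which may hit open-file limits if a task emits many names.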

On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez <[email protected]> wrote:
> Hi Folks,
>
> Just doing a sanity check here.
>
> I have a map-only job, which produces a filename for a key and data as
> a value.  I want to write the value (data) into the key (filename) in
> the path specified when I run the job.
>
> The value (data) doesn't need any formatting, I can just write it to
> HDFS without modification.
>
> So, looking at this link (the Output Formats section):
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html
>
> Looks like I want to:
> - create a new output format
> - override write, tell it not to call writekey as I don't want that written
> - new getRecordWriter method that uses the key as the filename and
> calls my output format
>
> Sound reasonable?
>
> Thanks,
>
> Tom
>



-- 
Harsh J
