Hi Bobby,

Yeah, that won't be a big deal in this case.  It will create about 40
files, each about 60MB.  This job is kind of an odd one that won't be
run very often.
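
For what it's worth, the core of the output-format plan from the quoted
thread (a record writer whose write(key, value) treats the key as a
filename and writes only the value, never the key) can be sketched in
plain Java.  The class and method names below are my own illustration;
in Hadoop this logic would live inside the RecordWriter returned by a
custom FileOutputFormat's getRecordWriter():

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the record-writer idea: write(key, value)
// uses the key as the filename and writes only the value bytes.
// The key itself never appears inside the file.
public class KeyAsFilenameWriter {
    private final Path outDir;

    public KeyAsFilenameWriter(Path outDir) throws IOException {
        this.outDir = Files.createDirectories(outDir);
    }

    public void write(String key, byte[] value) throws IOException {
        // One file per key.  CREATE_NEW fails if the file already
        // exists, which mirrors HDFS's one-writer-per-file restriction
        // mentioned in the thread.
        Path file = outDir.resolve(key);
        Files.write(file, value, StandardOpenOption.CREATE_NEW);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("out");
        KeyAsFilenameWriter w = new KeyAsFilenameWriter(dir);
        w.write("part-a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.readString(dir.resolve("part-a.txt"))); // prints "hello"
    }
}
```

Because each mapper creates files named only by its keys, the sketch
also assumes what Bobby pointed out: no two mappers may ever emit the
same key, or the second create will fail.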

Thanks,

Tom

On Mon, Jul 25, 2011 at 1:34 PM, Robert Evans <[email protected]> wrote:
> Tom,
>
> I also forgot to mention that writing lots of little files could cause 
> issues too.  HDFS is designed to handle relatively few BIG files. 
> There is some work to improve this, but it is still a ways off.  So it 
> is likely going to be very slow and put a big load on the namenode if 
> you are going to create a lot of small files using this method.
>
> --Bobby
>
>
> On 7/25/11 3:30 PM, "Robert Evans" <[email protected]> wrote:
>
> Tom,
>
> That assumes that you will never write to the same file from two different 
> mappers or processes.  HDFS currently does not support writing to a single 
> file from multiple processes.
>
> --Bobby
>
> On 7/25/11 3:25 PM, "Tom Melendez" <[email protected]> wrote:
>
> Hi Folks,
>
> Just doing a sanity check here.
>
> I have a map-only job, which produces a filename as a key and data as
> a value.  I want to write the value (the data) to a file named by the
> key (the filename), under the path specified when I run the job.
>
> The value (data) doesn't need any formatting, I can just write it to
> HDFS without modification.
>
> So, looking at this link (the Output Formats section):
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html
>
> Looks like I want to:
> - create a new output format
> - override write(), telling it not to write the key, as I don't want
> that in the file
> - add a new getRecordWriter() method that uses the key as the filename
> and hooks in my output format
>
> Sound reasonable?
>
> Thanks,
>
> Tom
>
> --
> ===================
> Skybox is hiring.
> http://www.skyboximaging.com/careers/jobs
>
>
>



-- 
===================
Skybox is hiring.
http://www.skyboximaging.com/careers/jobs
