It's a bit clunky, but you could do this from the shell by backgrounding one copy per file:

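for FILE in $LIST_OF_FILES ; do
  # each copy runs in the background, so the files upload concurrently
  hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
done
wait  # block until every background copy has finished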
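
If starting one copy per file all at once is too heavy for the client machine, here is a rough sketch that caps how many copies run at a time. I haven't run it against a real cluster. parallel_put and MAX_JOBS are names I made up for illustration; the only Hadoop piece is the same hadoop fs -copyFromLocal call as above.

```shell
# Sketch: copy files to HDFS in batches of at most MAX_JOBS concurrent uploads.
# parallel_put and MAX_JOBS are illustrative names, not part of Hadoop.
parallel_put() {
  local dest="$1"; shift
  local max_jobs="${MAX_JOBS:-4}"   # assumed default of 4 concurrent copies
  local pids="" count=0 failed=0 file pid

  for file in "$@"; do
    hadoop fs -copyFromLocal "$file" "$dest" &
    pids="$pids $!"
    count=$((count + 1))

    # Once a batch is full, wait for all of it before starting more copies.
    if [ "$count" -ge "$max_jobs" ]; then
      for pid in $pids; do wait "$pid" || failed=1; done
      pids=""; count=0
    fi
  done

  # Wait for the last, possibly partial, batch.
  for pid in $pids; do wait "$pid" || failed=1; done

  # Returns 1 if any copy exited non-zero, 0 otherwise.
  return "$failed"
}
```

Usage would look something like: MAX_JOBS=8 parallel_put "$DEST_PATH" $LIST_OF_FILES. Each individual file is still written serially; only the number of files in flight changes.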

If you're doing this via the Java API then, yes, you will have to use
multiple threads.

On Wed, May 18, 2011 at 1:04 AM, Mapred Learn <[email protected]> wrote:

> Thanks, Harsh!
> So basically both the APIs and the hadoop client commands allow only
> serial writes.
> I was wondering what other ways there are to write data to HDFS in
> parallel, other than using multiple threads.
>
> Thanks,
> JJ
>
> On May 17, 2011, at 10:59 PM, Harsh J <[email protected]> wrote:
>
> > Hello,
> >
> > Adding to Joey's response, copyFromLocal's current implementation is
> > serial given a list of files.
> >
> > On Wed, May 18, 2011 at 9:57 AM, Mapred Learn <[email protected]> wrote:
> >> Thanks Joey!
> >> I will try to find out about copyFromLocal. Looks like the Hadoop APIs
> >> write serially, as you pointed out.
> >>
> >> Thanks,
> >> -JJ
> >>
> >> On May 17, 2011, at 8:32 PM, Joey Echeverria <[email protected]> wrote:
> >>
> >>> The sequence file writer definitely does it serially as you can only
> >>> ever write to the end of a file in Hadoop.
> >>>
> >>> Doing copyFromLocal could write multiple files in parallel (I'm not
> >>> sure if it does or not), but a single file would be written serially.
> >>>
> >>> -Joey
> >>>
> >>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn <[email protected]> wrote:
> >>>> Hi,
> >>>> My question is: when I run a command from the HDFS client, e.g. hadoop fs
> >>>> -copyFromLocal, or create a sequence file writer in Java code and append
> >>>> key/values to it through the Hadoop APIs, does it internally transfer/write
> >>>> data to HDFS serially or in parallel?
> >>>>
> >>>> Thanks in advance,
> >>>> -JJ
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Joseph Echeverria
> >>> Cloudera, Inc.
> >>> 443.305.9434
> >>
> >
> > --
> > Harsh J
>
