Kinda clunky, but you could do this via the shell:

    for FILE in $LIST_OF_FILES; do hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" & done

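If you want to cap how many copies run at once and find out whether any of them failed, the backgrounded loop can be wrapped up roughly like this. It is only a sketch: parallel_put and MAX_JOBS are names I made up for illustration, and the only Hadoop command it relies on is the hadoop fs -copyFromLocal already mentioned in this thread.

```shell
#!/usr/bin/env bash
# Sketch: copy several local files to HDFS concurrently, in batches.
# parallel_put and MAX_JOBS are hypothetical names, not part of Hadoop.

parallel_put() {
    local dest="$1"; shift
    local max_jobs="${MAX_JOBS:-4}"   # assumed default batch size
    local pids="" running=0 failed=0 file pid

    for file in "$@"; do
        # One client process per file; each single file is still written serially.
        hadoop fs -copyFromLocal "$file" "$dest" &
        pids="$pids $!"
        running=$((running + 1))

        # Once the batch is full, wait for all of it before starting more.
        if [ "$running" -ge "$max_jobs" ]; then
            for pid in $pids; do
                wait "$pid" || failed=1
            done
            pids=""
            running=0
        fi
    done

    # Wait for the final, possibly partial, batch.
    for pid in $pids; do
        wait "$pid" || failed=1
    done

    # Non-zero if any copy reported failure.
    return "$failed"
}
```

You would call it as parallel_put "$DEST_PATH" $LIST_OF_FILES (unquoted on purpose so the list splits on whitespace, which also means file names containing spaces will not work with this form).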
If doing this via the Java API then, yes, you will have to use multiple threads.

On Wed, May 18, 2011 at 1:04 AM, Mapred Learn <[email protected]> wrote:

> Thanks harsh !
> That means basically both APIs as well as hadoop client commands allow only
> serial writes.
> I was wondering what could be other ways to write data in parallel to HDFS
> other than using multiple parallel threads.
>
> Thanks,
> JJ
>
> Sent from my iPhone
>
> On May 17, 2011, at 10:59 PM, Harsh J <[email protected]> wrote:
>
> > Hello,
> >
> > Adding to Joey's response, copyFromLocal's current implementation is serial
> > given a list of files.
> >
> > On Wed, May 18, 2011 at 9:57 AM, Mapred Learn <[email protected]> wrote:
> >> Thanks Joey !
> >> I will try to find out abt copyFromLocal. Looks like Hadoop Apis write
> >> serially as you pointed out.
> >>
> >> Thanks,
> >> -JJ
> >>
> >> On May 17, 2011, at 8:32 PM, Joey Echeverria <[email protected]> wrote:
> >>
> >>> The sequence file writer definitely does it serially as you can only
> >>> ever write to the end of a file in Hadoop.
> >>>
> >>> Doing copyFromLocal could write multiple files in parallel (I'm not
> >>> sure if it does or not), but a single file would be written serially.
> >>>
> >>> -Joey
> >>>
> >>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn <[email protected]> wrote:
> >>>> Hi,
> >>>> My question is when I run a command from hdfs client, for eg. hadoop fs
> >>>> -copyFromLocal or create a sequence file writer in java code and append
> >>>> key/values to it through Hadoop APIs, does it internally transfer/write data
> >>>> to HDFS serially or in parallel ?
> >>>>
> >>>> Thanks in advance,
> >>>> -JJ
> >>>
> >>> --
> >>> Joseph Echeverria
> >>> Cloudera, Inc.
> >>> 443.305.9434
> >
> > --
> > Harsh J
