On May 20, 2011, at 6:10 AM, Dieter Plaetinck wrote:

> What do you mean clunky?
> IMHO this is a quite elegant, simple, working solution.

Try giving it to a user; watch them feed it a list of 10,000 files;
watch the machine swap to death and the disks uselessly thrash.

> Sure this spawns multiple processes, but it beats any
> api-overcomplications, imho.
>

Simple doesn't imply scalable, unfortunately.

Brian

> Dieter
>
>
> On Wed, 18 May 2011 11:39:36 -0500
> Patrick Angeles <[email protected]> wrote:
>
>> kinda clunky but you could do this via shell:
>>
>> for FILE in $LIST_OF_FILES ; do
>>   hadoop fs -copyFromLocal $FILE $DEST_PATH &
>> done
>>
>> If doing this via the Java API, then, yes you will have to use
>> multiple threads.
>>
>> On Wed, May 18, 2011 at 1:04 AM, Mapred Learn
>> <[email protected]> wrote:
>>
>>> Thanks harsh !
>>> That means basically both APIs as well as hadoop client commands
>>> allow only serial writes.
>>> I was wondering what could be other ways to write data in parallel
>>> to HDFS other than using multiple parallel threads.
>>>
>>> Thanks,
>>> JJ
>>>
>>> Sent from my iPhone
>>>
>>> On May 17, 2011, at 10:59 PM, Harsh J <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Adding to Joey's response, copyFromLocal's current implementation
>>>> is serial given a list of files.
>>>>
>>>> On Wed, May 18, 2011 at 9:57 AM, Mapred Learn
>>>> <[email protected]> wrote:
>>>>> Thanks Joey !
>>>>> I will try to find out abt copyFromLocal. Looks like Hadoop Apis
>>>>> write serially as you pointed out.
>>>>>
>>>>> Thanks,
>>>>> -JJ
>>>>>
>>>>> On May 17, 2011, at 8:32 PM, Joey Echeverria <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The sequence file writer definitely does it serially as you can
>>>>>> only ever write to the end of a file in Hadoop.
>>>>>>
>>>>>> Doing copyFromLocal could write multiple files in parallel (I'm
>>>>>> not sure if it does or not), but a single file would be written
>>>>>> serially.
>>>>>>
>>>>>> -Joey
>>>>>>
>>>>>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn
>>>>>> <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> My question is when I run a command from hdfs client, for eg.
>>>>>>> hadoop fs -copyFromLocal or create a sequence file writer in
>>>>>>> java code and append key/values to it through Hadoop APIs, does
>>>>>>> it internally transfer/write data to HDFS serially or in
>>>>>>> parallel ?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> -JJ
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Joseph Echeverria
>>>>>> Cloudera, Inc.
>>>>>> 443.305.9434
>>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
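
For reference, the objection at the top of this thread is about unbounded concurrency: the quoted loop backgrounds one hadoop process per file, so 10,000 files means 10,000 simultaneous JVMs. One possible middle ground, sketched below, keeps the shell approach but caps the number of copies in flight. This is only a sketch under assumptions: parallel_put, its JOBS argument, and the COPY_CMD override are hypothetical names introduced here (none of them are Hadoop settings), and it relies on an xargs that supports -0, -I and -P (GNU and BSD xargs do, but -P is not in POSIX).

```shell
# Hypothetical helper: parallel_put JOBS DEST FILE...
# Copies each FILE to DEST with at most JOBS copies running at once.
# COPY_CMD is an illustrative override, not a Hadoop setting; it defaults
# to the command used in the thread.
parallel_put() (
  jobs=$1
  dest=$2
  shift 2

  # Nothing to copy: succeed without invoking xargs at all.
  [ "$#" -gt 0 ] || exit 0

  # NUL-delimit the names so paths containing spaces survive intact.
  # -I{} runs one copy command per file; -P caps how many run concurrently.
  # COPY_CMD is deliberately unquoted so it splits into command + arguments.
  printf '%s\0' "$@" |
    xargs -0 -P "$jobs" -I{} ${COPY_CMD:-hadoop fs -copyFromLocal} {} "$dest"
)
```

Something like `parallel_put 4 "$DEST_PATH" /data/*.log` would then keep at most four uploads running at a time, and xargs should report a nonzero exit status if any individual copy fails. Each file is still written serially, as Joey noted; only the number of files in flight changes.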
