Re: Are hadoop fs commands serial or parallel

Brian Bockelman Fri, 20 May 2011 08:11:56 -0700

On May 20, 2011, at 6:10 AM, Dieter Plaetinck wrote:

> What do you mean clunky?
> IMHO this is a quite elegant, simple, working solution.


Try giving it to a user; watch them feed it a list of 10,000 files; watch the 
machine swap to death and the disks uselessly thrash.

> Sure this spawns multiple processes, but it beats any
> api-overcomplications, imho.
> 

Simple doesn't imply scalable, unfortunately.
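
One way to keep the shell approach but cap how many copies run at once is a
bounded worker pool. A minimal sketch, assuming an xargs that supports -P and
-0 (GNU and BSD both do); parallel_put, COPY_CMD, and the default of 4 jobs
are hypothetical names and values picked for illustration, not part of Hadoop:

```shell
# parallel_put DEST_PATH [MAX_JOBS]
# Reads newline-separated local file names on stdin and copies each one to
# DEST_PATH, running at most MAX_JOBS copies at a time (default 4).
# COPY_CMD is a hypothetical override hook; it defaults to the real
# "hadoop fs -copyFromLocal" command used elsewhere in this thread.
parallel_put() {
    dest_path=$1
    max_jobs=${2:-4}
    copy_cmd=${COPY_CMD:-hadoop fs -copyFromLocal}

    # Convert newlines to NULs so names containing spaces or quotes survive,
    # then let xargs launch one copy per file, never more than max_jobs at once.
    # $copy_cmd is intentionally unquoted so it splits into command + arguments.
    tr '\n' '\0' | xargs -0 -P "$max_jobs" -I {} $copy_cmd {} "$dest_path"
}
```

Usage would look like `printf '%s\n' $LIST_OF_FILES | parallel_put "$DEST_PATH" 8`.
File names that themselves contain newlines are not handled by this sketch.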

Brian

> Dieter
> 
> 
> On Wed, 18 May 2011 11:39:36 -0500
> Patrick Angeles <[email protected]> wrote:
> 
>> kinda clunky but you could do this via shell:
>> 
>> for FILE in $LIST_OF_FILES ; do
>>  hadoop fs -copyFromLocal "$FILE" "$DEST_PATH" &
>> done
>> wait  # block until every background copy has finished
>> 
>> If doing this via the Java API, then, yes you will have to use
>> multiple threads.
>> 
>> On Wed, May 18, 2011 at 1:04 AM, Mapred Learn
>> <[email protected]> wrote:
>> 
>>> Thanks harsh !
>>> That means basically both APIs as well as hadoop client commands
>>> allow only serial writes.
>>> I was wondering what could be other ways to write data in parallel
>>> to HDFS other than using multiple parallel threads.
>>> 
>>> Thanks,
>>> JJ
>>> 
>>> Sent from my iPhone
>>> 
>>> On May 17, 2011, at 10:59 PM, Harsh J <[email protected]> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Adding to Joey's response, copyFromLocal's current implementation
>>>> is serial given a list of files.
>>>> 
>>>> On Wed, May 18, 2011 at 9:57 AM, Mapred Learn
>>>> <[email protected]> wrote:
>>>>> Thanks Joey !
>>>>> I will try to find out about copyFromLocal. Looks like the Hadoop
>>>>> APIs write serially, as you pointed out.
>>>>> 
>>>>> Thanks,
>>>>> -JJ
>>>>> 
>>>>> On May 17, 2011, at 8:32 PM, Joey Echeverria <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> The sequence file writer definitely does it serially as you can
>>>>>> only ever write to the end of a file in Hadoop.
>>>>>> 
>>>>>> Doing copyFromLocal could write multiple files in parallel (I'm
>>>>>> not sure if it does or not), but a single file would be written
>>>>>> serially.
>>>>>> 
>>>>>> -Joey
>>>>>> 
>>>>>> On Tue, May 17, 2011 at 5:44 PM, Mapred Learn
>>>>>> <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> My question is: when I run a command from the HDFS client, e.g.
>>>>>>> hadoop fs -copyFromLocal, or create a sequence file writer in Java
>>>>>>> code and append key/values to it through the Hadoop APIs, does it
>>>>>>> internally transfer/write data to HDFS serially or in parallel?
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> -JJ
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Joseph Echeverria
>>>>>> Cloudera, Inc.
>>>>>> 443.305.9434
>>>>> 
>>>> 
>>>> --
>>>> Harsh J
>>>
