The 'dfs -put' seemed to take around double the time taken by scp. I didn't
realize that could be due to replication.

Regarding dfs being faster than scp, the statement came more out of
expectation (or a wish list) than anything else. Since scp is the most
elementary way of copying files, I was wondering whether the network
topology of the cluster could be exploited in any way. My only intuition
was that there may be approaches faster than scp if concepts from P2P file
sharing are used here. Though I haven't fully explored P2P, I thought there
may be some new developments in that area which would be useful here. After
Napster's centralized way of copying, I think there were quite a few
improvements? Just thinking out loud.
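
To put rough numbers on the replication point, here is a back-of-envelope
sketch in Python. The model is my own simplification and not a description
of how DFS actually schedules writes: the best case assumes the replicas
are forwarded fully in parallel with the client's write, and the worst case
assumes every replica costs one more full serial copy. The function name,
the 100 MB/s link speed and the replication factor of 3 are all assumptions
for illustration.

```python
def transfer_time_bounds(size_mb, link_mb_per_s, replication):
    """Return (best, worst) estimated seconds to store one file.

    Assumed model, for illustration only:
      best  = replicas are written concurrently, so only one copy's
              worth of time is visible to the client
      worst = each replica is a separate full copy, done one after another
    """
    if size_mb < 0 or link_mb_per_s <= 0 or replication < 1:
        raise ValueError("size must be >= 0, link speed > 0, replication >= 1")
    single_copy = size_mb / link_mb_per_s
    best = single_copy
    worst = single_copy * replication
    return best, worst


if __name__ == "__main__":
    # scp to one machine behaves like a single copy (replication = 1).
    scp_best, scp_worst = transfer_time_bounds(1024, 100, 1)
    put_best, put_worst = transfer_time_bounds(1024, 100, 3)
    print("scp estimate (s):", scp_best)
    print("put estimate (s): between", put_best, "and", put_worst)
```

Under this model a put that takes about twice as long as scp falls between
the two bounds for a replication factor of 3, so the timing I saw is at
least plausible without anything being misconfigured.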

- Prasad.

>
> How much slower is 'dfs -put' anyway? How large is the file you are
> copying?
>
>  >  but shouldn't that
>  > be at least as fast as copying data to namenode from a single machine,
>
> It would be "at most" as fast as scp, assuming you are not CPU bound. Why
> would you think dfs would be faster even if it is copying to a single replica?
>
> Raghu.
>
> Dennis Kubes wrote:
>> While an scp will copy data to the namenode machine, it does *not* store
>> the data in dfs, it simply copies the data to the namenode machine. This
>> is the same as copying data to any other machine.  The data isn't in DFS
>> and is not accessible from DFS.  If the box running the namenode fails
>> you lose your data.
>>
>> The reason put is slower is that the data is actually being stored into
>> the DFS on multiple machines in block format.  It is then accessible
>> from programs accessing the DFS such as MR jobs.
>>
>> Dennis
>>
>> Prasad Pingali wrote:
>>> Hello,
>>>    I observe that scp of data to the namenode is faster than actually
>>> putting it into dfs (all nodes are on the same switch and have the same
>>> ethernet cards, homogeneous nodes). I understand that "dfs -put" breaks
>>> the data into blocks and then copies to datanodes, but shouldn't that
>>> be at least as fast as copying data to namenode from a single machine,
>>> if not faster?
>>>
>>> thanks and regards,
>>> Prasad Pingali,
>>> IIIT Hyderabad.
>>>
>>>
>
>
>