How much slower is 'dfs -put' anyway? How large is the file you are
copying?
> but shouldn't that
> be at least as fast as copying data to the namenode from a single machine,
It would be "at most" as fast as scp, assuming you are not CPU-bound. Why
would you expect DFS to be faster, even if it is copying to a single replica?
Raghu.
Dennis Kubes wrote:
While scp copies data to the namenode machine, it does *not* store
the data in DFS. This is the same as copying data to any other machine:
the data isn't in DFS and is not accessible from DFS. If the box running
the namenode fails, you lose your data.
The reason put is slower is that the data is actually being stored in
DFS, in block format, on multiple machines. It is then accessible to
programs that read from DFS, such as MR jobs.
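A rough Python sketch of the extra work described above: the file is carved into fixed-size blocks, and each block is stored on several datanodes. The 64 MB block size, the replication factor of 3, and the function names are illustrative assumptions, not Hadoop's API; check your cluster's configured block size and replication.

```python
# Hypothetical illustration only; this is not Hadoop's client code.
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumed block size
REPLICATION = 3       # assumed replication factor


def plan_blocks(file_size, block_size=BLOCK_SIZE):
    """Split a file of file_size bytes into (offset, length) blocks."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def bytes_stored(file_size, replication=REPLICATION):
    """Total bytes the datanodes must write to hold every replica."""
    return file_size * replication
```

Under these assumptions a 200 MB file becomes three 64 MB blocks plus one 8 MB block, and the datanodes write 600 MB in total, versus the 200 MB a single scp writes to one disk.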
Dennis
Prasad Pingali wrote:
Hello,
I observe that scp of data to the namenode is faster than actually
putting it into DFS (all nodes are on the same switch and have the same
Ethernet cards; the nodes are homogeneous). I understand that "dfs -put"
breaks the data into blocks and then copies them to datanodes, but
shouldn't that be at least as fast as copying data to the namenode from
a single machine, if not faster?
Thanks and regards,
Prasad Pingali,
IIIT Hyderabad.