Thanks Brian for the good advice.

Slightly off topic from original post: there will be occasions where it is necessary or better to copy different portions of a file in parallel (distcp can benefit a lot). There is a proposal to let HDFS 'stitch' multiple files into one: something like

NameNode.stitchFiles(Path to, Path[] files)

This way a very large file can be copied more efficiently (with a map/reduce job, for example). Another use case is high-latency, high-bandwidth connections (such as coast-to-coast links). High latency can be somewhat worked around by using large buffers for TCP connections, but users usually don't have that control. It is just simpler to use multiple connections.

This will obviously be an HDFS-only interface (i.e. not a FileSystem method), at least initially.
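
To make the copy-in-parallel-then-stitch idea concrete, here is a rough sketch that runs against the local filesystem only, using plain java.nio. It is not HDFS code: the class name and the methods copyRange, stitchFiles, parallelCopy and roundTrip are all made up for illustration, and this stitchFiles simply appends the parts byte by byte, whereas the proposed NameNode call would presumably avoid re-copying the data.

```java
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.TRUNCATE_EXISTING;
import static java.nio.file.StandardOpenOption.WRITE;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCopySketch {

    /** Copies bytes [offset, offset + length) of src into its own part file. */
    static Path copyRange(Path src, long offset, long length, Path part) {
        try (FileChannel in = FileChannel.open(src, READ);
             FileChannel out = FileChannel.open(part, CREATE, WRITE, TRUNCATE_EXISTING)) {
            long done = 0;
            while (done < length) {
                long n = in.transferTo(offset + done, length - done, out);
                if (n <= 0) break; // source ended earlier than expected
                done += n;
            }
            return part;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Local stand-in for the proposed stitch call: appends the parts, in order, to 'to'. */
    static void stitchFiles(Path to, List<Path> parts) {
        try (FileChannel out = FileChannel.open(to, CREATE, WRITE, TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, READ)) {
                    long size = in.size();
                    long done = 0;
                    while (done < size) {
                        long n = in.transferTo(done, size - done, out);
                        if (n <= 0) break;
                        done += n;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits src into 'pieces' ranges, copies them concurrently, then stitches them into dst. */
    static void parallelCopy(Path src, Path dst, int pieces) {
        ExecutorService pool = Executors.newFixedThreadPool(pieces);
        try {
            long size = Files.size(src);
            long chunk = (size + pieces - 1) / pieces; // ceiling division
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < pieces; i++) {
                long offset = Math.min(size, i * chunk);
                long length = Math.min(chunk, size - offset);
                Path part = Files.createTempFile("part-" + i + "-", ".tmp");
                Callable<Path> task = () -> copyRange(src, offset, length, part);
                futures.add(pool.submit(task));
            }
            List<Path> parts = new ArrayList<>();
            for (Future<Path> f : futures) {
                parts.add(f.get()); // waits for each range, keeping the original order
            }
            stitchFiles(dst, parts);
            for (Path p : parts) {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Writes data to a temp file, copies it in parallel, and returns what was read back. */
    static byte[] roundTrip(byte[] data, int pieces) {
        try {
            Path src = Files.createTempFile("src-", ".bin");
            Path dst = Files.createTempFile("dst-", ".bin");
            Files.write(src, data);
            parallelCopy(src, dst, pieces);
            byte[] copied = Files.readAllBytes(dst);
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1_000_000];
        Arrays.fill(data, (byte) 7);
        System.out.println("copy matches source: " + Arrays.equals(data, roundTrip(data, 4)));
    }
}
```

On a real cluster each range would go over its own connection (or its own map task), which is where the gain on high-latency links would come from; the local version only shows the splitting and the ordering of the parts.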

Raghu.

Brian Bockelman wrote:
Hey Sugandha,

Transfer rates depend on the quality/quantity of your hardware and the quality of your client disk that is generating the data. I usually say that you should expect near-hardware-bottleneck speeds for an otherwise idle cluster.

There should be no "make it fast" required (though you should review the logs for errors if it's going slow). I would expect a 5GB file to take around 3-5 minutes to write on our cluster, but it's a well-tuned and operational cluster.

As Todd (I think) mentioned before, we can't help any when you say "I want to make it faster". You need to provide diagnostic information - logs, Ganglia plots, stack traces, something - that folks can look at.

Brian

On Jun 10, 2009, at 2:25 AM, Sugandha Naolekar wrote:

But if I want to make it fast, then what? I want to place the data in HDFS and
replicate it in a fraction of a second. Is that possible, and how?

On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena <kartik....@gmail.com> wrote:

I would suppose about 2-3 hours. It took me some 2 days to load a 160 Gb
file.
Secura

On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar <sugandha....@gmail.com> wrote:

Hello!

If I try to transfer a 5GB VDI file from a remote host (not a part of the
Hadoop cluster) into HDFS, and get it back, how much time is it supposed to
take?

No map-reduce involved. Simply writing files in and out of HDFS through
simple Java code (using the APIs).

--
Regards!
Sugandha






