Thanks, Brian, for the good advice.
Slightly off topic from the original post: there will be occasions where it
is necessary or better to copy different portions of a file in parallel
(distcp can benefit a lot). There is a proposal to let HDFS 'stitch'
multiple files into one, something like
NameNode.stitchFiles(Path to, Path[] files)
This way a very large file can be copied more efficiently (with a
map/reduce job, for example). Another use case is high-latency,
high-bandwidth connections (like coast-to-coast). High latency can be
somewhat worked around by using large buffers for TCP connections, but
usually users don't have that control. It is just simpler to use
multiple connections.
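To put a rough number on the buffer point: a single TCP connection needs about one bandwidth-delay product (bandwidth times round-trip time) of data in flight to keep the link full. The sketch below only does that arithmetic. The class and method names are made up for illustration, and the 1 Gbit/s and 70 ms figures used with it are assumptions, not measurements from this thread.

```java
class TcpBuffer {
    /** Bandwidth-delay product in bytes: data in flight needed to keep the link full. */
    static long bdpBytes(double bitsPerSecond, double rttSeconds) {
        return Math.round(bitsPerSecond * rttSeconds / 8.0);
    }

    /** Buffer each connection needs when the link is shared by several connections. */
    static long perConnectionBytes(long bdp, int connections) {
        return (bdp + connections - 1) / connections; // ceiling division
    }

    public static void main(String[] args) {
        // Assumed example link: 1 Gbit/s with a 70 ms round-trip time.
        long bdp = bdpBytes(1e9, 0.070);
        System.out.println("single connection: " + bdp + " bytes");
        System.out.println("each of 8 connections: " + perConnectionBytes(bdp, 8) + " bytes");
    }
}
```

With those assumed numbers one connection would want a buffer of several megabytes, while eight parallel connections each need about an eighth of that, which is closer to what default settings tend to allow. In Java, Socket.setSendBufferSize and Socket.setReceiveBufferSize are only hints to the operating system, which fits the remark that users usually don't have that control.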
This will obviously be an HDFS-only interface (i.e. not a FileSystem
method), at least initially.
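The copy-in-parallel-then-stitch idea can be tried out on a local filesystem. The following is a minimal sketch using plain java.nio, not the HDFS API: stitchFiles is only a proposal, so nothing here calls it. The names ChunkedCopy, splitRanges, copyPart, stitch and parallelCopy, and the ".partN" file naming, are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedCopy {
    /** Splits [0, size) into at most `parts` contiguous {offset, length} ranges. */
    static List<long[]> splitRanges(long size, int parts) {
        if (parts <= 0) throw new IllegalArgumentException("parts must be positive");
        List<long[]> ranges = new ArrayList<>();
        long chunk = (size + parts - 1) / parts; // ceiling division
        for (long off = 0; off < size; off += chunk) {
            ranges.add(new long[] {off, Math.min(chunk, size - off)});
        }
        return ranges;
    }

    /** Copies `length` bytes of `src`, starting at `offset`, into its own part file. */
    static void copyPart(Path src, Path part, long offset, long length) {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(part, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long done = 0;
            while (done < length) {
                long n = in.transferTo(offset + done, length - done, out);
                if (n <= 0) break; // source ended earlier than expected
                done += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Local stand-in for the proposed stitch step: concatenates the parts in order. */
    static void stitch(Path to, List<Path> parts) {
        try (FileChannel out = FileChannel.open(to, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            for (Path p : parts) {
                try (FileChannel in = FileChannel.open(p, StandardOpenOption.READ)) {
                    long remaining = in.size();
                    while (remaining > 0) {
                        long n = out.transferFrom(in, pos, remaining);
                        if (n <= 0) break;
                        pos += n;
                        remaining -= n;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies `src` to `dst` by copying ranges concurrently, then stitching them. */
    static void parallelCopy(Path src, Path dst, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Path> partFiles = new ArrayList<>();
            List<Future<?>> pending = new ArrayList<>();
            List<long[]> ranges = splitRanges(Files.size(src), parts);
            for (int i = 0; i < ranges.size(); i++) {
                long[] r = ranges.get(i);
                Path part = dst.resolveSibling(dst.getFileName() + ".part" + i);
                partFiles.add(part);
                pending.add(pool.submit(() -> copyPart(src, part, r[0], r[1])));
            }
            for (Future<?> f : pending) f.get(); // wait for all parts, surface failures
            stitch(dst, partFiles);
            for (Path p : partFiles) Files.deleteIfExists(p);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ChunkedCopy.parallelCopy(src, dst, 4) copies four ranges concurrently. Note that the stitch step here copies the bytes a second time, which is exactly the cost a filesystem-level stitch operation would be meant to avoid.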
Raghu.
Brian Bockelman wrote:
Hey Sugandha,
Transfer rates depend on the quality/quantity of your hardware and the
quality of your client disk that is generating the data. I usually say
that you should expect near-hardware-bottleneck speeds for an otherwise
idle cluster.
There should be no "make it fast" required (though you should review
the logs for errors if it's going slow). I would expect a 5GB file to
take around 3-5 minutes to write on our cluster, but it's a well-tuned
and operational cluster.
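For scale, the 3-5 minute figure for a 5GB file can be turned into an average throughput. This is plain arithmetic, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes). The class and method names are made up for illustration.

```java
class TransferRate {
    /** Average throughput in MB/s for `bytes` moved in `seconds` (1 MB = 10^6 bytes). */
    static double megabytesPerSecond(long bytes, double seconds) {
        return bytes / 1e6 / seconds;
    }

    public static void main(String[] args) {
        long fiveGB = 5_000_000_000L; // assumes decimal gigabytes
        System.out.printf("5 GB in 3 min: %.1f MB/s%n", megabytesPerSecond(fiveGB, 180));
        System.out.printf("5 GB in 5 min: %.1f MB/s%n", megabytesPerSecond(fiveGB, 300));
    }
}
```

That works out to roughly 17 to 28 MB/s end to end, which is a useful yardstick when comparing against what the client disk and network can sustain.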
As Todd (I think) mentioned before, we can't help any when you say "I
want to make it faster". You need to provide diagnostic information -
logs, Ganglia plots, stack traces, something - that folks can look at.
Brian
On Jun 10, 2009, at 2:25 AM, Sugandha Naolekar wrote:
But if I want to make it fast, then? I want to place the data in HDFS
and replicate it in a fraction of a second. Is that possible, and how?
On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena <kartik....@gmail.com>
wrote:
I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB
file.
Secura
On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
<sugandha....@gmail.com> wrote:
Hello!
If I try to transfer a 5GB VDI file from a remote host (not a part of the
Hadoop cluster) into HDFS, and get it back, how much time is it supposed to
take?
No map-reduce involved. Simply writing files in and out of HDFS through
simple Java code (using the APIs).
--
Regards!
Sugandha