You could upload these logs from a machine that is not a DataNode, such as the NameNode or a node outside the HDFS cluster.
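For example, from such a machine you could do something like the sketch below. This assumes the Hadoop client there is configured to talk to the cluster's NameNode and that the aggregated logs match the wildcard pattern; both details are assumptions on my part, not from the original thread.

for f in /mnt/accesslog-agregated.*.log; do
  # no DataNode runs on this host, so the NameNode places the first
  # replica of each block on a randomly chosen DataNode rather than here
  ./hadoop fs -copyFromLocal "$f" /logs
done

Because no DataNode runs on the uploading host, the first replica of each block is placed on a randomly chosen DataNode, so the data spreads out as it is written instead of piling up on one node.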
Wade

-----Original Message-----
From: Igor Katkov [mailto:ikat...@gmail.com]
Sent: November 28, 2009 4:00
To: hdfs-user@hadoop.apache.org
Subject: Even HDFS data distribution

Hi,

What is the usual approach/technique to achieve even HDFS data distribution?

I have a bunch of files (logs) outside of HDFS. If I copy them all to a node within HDFS and then do something like

./hadoop fs -copyFromLocal /mnt/accesslog-agregated.2009-10-04.log /logs

it writes the first block replica locally and then to some other node. If I do that 100 times, most of the data will be sitting on the host I am doing these operations on. It would be nice to pick a host at random and store the very first block there.

The only workaround I can immediately see is to manually split these log files into as many sets as there are HDFS nodes, upload/scp them to the HDFS nodes, and then run ./hadoop fs -copyFromLocal on each. That is surely a lot of manual work, so I guess there must be a trick to make it happen with much less hassle. Ideas?

P.S. I googled it, but did not find any related discussions.
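For completeness, a rough sketch of the manual workaround described in the question above, assuming passwordless ssh/scp to the DataNodes, hadoop on the PATH of each node, and placeholder hostnames (datanode1, datanode2, datanode3) that are not from the original thread:

NODES=(datanode1 datanode2 datanode3)   # hypothetical DataNode hostnames
i=0
for f in /mnt/accesslog-agregated.*.log; do
  node=${NODES[i % ${#NODES[@]}]}
  # round-robin the files across DataNodes: each node writes its own
  # share, so the "local" first replica lands on a different host per share
  scp "$f" "$node:/tmp/"
  ssh "$node" "hadoop fs -copyFromLocal /tmp/$(basename "$f") /logs"
  i=$((i + 1))
done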