You could upload these logs from a machine that is not a DataNode, such as the NameNode or a node outside the HDFS cluster.
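For example, from such a machine you could do something like the sketch below. This assumes the Hadoop client there is configured to talk to the cluster's NameNode and that the aggregated logs match the wildcard pattern; both details are assumptions on my part, not from the original thread.

for f in /mnt/accesslog-agregated.*.log; do
  # no DataNode runs on this host, so the NameNode places the first
  # replica of each block on a randomly chosen DataNode rather than here
  ./hadoop fs -copyFromLocal "$f" /logs
done

Because no DataNode runs on the uploading host, the first replica of each block is placed on a randomly chosen DataNode, so the data spreads out as it is written instead of piling up on one node.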
Wade

-----Original Message-----
From: Igor Katkov [mailto:ikat...@gmail.com]
Sent: November 28, 2009 4:00
To: hdfs-user@hadoop.apache.org
Subject: Even HDFS data distribution

Hi,

What is the usual approach/technique to achieve even HDFS data distribution?

I have a bunch of files (logs) outside of HDFS. If I copy them all to a node within HDFS and then do something like

./hadoop fs -copyFromLocal /mnt/accesslog-agregated.2009-10-04.log /logs

it writes the first block replica locally and then to some other node. If I do that 100 times, most of the data will be sitting on the host I am doing these operations on. It would be nice to pick a host at random and store the very first block there.

The only workaround I can immediately see is to manually split these log files into as many sets as there are HDFS nodes, upload/scp them to the HDFS nodes, and then run ./hadoop fs -copyFromLocal on each. That is surely a lot of manual work, so I guess there must be a trick to make it happen with much less hassle. Ideas?

P.S. I googled it, but did not find any related discussions.
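For completeness, a rough sketch of the manual workaround described in the question above, assuming passwordless ssh/scp to the DataNodes, hadoop on the PATH of each node, and placeholder hostnames (datanode1, datanode2, datanode3) that are not from the original thread:

NODES=(datanode1 datanode2 datanode3)   # hypothetical DataNode hostnames
i=0
for f in /mnt/accesslog-agregated.*.log; do
  node=${NODES[i % ${#NODES[@]}]}
  # round-robin the files across DataNodes: each node writes its own
  # share, so the "local" first replica lands on a different host per share
  scp "$f" "$node:/tmp/"
  ssh "$node" "hadoop fs -copyFromLocal /tmp/$(basename "$f") /logs"
  i=$((i + 1))
done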