Hi Eric-

On Tue, Mar 29, 2011 at 03:20:38PM +0200, Eric wrote:
> I'm interested in hearing how you get data into and out of HDFS. Are you
> using tools like Flume? Are you using fuse_dfs? Are you putting files on
> HDFS with "hadoop dfs -put ..."?
> And how does your method scale? Can you move terabytes of data per day? Or
> are we talking gigabytes?
I'm currently migrating our ~600TB datastore to HDFS. To transfer the data, we
iterate through the raw files stored on our legacy data servers and write them
to HDFS using `hadoop fs -put`. So far, I've limited the number of servers
participating in the migration, so we've only had on the order of 20 parallel
writers. This week, I plan to increase that by at least an order of magnitude.
I expect to be able to scale the migration horizontally without impacting our
current production system.

Then, when the transfers are complete, we can cut our protocol endpoints over
without significant downtime. At least, that's the plan. ;)

-- 
Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/
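
P.S. In case the mechanics are of interest, the per-server transfer is
conceptually no more than the sketch below. The paths, destination layout,
and mkdir step are placeholders to illustrate the idea rather than our actual
migration script; the real scaling comes from running the same loop on more
legacy servers in parallel.

    # Sketch of what one legacy data server runs during the migration.
    # /data/raw and /store/raw are placeholder paths, not our real layout.

    # Create the destination directory on HDFS first.
    hadoop fs -mkdir /store/raw

    # hadoop fs -put copies a local directory tree recursively, so each
    # server simply walks its top-level raw directories and pushes them.
    # Adding more servers, each running this same loop, scales the
    # migration horizontally.
    for d in /data/raw/*; do
        hadoop fs -put "$d" /store/raw/
    done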