Yes. I try to always upload data from a machine that is not part of the cluster for exactly that reason.
I still find that I need to rebalance because of a strange problem in placement. My datanodes have HDFS disks that differ in size by as much as 10x, and I suspect that the upload is picking datanodes uniformly rather than according to available space.

Oddly enough, my rebalancing code works well. All it does is iterate through all files of interest, increasing the replication count for 30 seconds and then decreasing it again (obviously this has to be threaded in order to handle more than 2 files per minute). The replication code seems to select a home for new blocks more correctly than the original placement does.

On 12/20/07 10:16 AM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:

> Noting your use of the word "attempts", can I conclude that at some point it
> might be impossible to upload blocks from a local file to the DFS on the same
> node and at that point the blocks would all be loaded elsewhere?
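
For anyone who wants to try the replication trick described above, here is a rough sketch of the idea in Python. It is not the actual rebalancing code, only an illustration under stated assumptions: the function names (set_replication, bounce_replication, rebalance) are made up for this example, the base replication factor is assumed to be 3, and the `hadoop` command is assumed to be on the PATH with `fs -setrep` available in your version.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def set_replication(path, factor):
    # Assumption: the `hadoop` CLI is on PATH and supports `fs -setrep`.
    subprocess.run(["hadoop", "fs", "-setrep", str(factor), path], check=True)


def bounce_replication(path, base=3, boost=1, hold_seconds=30,
                       setrep=set_replication, sleep=time.sleep):
    """Raise one file's replication, wait, then restore the base value."""
    setrep(path, base + boost)       # namenode schedules extra replicas
    try:
        sleep(hold_seconds)          # give the new replicas time to land
    finally:
        setrep(path, base)           # namenode then drops the surplus replicas
    return path


def rebalance(paths, workers=8, **kwargs):
    # Each file is held for hold_seconds, so a single thread manages only
    # about 2 files per minute at 30 seconds; a thread pool handles many at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: bounce_replication(p, **kwargs), paths))
```

Something like rebalance(["/data/part-00000", "/data/part-00001"], workers=16) would process both (hypothetical) paths concurrently. Which replicas get removed when the count drops back is up to the namenode, so treat the resulting balance as something to check rather than a guarantee.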