Thanks Dhruba, that makes sense. The data was already on the master node, and I did not consider that I could upload from other nodes too. The distribution across the slave nodes is uniform, and your response explains why the one other, bigger box did not get a larger share of the blocks. Noting your use of the word "attempts": can I conclude that at some point it might become impossible to store a local file's blocks on the same node, and that from then on the blocks would all be placed elsewhere?
Jeff

-----Original Message-----
From: dhruba Borthakur [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 20, 2007 9:38 AM
To: hadoop-user@lucene.apache.org
Subject: RE: DFS Block Allocation

Hi Jeff,

Did you run the file-upload command on the master node itself? The DFS client attempts to store one replica of the data on the node on which the DFSClient is running. To get a uniform distribution, it would be good if you upload your data from multiple nodes in your cluster.

Thanks,
dhruba

-----Original Message-----
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 20, 2007 7:15 AM
To: hadoop-user@lucene.apache.org
Subject: DFS Block Allocation

I've brought up a small cluster and uploaded some large files. The master node is cu027, and it seems to be getting an unfair percentage of the blocks allocated to it, especially compared to cu171, which has the same size disk. Can somebody shed some light on the reasons for this?

Jeff
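For the archives, here is a minimal sketch of the kind of per-node upload Dhruba describes, written against Hadoop's FileSystem Java API; the class name and file paths are illustrative assumptions, not code from this thread. The point is simply that whichever datanode the client runs on receives the first replica of every block it writes, so running one such copy on each node that holds a slice of the input spreads the blocks across the cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Run one instance on each node holding a slice of the input.
    // The local node gets the first replica of every block written
    // here; the remaining replicas go to other datanodes.
    public class UploadSlice {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up the cluster config
        FileSystem dfs = FileSystem.get(conf);    // connect to the DFS
        // Copy this node's local slice into the DFS
        // (args[0] = local path, args[1] = DFS path; both hypothetical).
        dfs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
        dfs.close();
      }
    }

Invoked on each node with that node's slice, e.g. (paths hypothetical):

    bin/hadoop jar upload.jar UploadSlice /local/slice-01 /user/jeff/data/slice-01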