Hadoop1-11 were indeed added later as a batch, but the entire cluster was restarted when these machines were added. The imbalance occurred during a copy from a machine outside the cluster.
Note that since then, I have scrubbed all content from the cluster and restarted it. While filling this incarnation of the cluster, the smaller machines' disks filled up entirely: the 10GB free-space limit I had set was not respected. I switched to a limit based on 90% fill, and that seems to work. After the scrub and refill, the larger nodes get additional data once the smaller ones fill. The disks do still fill a good bit more than 90%, which surprised me. The original imbalance has not reappeared, and I have been unable to determine what might have caused it.

On 9/24/07 5:41 PM, "Hairong Kuang" <[EMAIL PROTECTED]> wrote:

> Hi Ted,
>
> This is interesting. I assume that hadoop1-hadoop11 are newly added nodes.
> Could you please provide me more information about your hdfs cluster? What's
> the topology of the cluster, i.e. how many racks it has and which machines
> belong to which rack? Were they added to the cluster at the same time or
> hadoop 10&11 were added later?
>
> Hairong
>
> -----Original Message-----
>> From: Ted Dunning <[EMAIL PROTECTED]>
>> Reply-To: <[email protected]>
>> Date: Wed, 19 Sep 2007 19:46:14 -0700
>> To: <[email protected]>
>> Conversation: Statistically bad distribution of blocks
>> Subject: Statistically bad distribution of blocks
>>
>>
>> I just added 10 datanodes to a small cluster and turned up the
>> replication on many of the files to balance the storage out a bit.
>>
>> I expected to see a uniform-ish distribution of blocks on the new nodes.
>> This is what I got instead:
>>
>> Node         Last Contact  State       Size (GB)  Used (%)  Blocks
>> hadoop1      0             In Service  42.68      72.36     585
>> hadoop10     1             In Service  42.68      50.30     354
>> hadoop11     2             In Service  42.68      48.02     340
>> hadoop2      2             In Service  42.68      73.01     597
>> hadoop3      2             In Service  42.68      72.68     614
>> hadoop6      0             In Service  42.68      72.87     578
>> hadoop7      0             In Service  42.68      72.38     600
>> hadoop8      2             In Service  42.68      72.30     593
>> hadoop9      2             In Service  42.68      72.70     637
>> metricsapp1  0             In Service  257.98     90.52     4134
>> metricsapp2  0             In Service  257.98     40.23     2338
>> metricsapp3  2             In Service  247.20     39.41     2889
>> metricsapp4  2             In Service  257.98     98.44     5096
>>
>> The right-most column is what we are interested in here. Note how
>> hadoop10 and hadoop11 have significantly fewer blocks than the others.
>> Statistically we should expect the counts to vary by less than
>> about
>>
>>     2 * sqrt(600) ≈ 49
>>
>> Indeed, most of them do. But those two do not.
>>
>> Is there some hidden significance in the names of nodes?
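One way to read the rule of thumb in the quoted message: if each small node's block count behaves roughly like a Poisson count (an assumption, not something stated in the thread), its standard deviation is about the square root of the expected count, so 2 * sqrt(600) is a two-sigma band. A minimal Python sketch of that check, assuming an expected count of 600 per small node and using the block counts from the table; the function name `outliers` is hypothetical, and the metricsapp nodes are left out because their disks are roughly six times larger.

```python
import math

# Block counts for the nine small (42.68 GB) datanodes, copied from the table.
block_counts = {
    "hadoop1": 585,
    "hadoop10": 354,
    "hadoop11": 340,
    "hadoop2": 597,
    "hadoop3": 614,
    "hadoop6": 578,
    "hadoop7": 600,
    "hadoop8": 593,
    "hadoop9": 637,
}


def outliers(counts, expected, k=2.0):
    """Return the nodes whose count differs from `expected` by more than
    k * sqrt(expected).

    Assumes Poisson-like counts, so the standard deviation is about
    sqrt(expected) and k=2 gives a two-sigma band.
    """
    bound = k * math.sqrt(expected)
    return sorted(name for name, c in counts.items() if abs(c - expected) > bound)


if __name__ == "__main__":
    expected = 600  # per-node figure used in the message
    print(f"bound = {2 * math.sqrt(expected):.1f}")
    print(outliers(block_counts, expected))
```

Run as a script, it prints a bound of 49.0 and flags only hadoop10 and hadoop11; the other seven small nodes deviate from 600 by at most 37 blocks.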
