Hadoop1-11 were indeed added later as a batch, but the entire cluster was
restarted when these machines were added.  The imbalance occurred during a
copy from a machine outside the cluster.

Note that since then, I have scrubbed all content from the cluster and
restarted it.  In filling this incarnation of the cluster, the smaller
machines' disks were entirely filled up.  The 10GB limit that I had set for
free space was not respected.

I changed to a limit based on 90% fill, and that seems to work.  Scrubbing and
filling has resulted in the larger nodes getting additional data after the
smaller ones fill.  The disks do fill a good bit more than 90%, which
surprised me.
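
To make the two limits concrete, here is a rough sketch of the arithmetic in
Python.  The function names are made up for illustration, and the disk sizes
are the ones from the node table quoted below, assuming the hardware is
unchanged in the new incarnation of the cluster.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of Hadoop.  Disk sizes come from the node table in the quoted message.

def max_fill_percent(size_gb, reserved_gb):
    """Highest fill percentage that still leaves reserved_gb free."""
    return 100.0 * (size_gb - reserved_gb) / size_gb


def free_gb_at_fill(size_gb, fill_percent):
    """Free space remaining when the disk is fill_percent full."""
    return size_gb * (1.0 - fill_percent / 100.0)


SMALL_NODE_GB = 42.68    # hadoop1 .. hadoop11
LARGE_NODE_GB = 257.98   # most of the metricsapp nodes

if __name__ == "__main__":
    for size in (SMALL_NODE_GB, LARGE_NODE_GB):
        print(
            f"{size:7.2f} GB disk: a 10 GB reserve allows "
            f"{max_fill_percent(size, 10.0):.1f}% fill; "
            f"a 90% cap leaves {free_gb_at_fill(size, 90.0):.2f} GB free"
        )
```

On the small nodes the 10GB reserve should have been the stricter of the two
limits (about 77% fill), which is why the completely full disks were
unexpected.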

The original imbalance has not reappeared and I have been unable to
determine what might have caused it.
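
For reference, the rough check from my original message (quoted below) can be
sketched in Python.  It assumes that uniformly random placement gives
approximately Poisson block counts, so the standard deviation is about the
square root of the mean, and it takes 600 blocks as the expected count for the
small nodes.  The variable and function names are mine, for illustration only.

```python
import math

# Block counts for the small nodes, copied from the table in the quoted message.
blocks = {
    "hadoop1": 585, "hadoop10": 354, "hadoop11": 340,
    "hadoop2": 597, "hadoop3": 614, "hadoop6": 578,
    "hadoop7": 600, "hadoop8": 593, "hadoop9": 637,
}

EXPECTED = 600                        # rough expected count per small node
TOLERANCE = 2 * math.sqrt(EXPECTED)   # two standard deviations, about 49


def outliers(counts, expected, tolerance):
    """Return the nodes whose count differs from expected by more than tolerance."""
    return sorted(
        name for name, n in counts.items() if abs(n - expected) > tolerance
    )


if __name__ == "__main__":
    print(f"tolerance: {TOLERANCE:.1f} blocks")
    print("outside tolerance:", outliers(blocks, EXPECTED, TOLERANCE))
```

Only hadoop10 and hadoop11 fall outside the tolerance, each by roughly 250
blocks.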


On 9/24/07 5:41 PM, "Hairong Kuang" <[EMAIL PROTECTED]> wrote:

> Hi Ted,
> 
> This is interesting. I assume that hadoop1-hadoop11 are newly added nodes.
> Could you please provide me more information about your hdfs cluster? What's
> the topology of the cluster, i.e. how many racks it has and which machines
> belong to which rack? Were they all added to the cluster at the same time,
> or were hadoop10 and hadoop11 added later?
> 
> Hairong
> 
> -----Original Message-----
>> From: Ted Dunning <[EMAIL PROTECTED]>
>> Reply-To: <[email protected]>
>> Date: Wed, 19 Sep 2007 19:46:14 -0700
>> To: <[email protected]>
>> Conversation: Statistically bad distribution of blocks
>> Subject: Statistically bad distribution of blocks
>> 
>> 
>> I just added 10 datanodes to a small cluster and turned up the
>> replication on many of the files to balance the storage out a bit.
>> 
>> I expected to see a uniform-ish distribution of blocks on the new nodes.
>> This is what I got instead:
>> 
>> Node         Last Contact  State       Size (GB)  Used (%)  Blocks
>> hadoop1                 0  In Service      42.68     72.36     585
>> hadoop10                1  In Service      42.68     50.30     354
>> hadoop11                2  In Service      42.68     48.02     340
>> hadoop2                 2  In Service      42.68     73.01     597
>> hadoop3                 2  In Service      42.68     72.68     614
>> hadoop6                 0  In Service      42.68     72.87     578
>> hadoop7                 0  In Service      42.68     72.38     600
>> hadoop8                 2  In Service      42.68     72.30     593
>> hadoop9                 2  In Service      42.68     72.70     637
>> metricsapp1             0  In Service     257.98     90.52    4134
>> metricsapp2             0  In Service     257.98     40.23    2338
>> metricsapp3             2  In Service     247.20     39.41    2889
>> metricsapp4             2  In Service     257.98     98.44    5096
>> 
>> The right-most column is what we are interested in here.  Note how
>> hadoop10 and hadoop11 have significantly fewer blocks than the others.
>> Statistically, we should expect the counts to vary by less than
>> about
>> 
>>   2 * sqrt(600) ~= 49
>> 
>> Indeed, most of them do.  But those two do not.
>> 
>> Is there some hidden significance in the names of nodes?
>> 
>> 
> 
> 
