Thanks Uma.

So when HDFS writes data, it distributes the blocks only according to
percentage usage (and not actual utilization)?

I think that running the balancer between every job is overkill. I would
prefer to reformat the existing nodes and give them 3TB.

Lior


On Wed, Nov 30, 2011 at 3:02 PM, Uma Maheswara Rao G
<mahesw...@huawei.com> wrote:

>  The default block placement policy checks the remaining space as
> follows.
>
> If the remaining space on that node is greater than
> blksize*MIN_BLKS_FOR_WRITE (default 5), then it will treat that node as a
> good target.
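>
> As a rough illustration only (this is not the actual HDFS source; the class
> and method names below are made up), the check amounts to something like:
>
>   // Illustrative sketch of the remaining-space check described above.
>   public class RemainingSpaceCheck {
>       // MIN_BLKS_FOR_WRITE defaults to 5: a target must still have room
>       // for at least this many full blocks.
>       static final int MIN_BLKS_FOR_WRITE = 5;
>
>       // A node passes the check if its remaining space exceeds
>       // blksize * MIN_BLKS_FOR_WRITE; percentage utilization is not used.
>       static boolean isGoodTarget(long remainingBytes, long blockSizeBytes) {
>           return remainingBytes > (long) MIN_BLKS_FOR_WRITE * blockSizeBytes;
>       }
>
>       public static void main(String[] args) {
>           long blockSize = 64L * 1024 * 1024;  // e.g. a 64MB block size
>           System.out.println(isGoodTarget(10L * blockSize, blockSize)); // true
>           System.out.println(isGoodTarget(3L * blockSize, blockSize));  // false
>       }
>   }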
>
>
>
> I think one option may be to run the balancer in between, after some jobs
> have completed, to move the blocks based on DN utilization... I am not sure
> whether this will work with your requirements.
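>
> If you do try that, a single balancing pass can be kicked off between jobs
> with the balancer CLI; the sketch below (just an illustration, assuming the
> hadoop command is on the PATH and a 10% utilization threshold) waits for
> the pass to finish before the next job is submitted:
>
>   // Runs "hadoop balancer -threshold 10" in the foreground and waits for it.
>   public class RunBalancerBetweenJobs {
>       public static void main(String[] args) throws Exception {
>           ProcessBuilder pb =
>               new ProcessBuilder("hadoop", "balancer", "-threshold", "10");
>           pb.inheritIO();                   // show the balancer's progress output
>           int exit = pb.start().waitFor();  // block until the pass completes
>           System.out.println("Balancer exited with code " + exit);
>       }
>   }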
>
>
>
> Regards,
>
> Uma
>  ------------------------------
> *From:* Lior Schachter [lior...@gmail.com]
> *Sent:* Wednesday, November 30, 2011 5:55 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Load balancing HDFS
>
>> Hi all,
>> We currently have a 10-node cluster with 6TB per machine.
>> We are buying a few more nodes and are considering having only 3TB per machine.
>>
>> By default, HDFS assigns blocks according to used capacity, percentage-wise.
>> This means that the old nodes will contain more data.
>> We would prefer that the nodes (6TB, 3TB) be balanced by actual used space,
>> so that M/R jobs work better.
>> We don't expect to exceed the 3TB limit (we will buy more machines).
>>
>> Thanks,
>>
>> Lior
>>
>>
>
