Thanks Uma. So when HDFS writes data, it distributes the blocks only according to percentage usage (and not actual utilization)?
I think that running the balancer between every job is overkill. I prefer to format the existing nodes and give them 3TB.

Lior

On Wed, Nov 30, 2011 at 3:02 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

> The default block placement policy checks the remaining space as follows:
>
> If the remaining space on a node is greater than
> blksize * MIN_BLKS_FOR_WRITE (default 5), it treats that node as a good
> target.
>
> One option may be to run the balancer to move blocks based on DN
> utilization in between, after some jobs have completed... I am not sure
> this will work with your requirements.
>
> Regards,
> Uma
> ------------------------------
> *From:* Lior Schachter [lior...@gmail.com]
> *Sent:* Wednesday, November 30, 2011 5:55 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Load balancing HDFS
>
>> Hi all,
>> We currently have a 10-node cluster with 6TB per machine.
>> We are buying a few more nodes and are considering having only 3TB per
>> machine.
>>
>> By default HDFS assigns blocks according to used capacity, percentage
>> wise. This means that the old nodes will contain more data.
>> We would prefer that the nodes (6TB, 3TB) be balanced by actual used
>> space so M/R jobs work better.
>> We don't expect to exceed the 3TB limit (we would buy more machines).
>>
>> Thanks,
>> Lior
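
As a side note, here is a minimal Java sketch of the space check Uma describes above. The constant MIN_BLKS_FOR_WRITE and its default of 5 come from the thread; the class and method names are illustrative, not the actual BlockPlacementPolicyDefault source:

    // Illustrative sketch only -- not the actual Hadoop placement code.
    public class PlacementSpaceCheckSketch {

        // Default mentioned in the thread: a node must have room for at
        // least 5 more blocks of this size to be considered a good target.
        static final int MIN_BLKS_FOR_WRITE = 5;

        static boolean isGoodTarget(long remainingBytes, long blockSizeBytes) {
            // The check is on absolute remaining bytes, not percentage used.
            return remainingBytes > blockSizeBytes * MIN_BLKS_FOR_WRITE;
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024; // 64MB default block size
            // A node with ~200MB free fails the check (needs > 320MB free).
            System.out.println(isGoodTarget(200L * 1024 * 1024, blockSize)); // false
            // A node with 1GB free passes, however full it is percentage-wise.
            System.out.println(isGoodTarget(1024L * 1024 * 1024, blockSize)); // true
        }
    }

If I read Uma's description right, both the 6TB and 3TB nodes will keep accepting blocks until they are nearly full, since the check is on absolute free space; the percentage-based view only comes in when you run the balancer (e.g. hadoop balancer -threshold 10, where the threshold is a utilization percentage).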