Looking at DefaultLoadBalancer.balance(), balancing is based purely on the number of regions hosted per region server, not on resource usage. HBASE-57 suggests taking data locality into consideration when regions are assigned to region servers. It would be nice to take both the resource usage of each region and data locality into account, rather than only the number of regions per region server as currently implemented.
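To make the idea concrete, here is a minimal sketch of scoring candidate region servers by both load and locality when placing a single region. It is not the DefaultLoadBalancer code and uses no HBase API: the ServerStats class, the score and pickServer methods, and the weights are all hypothetical, and it assumes that a normalized request load per server and the fraction of the region's blocks local to each server are available from somewhere.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in HBase.
class LocalityAwareBalancerSketch {

    /** Assumed per-server snapshot for one region that needs a home. */
    static final class ServerStats {
        final double requestLoad;   // normalized load on the server, 0.0 (idle) to 1.0 (saturated)
        final double localFraction; // fraction of the region's HFile blocks on this server's DataNode

        ServerStats(double requestLoad, double localFraction) {
            this.requestLoad = requestLoad;
            this.localFraction = localFraction;
        }
    }

    /** Higher is better: reward locality, penalize load. The weights are tuning assumptions. */
    static double score(ServerStats s, double loadWeight, double localityWeight) {
        return localityWeight * s.localFraction - loadWeight * s.requestLoad;
    }

    /** Returns the name of the best-scoring server, or null if there are no candidates. */
    static String pickServer(Map<String, ServerStats> candidates,
                             double loadWeight, double localityWeight) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, ServerStats> e : candidates.entrySet()) {
            double sc = score(e.getValue(), loadWeight, localityWeight);
            if (sc > bestScore) {
                bestScore = sc;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, ServerStats> candidates = new LinkedHashMap<>();
        candidates.put("rs1", new ServerStats(0.9, 0.8)); // best locality, but heavily loaded
        candidates.put("rs2", new ServerStats(0.2, 0.6)); // good locality, lightly loaded
        candidates.put("rs3", new ServerStats(0.1, 0.0)); // idle, but no local blocks
        System.out.println(pickServer(candidates, 0.5, 0.5));
    }
}
```

With equal weights, a lightly loaded server with decent locality wins over both the busiest server and the idle one with no local blocks. A real balancer would also have to account for the cost of moving regions, which this sketch ignores.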
The file-to-block mapping can be obtained from the HDFS NameNode, but how do I find out which regions are heavily loaded (in terms of number of requests, CPU, and memory) and which are not? I could not see any resource utilization on the region server pages.

Also, I am curious whether HBASE-57 makes sense, since major compaction runs every 24 hours and the HFiles are all local to the regions after a major compaction. I think the HDFS balancer has to be run manually, so there is a window of at most 24 hours between an HDFS balancer run and the next major compaction during which data locality might be lost.

I am interested in working on this JIRA, but need some help from the HBase community.

Regards,
Praveen

On Tue, Feb 14, 2012 at 7:34 PM, Mikael Sitruk wrote:

> Region allocation is kept on the next restart
> (https://issues.apache.org/jira/browse/HBASE-2896). This is also present in
> the CDH3 code.
> Nevertheless, if you have a server that did not start correctly, regions
> will move away from it and locality will not remain (even after you start
> the problematic node, since it will get random regions).
> The best solution would effectively be
> https://issues.apache.org/jira/browse/HBASE-57
>
> Mikael.S
>
> On Tue, Feb 14, 2012 at 3:19 PM, Brock Noland wrote:
>
> > Hi,
> >
> > On Tue, Feb 14, 2012 at 7:13 AM, Praveen Sripati wrote:
> > > Lars' blog (1) mentions that data locality for the region servers is lost
> > > when the HBase cluster is restarted. It's also mentioned at the end that
> > > work is going on in HBase to assign regions to RS taking data locality
> > > into consideration. The blog entry is 18 months old, so I would like to
> > > know if this has been incorporated into the latest HBase release or if
> > > data locality is lost until a compaction is complete.
> >
> > JIRA is down for me, but here is the JIRA:
> >
> > https://issues.apache.org/jira/browse/HBASE-2896
> >
> > I am pretty sure it's been included in the latest HBase release as it's in
> > CDH3.
> >
> > Brock
> >
> > --
> > Apache MRUnit - Unit testing MapReduce -
> > http://incubator.apache.org/mrunit/
>
> --
> Mikael.S
