[jira] Commented: (HBASE-57) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness

Samuel Guo (JIRA) Thu, 26 Mar 2009 06:50:27 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689484#action_12689484
 ]


Samuel Guo commented on HBASE-57:
---------------------------------

Hi hbasers,
I'd like to work on this issue as my GSOC project "Exploit locality when 
assigning regions in HBase".

After talking with Stack by email, I have some initial thoughts on this 
issue. I'd like to share them with you, and I welcome your comments.

Before designing a suitable mechanism for exploiting region locality, we need 
to know how blocks are allocated in an HBase cluster and how the data blocks 
of a given region are distributed over its lifetime, so that we can find out 
how region locality affects performance. It is difficult to capture all of 
this information in a real cluster. An alternative way to study the locality 
phenomenon may be to simulate, on a single machine, the data-block placement 
procedure in HDFS (local node, local rack, and remote rack) and the 
region-allocation mechanism of an HBase cluster. An approximate but detailed 
report from the simulation can then be used for analysis and development.
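
A minimal sketch of such a simulation, written in Python only for brevity. It 
assumes the placement order named above (writer's own node, another node in 
the local rack, then a node in a remote rack); the real HDFS policy may differ 
between versions. The rack layout, node names, and function names are all 
hypothetical.

```python
import random
from collections import Counter


def place_block(writer, racks, rng):
    """Pick replica nodes for one block: local node, local rack, remote rack.

    `racks` maps a rack name to the list of node names in that rack.
    """
    local_rack = next(r for r, nodes in racks.items() if writer in nodes)
    replicas = [writer]
    same_rack = [n for n in racks[local_rack] if n != writer]
    if same_rack:
        replicas.append(rng.choice(same_rack))
    remote = [n for r, nodes in racks.items() if r != local_rack for n in nodes]
    if remote:
        replicas.append(rng.choice(remote))
    return replicas


def simulate_region(writers, racks, seed=0):
    """Count replicas per node for one region.

    Each entry in `writers` is the node that wrote one block, so a region
    that moved between regionservers has several distinct writers.
    """
    rng = random.Random(seed)
    counts = Counter()
    for writer in writers:
        counts.update(place_block(writer, racks, rng))
    return counts


racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
# A region that wrote 6 blocks while on n1, then moved and wrote 2 on n3.
counts = simulate_region(["n1"] * 6 + ["n3"] * 2, racks)
for node, blocks in counts.most_common():
    print(node, blocks)
```

Running this with different writer histories shows how a region's blocks 
spread out once the region has been served from more than one node.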

Although I don't yet have any detailed information about the locality 
phenomenon, I will try to give an initial proposal first. The initial proposal 
is to assign each region to the datanode (regionserver) that holds the most 
data blocks of that region. The most challenging part is making the 
data-block layout of a region known to the master (we can query the namenode 
in HDFS to get this information). An initial method is to record each 
region's layout information in the .META. table.
Some background threads may run on the master, scanning the .META. table to 
pick candidate nodes for region allocation (these nodes may be sorted by 
the number of blocks they contain). The detailed allocation mechanism is 
discussed below.
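
As a rough illustration of the candidate ranking, the sketch below sorts live 
regionservers by how many of a region's blocks they hold. The shape of the 
layout record (one list of replica hosts per block) is an assumption for 
illustration only, not the actual .META. schema, and rank_candidates is a 
hypothetical name.

```python
from collections import Counter


def rank_candidates(block_locations, live_servers):
    """Return (server, block_count) pairs for live servers, best first.

    `block_locations` has one entry per block; each entry lists the hosts
    that hold a replica of that block.
    """
    counts = Counter(host for hosts in block_locations for host in hosts)
    # Sort by block count descending, then by name so ties are deterministic.
    return sorted(
        ((host, counts[host]) for host in live_servers if counts[host] > 0),
        key=lambda item: (-item[1], item[0]),
    )


layout = [["n1", "n2", "n3"], ["n1", "n2", "n4"], ["n1", "n3", "n4"]]
ranking = rank_candidates(layout, live_servers={"n1", "n2", "n4"})
print(ranking)
```

Here n3 holds two blocks but is left out because it is not in the set of live 
servers.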
(1) A blank region is created when the table is first created. As it has no 
data yet, we can simply allocate it according to the current load of the 
cluster. After the region grows and is flushed to HDFS, we get the blocks' 
location information and record it in the .META. table for the next 
allocation.
(2) A region is created by splitting its parent region. We can use the 
parent region's block location information to make the allocation decision. 
After the split finishes, we can simply copy the parent region's block 
location information into each sub-region's entry in the .META. table.
(3) A region is re-allocated after its regionserver crashes. The locations of 
the logfiles' blocks will also be taken into account during allocation, so 
that we may accelerate recovery of the failed region.
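
The three cases above could be folded into one decision function, sketched 
below under stated assumptions: region blocks and logfile blocks are weighted 
equally, ties are broken by current load, and the dictionaries stand in for 
whatever the master would actually read from .META. and the namenode. 
choose_server and its parameters are hypothetical names.

```python
from collections import Counter


def choose_server(loads, region_blocks=None, log_blocks=None):
    """Pick a regionserver for a region.

    loads:         live server -> number of regions it currently hosts.
    region_blocks: server -> count of the region's (or parent's) blocks.
    log_blocks:    server -> count of logfile blocks (crash recovery only).
    """
    score = Counter()
    score.update(region_blocks or {})
    score.update(log_blocks or {})  # assumption: same weight as data blocks
    candidates = {s: score[s] for s in loads if score[s] > 0}
    if not candidates:
        # Case (1): no block information yet, fall back to least loaded.
        return min(loads, key=lambda s: (loads[s], s))
    # Cases (2) and (3): most local blocks first, then lowest load.
    return min(candidates, key=lambda s: (-candidates[s], loads[s], s))


loads = {"rs1": 5, "rs2": 2, "rs3": 4}
new_region = choose_server(loads)
split_child = choose_server(loads, region_blocks={"rs1": 8, "rs3": 3})

# rs1 has crashed, so it no longer appears among the live servers.
live = {"rs2": 2, "rs3": 4}
recovered = choose_server(
    live,
    region_blocks={"rs1": 8, "rs2": 4, "rs3": 4},
    log_blocks={"rs3": 2},
)
print(new_region, split_child, recovered)
```

In the crash case the logfile blocks tip the choice toward rs3, even though 
rs2 and rs3 hold the same number of region blocks.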


> [hbase] Master should allocate regions to regionservers based upon data 
> locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-57
>                 URL: https://issues.apache.org/jira/browse/HBASE-57
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.2.0
>            Reporter: stack
>             Fix For: 0.20.0
>
>
> Currently, regions are assigned to regionservers based on a basic loading 
> attribute.  A factor to include in the assignment calculation is the location 
> of the region in hdfs; i.e. servers hosting region replicas.  If the cluster 
> is such that regionservers are being run on the same nodes as those running 
> hdfs, then ideally the regionserver for a particular region should be running 
> on a server that hosts a region replica.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
