[
https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689484#action_12689484
]
Samuel Guo commented on HBASE-57:
---------------------------------
Hi hbasers,
I'd like to work on this issue as my GSoC project, "Exploit locality when
assigning regions in HBase".
After talking with Stack by email, I have some initial thoughts on this
issue. I'd like to share them with you, and I welcome your comments.
Before designing a mechanism that exploits region locality, we need to know
how blocks are allocated in an HBase cluster and how the data blocks of a
given region are distributed over its lifetime, so that we can find out how
region locality affects performance. It is difficult to capture all this
information in a real cluster. An alternative way to study the locality
phenomenon is to simulate, on a single machine, the data-block placement
procedure in HDFS (local node, local rack, and remote rack) together with the
region-allocation mechanism of an HBase cluster. An approximate but detailed
report from the simulation can then be used for analysis and development.
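To make the simulation idea concrete, here is a minimal Python sketch. It is
only an illustration under simplifying assumptions: replicas are placed as
described above (writer's node, another node on the writer's rack, a node on
a different rack), and every name in it (place_block, local_fraction,
simulate, the rack layout) is made up for this example and is not HDFS or
HBase code.

```python
import random

def place_block(writer, racks, rng):
    """Pick replica hosts for one block: the writer's node, another node on
    the writer's rack, and a node on a different rack (assumed policy)."""
    node_rack = {n: r for r, nodes in racks.items() for n in nodes}
    local_rack = node_rack[writer]
    same_rack = [n for n in racks[local_rack] if n != writer]
    other_racks = [n for r, nodes in racks.items() if r != local_rack
                   for n in nodes]
    replicas = [writer]
    if same_rack:
        replicas.append(rng.choice(same_rack))
    if other_racks:
        replicas.append(rng.choice(other_racks))
    return replicas

def local_fraction(block_replicas, server):
    """Fraction of a region's blocks that have a replica on `server`."""
    if not block_replicas:
        return 0.0
    hits = sum(1 for replicas in block_replicas if server in replicas)
    return hits / len(block_replicas)

def simulate(racks, writer, moved_to, n_blocks, seed=0):
    """Write n_blocks from `writer`, then compare locality on the original
    server with locality on the server the region is moved to."""
    rng = random.Random(seed)
    blocks = [place_block(writer, racks, rng) for _ in range(n_blocks)]
    return local_fraction(blocks, writer), local_fraction(blocks, moved_to)

racks = {"rack1": ["a", "b"], "rack2": ["c", "d"]}
before, after = simulate(racks, writer="a", moved_to="c", n_blocks=100)
```

Here `before` is the locality while the region stays on the server that
wrote its blocks, and `after` is the locality once the region has been
reassigned to a server on the other rack.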
Although I don't yet have detailed information about the locality phenomenon,
I will try to give an initial proposal. The proposal is to assign each region
to the datanode (regionserver) that holds the most data blocks of that
region. The most challenging part is making a region's data-block layout
known to the master (we can query the HDFS namenode for this information). An
initial method is to record each region's layout information in the .META.
table.
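As a rough illustration of the proposal, the sketch below ranks regionservers
by how many of a region's blocks they host. The input format (a dict from
block id to a list of hosts) and the function name rank_candidates are
assumptions made for this example; they do not reflect the actual namenode
API or the .META. schema.

```python
from collections import Counter

def rank_candidates(block_locations, live_servers):
    """Return (server, block_count) pairs for live servers, sorted so that
    the server hosting the most blocks of the region comes first."""
    counts = Counter()
    for hosts in block_locations.values():
        for host in set(hosts):          # count each block once per host
            if host in live_servers:
                counts[host] += 1
    # Ties are broken by server name so the ordering is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical layout of one region's blocks.
block_locations = {
    "blk_1": ["rs1", "rs2", "rs3"],
    "blk_2": ["rs1", "rs3", "rs4"],
    "blk_3": ["rs1", "rs2", "rs5"],
}
ranked = rank_candidates(block_locations, {"rs1", "rs2", "rs3", "rs4"})
best_server = ranked[0][0]
```

The master would then try the candidates in order, so that a busy or
unavailable first choice still leaves a reasonably local fallback.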
Background threads may run on the master, scanning the .META. table to pick
candidate nodes for region allocation (these nodes may be sorted by the
number of blocks they contain). The detailed allocation mechanism is
discussed below.
(1) A blank region is created when the table is first created. As it has no
data yet, we can simply allocate it according to the current load of the
cluster. After the region grows and is flushed to HDFS, we get the blocks'
location information and record it in the .META. table for the next
allocation.
(2) A region is created by splitting its parent region. We can use the
parent region's block location information to make an allocation decision.
After the split finishes, we can simply copy the parent region's block
location information into each sub-region's .META. entry.
(3) A region is re-allocated after a regionserver crash. The locations of the
log files' blocks are also taken into account during allocation, so that we
may speed up the recovery of the failed region.
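The three cases above can be sketched as follows. This is only a sketch under
my own assumptions: `loads` maps each live regionserver to its current region
count, `meta` is a plain dict standing in for the layout information kept in
.META., and all function names are hypothetical.

```python
from collections import Counter

def least_loaded(loads):
    """Pick the live server with the smallest load (ties broken by name)."""
    return min(sorted(loads), key=lambda s: loads[s])

def most_local(block_hosts, loads):
    """Pick the live server hosting the most of the given blocks; fall back
    to the least-loaded server when no live server hosts any of them."""
    counts = Counter(h for hosts in block_hosts for h in set(hosts)
                     if h in loads)
    if not counts:
        return least_loaded(loads)
    return min(counts, key=lambda s: (-counts[s], loads[s], s))

def assign_new_region(loads):
    """Case (1): a blank region has no blocks, so use cluster load only."""
    return least_loaded(loads)

def split_region(meta, parent, children):
    """Case (2): copy the parent's block locations to each sub-region."""
    for child in children:
        meta[child] = [list(hosts) for hosts in meta[parent]]

def assign_after_crash(data_block_hosts, log_block_hosts, loads):
    """Case (3): count log-file blocks together with the data blocks."""
    return most_local(data_block_hosts + log_block_hosts, loads)

loads = {"rs1": 5, "rs2": 1, "rs3": 3}
new_home = assign_new_region(loads)

meta = {"parent": [["rs1", "rs3"], ["rs1", "rs2"]]}
split_region(meta, "parent", ["child_a", "child_b"])
child_home = most_local(meta["child_a"], loads)

# rs1 has crashed, so it no longer appears among the live servers.
live_loads = {"rs2": 1, "rs3": 3}
recovered_home = assign_after_crash(
    meta["parent"], [["rs1", "rs3"], ["rs1", "rs3"]], live_loads)
```

Whether log-file blocks should weigh the same as data blocks in case (3) is
an open question; the sketch simply counts them equally.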
> [hbase] Master should allocate regions to regionservers based upon data
> locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-57
> URL: https://issues.apache.org/jira/browse/HBASE-57
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: master
> Affects Versions: 0.2.0
> Reporter: stack
> Fix For: 0.20.0
>
>
> Currently, regions are assigned regionservers based off a basic loading
> attribute. A factor to include in the assignment calculation is the location
> of the region in hdfs; i.e. servers hosting region replicas. If the cluster
> is such that regionservers are being run on the same nodes as those running
> hdfs, then ideally the regionserver for a particular region should be running
> on the same server as hosts a region replica.