[ 
https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4191:
------------------------------

    Description: 
HBASE-4114 implemented the metrics for HFile HDFS block locality, which provide 
HFile-level locality information.
To work with the load balancer and region assignment, however, we need 
region-level locality information.

Let's first define the region locality index, which is almost the same as the 
HFile locality index.

HRegion locality index (HRegion A, RegionServer B) = 
(Total number of HDFS blocks of HRegion A that RegionServer B can read locally) 
/ (Total number of HDFS blocks of HRegion A)
The HRegion locality index thus tells us how much locality we would get if the 
HMaster assigned HRegion A to RegionServer B.
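
As a small worked example of the definition above, here is a minimal Python 
sketch. The function name, the host names rs1..rs5, and the block layout are 
all hypothetical, and returning 0.0 for a region with no blocks is an 
assumption rather than something this issue specifies.

```python
def region_locality_index(block_hosts, server):
    """Fraction of a region's HDFS blocks that have a replica on `server`.

    block_hosts: one set per HDFS block of the region, holding the hosts
    that store a replica of that block.
    """
    if not block_hosts:
        # Assumption: a region with no blocks has no locality to offer.
        return 0.0
    local = sum(1 for hosts in block_hosts if server in hosts)
    return local / len(block_hosts)


# Hypothetical HRegion A with four HDFS blocks, three replicas each.
region_a_blocks = [
    {"rs1", "rs2", "rs3"},
    {"rs1", "rs3", "rs4"},
    {"rs2", "rs3", "rs4"},
    {"rs1", "rs2", "rs4"},
]

# rs1 holds a replica of blocks 1, 2 and 4, so 3 of 4 blocks are local.
index_rs1 = region_locality_index(region_a_blocks, "rs1")
# rs5 holds no replica of any block of the region.
index_rs5 = region_locality_index(region_a_blocks, "rs5")
```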

Assigning regions based on locality involves two steps.
1) At cluster startup, the master scans HDFS to calculate the "HRegion locality 
index" for each pair of HRegion and region server. Scanning the DFS is 
expensive, so we only need to do this once, at startup.

2) While the cluster is running, each region server periodically updates the 
"HRegion locality index" as a metric, as HBASE-4114 did. The region server can 
expose these values to the master through ZK, the meta table, or plain RPC 
messages.
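
The two steps above could be sketched roughly as follows, in Python for 
brevity. This is only an illustration: scan_locality and apply_report are 
hypothetical names, the in-memory dict stands in for whatever the master would 
actually keep, and the transport (ZK, meta table, or RPC) is left out entirely.

```python
from collections import defaultdict


def scan_locality(region_blocks, servers):
    """Step 1: one full scan, giving an index for every (region, server) pair.

    region_blocks maps a region name to a list of host sets, one per block.
    """
    table = {}
    for region, blocks in region_blocks.items():
        replicas = defaultdict(int)  # host -> number of blocks stored there
        for hosts in blocks:
            for host in set(hosts):
                replicas[host] += 1
        total = len(blocks)
        for server in servers:
            # Assumption: a region with no blocks gets index 0.0.
            table[(region, server)] = replicas[server] / total if total else 0.0
    return table


def apply_report(table, server, report):
    """Step 2: merge one region server's periodic report {region: index}."""
    for region, index in report.items():
        table[(region, server)] = index


# Hypothetical cluster: two regions, three region servers.
region_blocks = {
    "A": [{"rs1", "rs2"}, {"rs1", "rs3"}],
    "B": [{"rs2", "rs3"}],
}
table = scan_locality(region_blocks, ["rs1", "rs2", "rs3"])

# Later, rs2 reports that region A is now fully local to it.
apply_report(table, "rs2", {"A": 1.0})
```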

Based on the "HRegion locality index", the assignment manager in the master 
would have global knowledge of the region locality distribution. Imagining the 
"HRegion locality index" as the capacity between the region server set and the 
region set, the assignment manager could then run a MAXIMUM FLOW solver to 
reach a global optimum.
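
One possible reading of the maximum flow idea, sketched in Python. The issue 
does not say how the flow network would be built, so everything here is an 
assumption: the source and sink nodes, a per-server slot limit, and the 
simplification of giving a unit-capacity edge only to (region, server) pairs 
whose index meets a min_index threshold. The solver is a plain Edmonds-Karp 
implementation. This formulation maximizes the number of regions placed on a 
sufficiently local server, not the sum of locality indexes.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual  # no augmenting path is left
        # Find the bottleneck on the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def assign_regions(locality, regions, servers, slots_per_server, min_index=0.5):
    """locality maps (region, server) to an HRegion locality index."""
    capacity = {}
    for s in servers:
        capacity[("source", ("rs", s))] = slots_per_server
    for r in regions:
        capacity[(("region", r), "sink")] = 1
    for (r, s), index in locality.items():
        if index >= min_index:  # assumption: only "local enough" pairs
            capacity[(("rs", s), ("region", r))] = 1
    placed, residual = max_flow(capacity, "source", "sink")
    assignment = {}
    for u, v in capacity:
        # A saturated server -> region edge means the region was assigned.
        if u != "source" and v != "sink" and residual[u][v] == 0:
            assignment[v[1]] = u[1]
    return placed, assignment


# Hypothetical indexes for three regions and two region servers.
locality = {
    ("A", "rs1"): 1.0, ("A", "rs2"): 0.5,
    ("B", "rs1"): 0.8, ("B", "rs2"): 0.1,
    ("C", "rs1"): 0.0, ("C", "rs2"): 0.9,
}
placed, assignment = assign_regions(
    locality, ["A", "B", "C"], ["rs1", "rs2"], slots_per_server=2)
```

In this example region B can only go to rs1 and region C only to rs2, while 
region A may land on either server depending on the order in which paths are 
found.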

In addition, HBASE-4491 (Locality Checker) is a tool, based on the same 
metrics, that proactively scans the DFS to calculate the global locality 
information for the cluster. It will help us verify data locality information 
at run time.


  was:
HBASE-4114 implemented getTopBlockLocations().
Load balancer should utilize this method and assign the region to be moved to 
the region server with the highest block affinity.

     Issue Type: New Feature  (was: Improvement)
        Summary: hbase load balancer needs locality awareness  (was: Utilize 
getTopBlockLocations in load balancer)
    
> hbase load balancer needs locality awareness
> --------------------------------------------
>
>                 Key: HBASE-4191
>                 URL: https://issues.apache.org/jira/browse/HBASE-4191
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Ted Yu
>            Assignee: Liyin Tang
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to