[ https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134803#comment-13134803 ]
Mikhail Bautin commented on HBASE-4191:
---------------------------------------
@Ted: could you please elaborate on how you express the region assignment
problem as a Max Flow problem? If we define the "cost" of assigning a region to
a server based on locality, and define a "load balancedness" constraint such
that each regionserver is assigned at most ceil(numRegions / numServers) + C
regions for some small value of C, then I can see how the problem becomes a
min-cost max-flow problem
(http://en.wikipedia.org/wiki/Minimum_cost_flow_problem). However, I don't see
how we could reduce the assignment problem to the max-flow problem directly
(http://en.wikipedia.org/wiki/Maximum_flow_problem).
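A minimal sketch of the min-cost max-flow formulation described above, under stated assumptions: the source connects to every region with capacity 1, every region connects to every server with capacity 1 and a cost equal to the negated locality index (scaled to an integer), and every server connects to the sink with capacity ceil(numRegions / numServers) + C. It uses successive shortest paths with Bellman-Ford, which is one standard min-cost flow algorithm and not something HBase ships. `MinCostFlow`, `assign_regions`, `slack` (playing the role of C) and `scale` are hypothetical names, and the locality values are assumed inputs.

```python
import math
from typing import Dict, List, Sequence, Tuple


class MinCostFlow:
    """Successive-shortest-path min-cost max-flow (Bellman-Ford per path)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.adj: List[List[int]] = [[] for _ in range(n)]  # node -> edge ids
        self.to: List[int] = []
        self.cap: List[int] = []
        self.cost: List[int] = []

    def add_edge(self, u: int, v: int, cap: int, cost: int) -> int:
        """Add u->v plus its residual reverse edge; return the forward edge id."""
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)
        return len(self.to) - 2

    def run(self, s: int, t: int) -> Tuple[int, int]:
        flow = total_cost = 0
        while True:
            # Bellman-Ford shortest path by cost; tolerates negative edge costs.
            dist = [math.inf] * self.n
            via = [-1] * self.n
            dist[s] = 0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v] = dist[u] + self.cost[e]
                            via[v] = e
                            changed = True
                if not changed:
                    break
            if dist[t] == math.inf:
                return flow, total_cost
            # Find the bottleneck capacity along the path, then augment.
            push, v = math.inf, t
            while v != s:
                push = min(push, self.cap[via[v]])
                v = self.to[via[v] ^ 1]
            v = t
            while v != s:
                self.cap[via[v]] -= push
                self.cap[via[v] ^ 1] += push
                v = self.to[via[v] ^ 1]
            flow += push
            total_cost += push * dist[t]


def assign_regions(
    locality: Dict[Tuple[str, str], float],
    regions: Sequence[str],
    servers: Sequence[str],
    slack: int = 0,
    scale: int = 1000,
) -> Dict[str, str]:
    """Place every region, maximizing total locality, with at most
    ceil(len(regions) / len(servers)) + slack regions per server."""
    n_r, n_s = len(regions), len(servers)  # assumes at least one server
    per_server = math.ceil(n_r / n_s) + slack
    src, sink = 0, n_r + n_s + 1
    mcf = MinCostFlow(n_r + n_s + 2)
    pair_edge: Dict[Tuple[str, str], int] = {}
    for i, r in enumerate(regions):
        mcf.add_edge(src, 1 + i, 1, 0)
        for j, s in enumerate(servers):
            # Higher locality -> lower (more negative) integer cost.
            cost = -round(locality.get((r, s), 0.0) * scale)
            pair_edge[(r, s)] = mcf.add_edge(1 + i, 1 + n_r + j, 1, cost)
    for j in range(n_s):
        mcf.add_edge(1 + n_r + j, sink, per_server, 0)
    mcf.run(src, sink)
    # A saturated region->server edge (residual capacity 0) carries the region.
    return {r: s for (r, s), e in pair_edge.items() if mcf.cap[e] == 0}
```

For example, with locality {(r1, rs1): 0.9, (r1, rs2): 0.5, (r2, rs1): 0.8, (r2, rs2): 0.1}, two servers and slack 0, each server can take one region and the sketch returns {"r1": "rs2", "r2": "rs1"} (total locality 1.3); with slack 1 it puts both regions on rs1. Because every capacity is an integer, the resulting flow is integral, so each region lands on exactly one server.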
> hbase load balancer needs locality awareness
> --------------------------------------------
>
> Key: HBASE-4191
> URL: https://issues.apache.org/jira/browse/HBASE-4191
> Project: HBase
> Issue Type: New Feature
> Reporter: Ted Yu
> Assignee: Liyin Tang
>
> HBASE-4114 previously implemented metrics for HFile HDFS block locality,
> which provide HFile-level locality information.
> But to work with the load balancer and region assignment, we need
> region-level locality information.
> Let's first define the region locality index, which is almost the same as
> the HFile locality index:
> HRegion locality index (HRegion A, RegionServer B) =
> (number of HDFS blocks of HRegion A that RegionServer B can read locally) /
> (total number of HDFS blocks of HRegion A)
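A minimal sketch of the definition above, assuming each block is described by the set of hostnames holding one of its replicas (how those sets are obtained is left open here). `region_locality_index` is a hypothetical helper; like the definition, it counts blocks rather than weighting them by size.

```python
from typing import Sequence, Set


def region_locality_index(block_hosts: Sequence[Set[str]], server: str) -> float:
    """Fraction of a region's HDFS blocks that have a replica on `server`.

    block_hosts[i] is the set of hostnames holding a replica of block i.
    """
    if not block_hosts:
        return 0.0  # assumption: a region with no blocks has no locality
    local = sum(1 for hosts in block_hosts if server in hosts)
    return local / len(block_hosts)
```

For a region whose four blocks have replicas on {rs1, rs2, rs3}, {rs1, rs4, rs5}, {rs1, rs2, rs6} and {rs3, rs5, rs6}, the index is 0.75 for rs1, 0.5 for rs2 and 0.0 for a server that holds none of the replicas.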
> The HRegion locality index thus tells us how much locality we get if the
> HMaster assigns HRegion A to RegionServer B.
> Assigning regions based on locality therefore involves two steps:
> 1) At cluster startup, the master scans HDFS to calculate the "HRegion
> locality index" for each (HRegion, RegionServer) pair. Scanning DFS is
> expensive, so we only need to do this once, at startup.
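Because the startup scan is expensive, it helps to visit each block once and credit every host holding a replica, instead of re-reading block locations for every (HRegion, RegionServer) pair. A minimal sketch, assuming the scan has already produced the replica host sets per region (in Hadoop such host lists come from calls like `FileSystem.getFileBlockLocations`); `locality_table` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def locality_table(
    region_blocks: Mapping[str, Iterable[Set[str]]],
) -> Dict[Tuple[str, str], float]:
    """Build the locality index for every (region, server) pair in one pass.

    region_blocks maps a region name to the replica host sets of its blocks.
    Pairs with no local blocks are omitted; their index is implicitly 0.
    """
    table: Dict[Tuple[str, str], float] = {}
    for region, blocks in region_blocks.items():
        block_list = list(blocks)
        if not block_list:
            continue
        local_counts: Dict[str, int] = defaultdict(int)
        for hosts in block_list:
            for host in hosts:
                local_counts[host] += 1  # this block is local to `host`
        for host, count in local_counts.items():
            table[(region, host)] = count / len(block_list)
    return table
```

Keeping the table sparse means its size is bounded by the number of (region, replica host) combinations rather than by regions × servers.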
> 2) At cluster run time, each region server periodically updates the
> "HRegion locality index" as a metric, as HBASE-4114 did. The region server
> can expose these values to the master through ZK, the meta table, or plain
> RPC messages. Based on the "HRegion locality index", the assignment manager
> in the master would have global knowledge of the region locality
> distribution. Treating the "HRegion locality index" as the capacity between
> the region server set and the region set, the assignment manager could then
> run a MAXIMUM FLOW solver to reach a global optimum.
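Read literally, step 2 builds a bipartite network and runs a max-flow solver on it. The sketch below does that with Edmonds-Karp, under assumptions the text does not spell out: a source edge of capacity 1 into each region, the locality index as the capacity of each region-to-server edge, and a hypothetical per-server cap on the server-to-sink edges, borrowed from the balance constraint in the comment at the top. `add_edge`, `max_flow` and `locality_max_flow` are illustrative names only.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Optional, Tuple

Residual = Dict[Hashable, Dict[Hashable, float]]


def add_edge(g: Residual, u: Hashable, v: Hashable, cap: float) -> None:
    g.setdefault(u, {})
    g.setdefault(v, {})
    g[u][v] = g[u].get(v, 0.0) + cap
    g[v].setdefault(u, 0.0)  # reverse edge for the residual graph


def max_flow(g: Residual, s: Hashable, t: Hashable, eps: float = 1e-12) -> float:
    """Edmonds-Karp: repeatedly augment along a shortest (BFS) residual path."""
    total = 0.0
    while True:
        parent: Dict[Hashable, Optional[Hashable]] = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in g.get(u, {}).items():
                if cap > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path = []
        v = t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        push = min(g[u][v] for u, v in path)  # bottleneck residual capacity
        for u, v in path:
            g[u][v] -= push
            g[v][u] += push
        total += push


def locality_max_flow(
    locality: Dict[Tuple[str, str], float],
    regions: Iterable[str],
    servers: Iterable[str],
    per_server: float,
) -> float:
    g: Residual = {"source": {}, "sink": {}}
    for r in regions:
        add_edge(g, "source", ("region", r), 1.0)  # each region carries one unit
    for (r, s), index in locality.items():
        add_edge(g, ("region", r), ("server", s), index)  # capacity = locality index
    for s in servers:
        add_edge(g, ("server", s), "sink", per_server)  # assumed balance cap
    return max_flow(g, "source", "sink")
```

With locality 0.9 / 0.5 for r1 on rs1 / rs2, 0.8 / 0.1 for r2, and a per-server cap of 1, the flow value is 1.6 (up to floating-point rounding), while the best assignment that puts each region on exactly one server reaches only 0.5 + 0.8 = 1.3. The difference comes from splitting regions fractionally across servers, so the flow by itself does not yet say where each region goes, which is the gap raised in the question at the top.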
> The master should also share this global view with the secondary master in
> case of a master failover.
> In addition, HBASE-4491 (Locality Checker) is a tool, based on the same
> metrics, that proactively scans DFS to calculate global locality
> information for the cluster. It will help us verify data locality
> information at run time.
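The source does not describe how HBASE-4491 computes its numbers. As a hypothetical illustration of what a locality checker could report from the same block data, the sketch below scores the current region-to-server assignment: the per-region index from the definition above, plus a cluster-wide figure that weights each region by its block count. `locality_report` and its inputs are assumptions.

```python
from typing import Dict, Mapping, Sequence, Set, Tuple


def locality_report(
    region_blocks: Mapping[str, Sequence[Set[str]]],
    assignment: Mapping[str, str],
) -> Tuple[float, Dict[str, float]]:
    """Return (cluster-wide locality, per-region locality index) for the
    current region -> server assignment."""
    per_region: Dict[str, float] = {}
    local_total = block_total = 0
    for region, server in assignment.items():
        blocks = region_blocks.get(region, [])
        local = sum(1 for hosts in blocks if server in hosts)
        per_region[region] = local / len(blocks) if blocks else 0.0
        local_total += local
        block_total += len(blocks)
    # Block-weighted: larger regions count more toward the cluster figure.
    overall = local_total / block_total if block_total else 0.0
    return overall, per_region
```

For two regions with two blocks each, one fully local to its server and the other half local, it reports 0.75 overall and {r1: 1.0, r2: 0.5} per region.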
--
This message is automatically generated by JIRA.