[
https://issues.apache.org/jira/browse/HBASE-16169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665872#comment-15665872
]
Thiruvel Thirumoolan commented on HBASE-16169:
----------------------------------------------
Our primary intention is to use this API in RegionSizeCalculator so that it no
longer relies on the Master for ClusterStatus. On our large clusters,
ClusterStatus() alone takes 4-5 minutes, which is significant for some of our
pipelines. And if the Master is down or busy, some of the jobs time out or
fail. This API helps in both scenarios, and that is the primary use case. Hence
I also included the RegionSizeCalculator changes as part of this patch.
Other possible uses:
1. If there is a lighter version of the GetClusterStatus API (i.e. without the
ServerLoad for each RS), custom maintenance tools can work better. Currently,
ClusterStatus is heavy. With the new APIs, each API's payload is smaller and
distributed, so custom tools can call getRegionLoad() only when needed, and the
result will be more accurate. This helps with large clusters. For tools that
don't need RegionLoad, the lighter version of the API is sufficient.
2. Another use case is a tool like RSTop, since we can see selected metrics at
the region level (possibly even deltas between successive RPCs to the server).
Please let us know your thoughts. Our primary intention is to address the delay
in MR jobs and reduce the dependency on the Master.
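To illustrate the RSTop-style use case in item 2, here is a minimal sketch of computing per-region deltas between two successive polls of a RegionServer. The class and method names (RegionDeltaSketch, deltas) are hypothetical and not part of HBase, and the maps stand in for whatever counter (for example a request count) a tool would read from each RegionLoad. It also assumes a counter that goes backwards was reset, for instance after a region moved.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: hypothetical helper, not an HBase class. */
class RegionDeltaSketch {

    /**
     * Computes the change in a per-region counter between two polls.
     * Keys are region names; values are the counter read at each poll.
     * Regions absent from the previous poll are counted from zero.
     * If a counter decreased, it is assumed to have been reset and the
     * current value is reported as the delta.
     */
    static Map<String, Long> deltas(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            long before = previous.getOrDefault(e.getKey(), 0L);
            long now = e.getValue();
            out.put(e.getKey(), now >= before ? now - before : now);
        }
        return out;
    }
}
```

A tool would keep the map from the previous RPC per server and call deltas() after each new poll, so only the small per-server payload is fetched each time.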
> Make RegionSizeCalculator scalable
> ----------------------------------
>
> Key: HBASE-16169
> URL: https://issues.apache.org/jira/browse/HBASE-16169
> Project: HBase
> Issue Type: Sub-task
> Components: mapreduce, scaling
> Reporter: Thiruvel Thirumoolan
> Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16169.master.000.patch,
> HBASE-16169.master.001.patch, HBASE-16169.master.002.patch,
> HBASE-16169.master.003.patch, HBASE-16169.master.004.patch,
> HBASE-16169.master.005.patch, HBASE-16169.master.006.patch
>
>
> RegionSizeCalculator is needed for better split generation of MR jobs. This
> requires RegionLoad which can be obtained via ClusterStatus, i.e. accessing
> Master. We don't want master to be in this path.
> The proposal is to add an API to the RegionServer that gets RegionLoad of all
> regions hosted on it or those of a table if specified. RegionSizeCalculator
> can use the latter.
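The proposal above can be sketched roughly as follows: a client asks each RegionServer for the load of the regions it hosts, optionally filtered by table, and merges the answers into a region-size map without contacting the Master. Every type here (RegionLoadInfo, RegionServerClient, FakeRegionServer) is a hypothetical stand-in so that the sketch runs without a cluster; none of them is the HBase API or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: all types here are hypothetical stand-ins, not HBase classes. */
class RegionSizeSketch {

    /** Hypothetical per-region record carrying the one metric a size calculator needs. */
    static final class RegionLoadInfo {
        final String table;
        final String regionName;
        final long storefileSizeMB;

        RegionLoadInfo(String table, String regionName, long storefileSizeMB) {
            this.table = table;
            this.regionName = regionName;
            this.storefileSizeMB = storefileSizeMB;
        }
    }

    /** Hypothetical stand-in for the proposed RegionServer call. */
    interface RegionServerClient {
        /** Returns load for regions hosted on this server; a null table means all regions. */
        List<RegionLoadInfo> getRegionLoad(String tableOrNull);
    }

    /** In-memory fake of one RegionServer so the sketch can run locally. */
    static final class FakeRegionServer implements RegionServerClient {
        private final List<RegionLoadInfo> hosted;

        FakeRegionServer(RegionLoadInfo... hosted) {
            this.hosted = Arrays.asList(hosted);
        }

        @Override
        public List<RegionLoadInfo> getRegionLoad(String tableOrNull) {
            List<RegionLoadInfo> out = new ArrayList<>();
            for (RegionLoadInfo r : hosted) {
                if (tableOrNull == null || tableOrNull.equals(r.table)) {
                    out.add(r);
                }
            }
            return out;
        }
    }

    /** Asks each server for the table's regions and merges the sizes (in bytes) into one map. */
    static Map<String, Long> regionSizes(List<? extends RegionServerClient> servers, String table) {
        Map<String, Long> sizes = new TreeMap<>();
        for (RegionServerClient rs : servers) {
            for (RegionLoadInfo r : rs.getRegionLoad(table)) {
                sizes.put(r.regionName, r.storefileSizeMB * 1024L * 1024L);
            }
        }
        return sizes;
    }
}
```

In this sketch the per-server calls are made sequentially for clarity; since each call is independent, a real client could issue them in parallel, and only the servers that host regions of the table need to be contacted.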