[
https://issues.apache.org/jira/browse/HBASE-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150949#comment-15150949
]
Siddharth Wagle commented on HBASE-15249:
-----------------------------------------
{quote} What does the math look like for region splits {quote}
Ref: AMBARI-13039. We use the _memstore.lowerLimit_ and _memstore.flush.size_
settings to calculate the memory available to the memstore and the maximum
number of regions. Then, for the large tables, we calculate lexically
equidistant split points based on the services deployed by Ambari (from a
static list of metrics that we mined from a deployed cluster).
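
To make the arithmetic above concrete, here is a minimal sketch in Python. It is not the actual AMBARI-13039 code: the formula (memstore memory divided by flush size), the function names, and the idea of picking evenly spaced names from the sorted metric list are all assumptions made for illustration.

```python
def max_regions(heap_bytes, memstore_lower_limit, flush_size_bytes):
    """Estimate how many regions the memstore can back (assumed formula).

    memstore_lower_limit is the fraction of heap given to the memstore,
    flush_size_bytes is the per-region memstore flush size.
    """
    memstore_bytes = int(heap_bytes * memstore_lower_limit)
    return max(1, memstore_bytes // flush_size_bytes)


def split_points(metric_names, region_count):
    """Pick evenly spaced metric names from the sorted list as split keys.

    Returns region_count - 1 split points (fewer if there are not enough
    distinct metric names). This is an assumed reading of "lexically
    equidistant", not the AMS implementation.
    """
    names = sorted(set(metric_names))
    region_count = min(region_count, len(names))
    if region_count <= 1:
        return []
    step = len(names) / region_count
    return [names[int(i * step)] for i in range(1, region_count)]
```

With hypothetical values of a 4 GiB heap, a lower limit of 0.25 and a 128 MiB flush size, `max_regions` gives 8. Calling `split_points` on eight metric names `a` through `h` with a region count of 4 gives the split keys `c`, `e` and `g`.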
{quote}You need to run normalizer?{quote}
In a stable state, the normalizer seems to work well for us in managing the
region boundaries. We do give the user the option to disable it with a
configuration setting in AMS (a precautionary tactic on our end). All in all,
we can definitely live without the normalizer: it was not available to us until
very recently, and the pre-splitting pre-dates the normalizer setting in AMS.
The best use case for the normalizer for us is this: an Ambari user can, let's
say, add a service (for example, KAFKA) that starts writing a ton of metrics
and introduces a skew where the previous splits become irrelevant.
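
The skew scenario can be illustrated with a small Python sketch. The row keys, split points, and the `rows_per_region` helper are hypothetical; the only HBase behavior assumed is that a region covers the key range from its start key (inclusive) to its end key (exclusive).

```python
from bisect import bisect_right
from collections import Counter


def rows_per_region(row_keys, split_points):
    """Count how many row keys land in each region for sorted split points.

    Region i covers [split_points[i-1], split_points[i]); the first region
    has no lower bound and the last has no upper bound.
    """
    counts = Counter(bisect_right(split_points, key) for key in row_keys)
    return [counts.get(i, 0) for i in range(len(split_points) + 1)]


splits = ["c", "e", "g"]                      # hypothetical pre-split keys
before = list("abcdefgh")                     # metrics of existing services
after = before + ["kafka.m%d" % i for i in range(6)]  # new service added

balanced = rows_per_region(before, splits)    # [2, 2, 2, 2]
skewed = rows_per_region(after, splits)       # [2, 2, 2, 8]
```

In this toy case all of the new `kafka.*` keys sort after the last split point, so they pile into the final region while the original split points no longer balance the load.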
[~stack] / [~anoop.hbase] Thanks for the feedback.
> Provide lower bound on number of regions in region normalizer for pre-split
> tables
> ----------------------------------------------------------------------------------
>
> Key: HBASE-15249
> URL: https://issues.apache.org/jira/browse/HBASE-15249
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: HBASE-15249.v1.txt, HBASE-15249.v2.txt
>
>
> AMS (Ambari Metrics System) developer found the following scenario:
> Metrics table was pre-split with many regions on a large cluster (1600 nodes).
> After some time, AMS stopped working because the region normalizer merged the
> regions into a few big regions which were not able to serve the high read /
> write load.
> This is a big problem since the write requests flood the regions faster than
> the splits can happen, resulting in poor performance.
> We should consider setting a reasonable lower bound on the region count.
> If the table is pre-split, we can use the initial region count as the lower bound.