[ https://issues.apache.org/jira/browse/HBASE-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150949#comment-15150949 ]

Siddharth Wagle commented on HBASE-15249:
-----------------------------------------

{quote} What does the math look like for region splits {quote}
Ref: AMBARI-13039. We use _memstore.lowerLimit_ and _memstore.flush.size_ to 
calculate the memory available to the memstore and, from that, the maximum 
number of regions. Then, for the large tables, we calculate lexically 
equidistant split points based on the services deployed by Ambari (using a 
static list of metrics that we mined from a deployed cluster).
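A minimal sketch of that arithmetic, for illustration only. It assumes _memstore.lowerLimit_ is a fraction of the RegionServer heap, that the region cap is simply the memstore memory divided by _memstore.flush.size_, and that "lexically equidistant" means taking every k-th key from the sorted metric-name list. The helper names, heap size, and metric names are hypothetical; this is not the AMBARI-13039 code.

```python
import math


def max_memstore_regions(heap_bytes: int, lower_limit: float, flush_size_bytes: int) -> int:
    """Hypothetical helper: how many regions can each buffer a full memstore.

    Assumes memory available to all memstores is heap * lowerLimit; dividing
    by the per-region flush size gives an upper bound on the region count.
    """
    memstore_bytes = heap_bytes * lower_limit
    return max(1, math.floor(memstore_bytes / flush_size_bytes))


def equidistant_split_points(metric_names: list[str], num_regions: int) -> list[str]:
    """Hypothetical helper: num_regions - 1 split keys evenly spaced over sorted names."""
    names = sorted(set(metric_names))
    # Cannot have more regions than distinct keys to split on.
    num_regions = min(num_regions, len(names))
    if num_regions <= 1:
        return []
    step = len(names) / num_regions  # step >= 1, so indices strictly increase
    return [names[int(i * step)] for i in range(1, num_regions)]


if __name__ == "__main__":
    # Illustrative values: 8 GiB heap, lowerLimit 0.38, 128 MiB flush size.
    print(max_memstore_regions(8 * 1024**3, 0.38, 128 * 1024**2))  # 24
    metrics = ["cpu_idle", "cpu_user", "disk_free", "disk_total",
               "mem_free", "mem_total", "net_in", "net_out"]
    print(equidistant_split_points(metrics, 4))  # ['disk_free', 'mem_free', 'net_in']
```

The real metric list would be much larger and keyed by the row-key layout of the AMS tables; the even spacing is only a reasonable starting point, which is why skew from newly added services matters.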

{quote}You need to run normalizer?{quote}
In a stable state, the normalizer seems to work well for us in managing region 
boundaries. We do give users the option to disable it through a configuration 
setting in AMS (a precaution on our end). All in all, we can definitely live 
without the normalizer: it was not available to us until very recently, and the 
pre-splitting predates the normalizer setting in AMS. For us, the best use case 
for the normalizer is this: an Ambari user adds a service (for example, KAFKA) 
that starts writing a large volume of metrics and introduces a skew that makes 
the previous splits irrelevant.

[~stack] / [~anoop.hbase] Thanks for the feedback.

> Provide lower bound on number of regions in region normalizer for pre-split 
> tables
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-15249
>                 URL: https://issues.apache.org/jira/browse/HBASE-15249
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: HBASE-15249.v1.txt, HBASE-15249.v2.txt
>
>
> An AMS (Ambari Metrics System) developer found the following scenario:
> The metrics table was pre-split into many regions on a large cluster (1600 nodes).
> After some time, AMS stopped working because the region normalizer merged the 
> regions into a few big regions that could not serve the high read/write 
> load.
> This is a big problem because write requests flood the regions faster than 
> splits can happen, resulting in poor performance.
> We should consider setting a reasonable lower bound on the region count.
> If the table is pre-split, we can use the initial region count as the lower bound.
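One way the proposed lower bound could slot into a size-based normalizer, sketched in Python. The split/merge rule here is a simplified model (split a region larger than twice the average size; otherwise merge the smallest region with its smaller neighbor when their combined size is below the average), loosely following the SimpleRegionNormalizer description. `Region`, `plan_normalization`, and `min_region_count` are hypothetical names, and this is not the attached patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    size_mb: int


def plan_normalization(regions: list[Region], min_region_count: int) -> Optional[tuple]:
    """Return ("split", r), ("merge", r1, r2) or None; regions are in key order.

    min_region_count would be the table's initial (pre-split) region count.
    """
    if len(regions) < 2:
        return None
    avg = sum(r.size_mb for r in regions) / len(regions)

    # Splits are never blocked: a hot region (e.g. a new service's metrics) can grow.
    biggest = max(regions, key=lambda r: r.size_mb)
    if biggest.size_mb > 2 * avg:
        return ("split", biggest.name)

    # Proposed guard: a merge removes one region, so skip it at or below the bound.
    if len(regions) <= min_region_count:
        return None

    i = min(range(len(regions)), key=lambda k: regions[k].size_mb)
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(regions)]
    j = min(neighbors, key=lambda k: regions[k].size_mb)
    if regions[i].size_mb + regions[j].size_mb < avg:
        a, b = sorted((i, j))
        return ("merge", regions[a].name, regions[b].name)
    return None


if __name__ == "__main__":
    presplit = [Region(f"r{k}", s) for k, s in enumerate([1, 2, 50, 50, 50, 50, 50, 50])]
    print(plan_normalization(presplit, 0))  # ('merge', 'r0', 'r1')
    print(plan_normalization(presplit, 8))  # None: bound = pre-split count
    skewed = [Region(f"r{k}", s) for k, s in enumerate([10, 10, 10, 100])]
    print(plan_normalization(skewed, 4))    # ('split', 'r3')
```

Because the guard only blocks merges, the skew case described in the comment above (a new service overloading a few regions) is still handled by splitting.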


