[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340748#comment-15340748
 ] 

Enis Soztutar commented on HBASE-16030:
---------------------------------------

bq.  It sounds also a problem if flush is stalled for 5 mins (although not as 
bad as 30 mins)
Indeed. 
bq. Feels like the right way of jittering should not be putting into a blocking 
queue ahead of time. What do you think?
I think we can still jitter for more, but we have to solve the underlying 
problem first. The correct solution might be to re-queue the request in regular 
flush request with no-delay. 

> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16030
>                 URL: https://issues.apache.org/jira/browse/HBASE-16030
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.1
>            Reporter: Tianying Chang
>            Assignee: Tianying Chang
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
>         Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed that memstore flush spike every hour 
> for all regions/RS. (we use the default memstore periodic flush time of 1 
> hour). 
> This will happend when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before 1 hour limit 
> reached;
> 2. all regions are opened around the same time, (e.g. all RS are started at 
> the same time when start a cluster). 
> With above two conditions, all the regions will be flushed around the same 
> time at: startTime+1hour-delay again and again.
> We added a flush jittering time to randomize the flush time of each region, 
> so that they don't get flushed at around the same time. We had this feature 
> running in our 94.7 and 94.26 cluster. Recently, we upgrade to 1.2, found 
> this issue still there in 1.2. So we are porting this into 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to