[ 
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340153#comment-15340153
 ] 

Enis Soztutar commented on HBASE-16030:
---------------------------------------

The config option {{hbase.regionserver.flush.delay.jitter}} reads as if it 
applies to flushes in general, not just the periodic memstore flush. We should 
find a more descriptive name. The periodic memstore flush interval is 
controlled by {{hbase.regionserver.optionalcacheflushinterval}}, so I would 
suggest naming the new config something like 
{{hbase.regionserver.optionalcacheflush.delay.jitter}}. 

Why are we using 5 min as the minimum value? I think a range of 0-30 min 
makes more sense than 5-35 min. 
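
To make the two ranges concrete, here is a minimal sketch of how a jittered delay could be computed. The class and method names ({{FlushJitterSketch}}, {{jitteredDelayMs}}) are hypothetical and are not taken from the patch; the patch may compute the delay differently.

```java
import java.util.Random;

// Hypothetical helper, not the code from the patch.
class FlushJitterSketch {
    static final long MINUTE_MS = 60_000L;

    /**
     * Returns a delay drawn uniformly from [minMs, minMs + rangeMs).
     * minMs = 0,     rangeMs = 30 min gives the 0-30 min range.
     * minMs = 5 min, rangeMs = 30 min gives the 5-35 min range.
     */
    static long jitteredDelayMs(Random rng, long minMs, long rangeMs) {
        // nextDouble() is in [0.0, 1.0), so the offset is always below rangeMs.
        return minMs + (long) (rng.nextDouble() * rangeMs);
    }
}
```

With a floor of 0 a region may be flushed right at its periodic deadline, while a 5 min floor delays every periodic flush by at least 5 min without spreading the flushes any wider.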

bq. Question, by increasing the flush delay time, the flush request will stay 
in the queue for 30 minutes. Will this cause any issue?
This is a very good question. Looking at the MemstoreFlusher, there are a 
couple of cases where we trigger a flush request. If a region's memstore grows 
beyond the flush size (128MB), we call MemstoreFlusher.requestFlush(), which 
DOES NOT requeue a flush request that is already in the queue. Only under 
global memstore pressure do we call flushRegion() directly with 
{{emergencyFlush=true}}. This means that if the periodic flusher queues a flush 
with a 30 min delay, and the region then suddenly gets more load and grows 
beyond the flush size, the flush WILL NOT happen for up to another 30 mins. So 
with this patch, we are increasing the likelihood of a stall. 
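
The concern can be illustrated with a simplified model of a delayed flush queue. This is a sketch under assumptions, not the actual MemstoreFlusher code: the class {{FlushQueueSketch}} and its methods are hypothetical, and it models only the "skip if the region is already queued" behavior described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model; not the real MemstoreFlusher.
class FlushQueueSketch {
    static final class Entry implements Delayed {
        final String region;
        final long dueAtMs;

        Entry(String region, long delayMs) {
            this.region = region;
            this.dueAtMs = System.currentTimeMillis() + delayMs;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueAtMs - System.currentTimeMillis(),
                TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final DelayQueue<Entry> queue = new DelayQueue<>();
    private final Map<String, Entry> regionsInQueue = new HashMap<>();

    /**
     * Queues a flush for the region unless one is already queued.
     * Returns false when the request is dropped because of an existing entry,
     * even if that entry is not due for a long time.
     */
    synchronized boolean requestFlush(String region, long delayMs) {
        if (regionsInQueue.containsKey(region)) {
            return false; // existing entry keeps its original delay
        }
        Entry entry = new Entry(region, delayMs);
        regionsInQueue.put(region, entry);
        queue.add(entry);
        return true;
    }

    /** Remaining delay of the queued entry in ms, or -1 if none is queued. */
    synchronized long remainingDelayMs(String region) {
        Entry entry = regionsInQueue.get(region);
        return entry == null ? -1L : entry.getDelay(TimeUnit.MILLISECONDS);
    }
}
```

In this model, a periodic request queued with a 30 min delay causes a later size-triggered request with zero delay to be dropped, so the region waits out the original delay. One possible remedy, again only an assumption, would be for the size-triggered path to replace or re-time the existing entry.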


> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is 
> on, causing flush spike
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16030
>                 URL: https://issues.apache.org/jira/browse/HBASE-16030
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.1
>            Reporter: Tianying Chang
>            Assignee: Tianying Chang
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.3
>
>         Attachments: Screen Shot 2016-06-15 at 11.35.42 PM.png, Screen Shot 
> 2016-06-15 at 11.52.38 PM.png, hbase-16030-v2.patch, hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour 
> for all regions/RS (we use the default memstore periodic flush time of 1 
> hour). 
> This happens when two conditions are met: 
> 1. the memstore does not have enough data to be flushed before the 1 hour 
> limit is reached;
> 2. all regions are opened around the same time (e.g. all RS are started at 
> the same time when starting a cluster). 
> With the above two conditions, all the regions will be flushed around the 
> same time, at startTime+1hour-delay, again and again.
> We added a flush jitter time to randomize the flush time of each region, 
> so that they don't get flushed at around the same time. We had this feature 
> running in our 94.7 and 94.26 clusters. Recently, we upgraded to 1.2 and found 
> this issue is still there in 1.2, so we are porting this to the 1.2 branch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
