[
https://issues.apache.org/jira/browse/HBASE-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330967#comment-15330967
]
Enis Soztutar commented on HBASE-16030:
---------------------------------------
We should already be doing jitter in PeriodicMemstoreFlusher:
{code}
if (((HRegion)r).shouldFlush(whyFlush)) {
  FlushRequester requester = server.getFlushRequester();
  if (requester != null) {
    long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
    LOG.info(getName() + " requesting flush of " +
      r.getRegionInfo().getRegionNameAsString() + " because " +
      whyFlush.toString() +
      " after random delay " + randomDelay + "ms");
    //Throttle the flushes by putting a delay. If we don't throttle, and there
    //is a balanced write-load on the regions in a table, we might end up
    //overwhelming the filesystem with too many flushes at once.
    requester.requestDelayedFlush(r, randomDelay, false);
  }
}
{code}
Do you mean the delayed flush with jitter is not working? The range of delay is
5 minutes, so is an average jitter of about 2.5 minutes not enough?
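
To make the numbers in that question concrete, here is a minimal standalone sketch of the same delay computation as in the snippet above. It is an illustration under stated assumptions, not the HBase source: the class name FlushDelaySketch is hypothetical, java.util.Random stands in for RandomUtils, the 5-minute range is taken from the comment above, and a zero minimum delay is assumed.

```java
import java.util.Random;

// Hypothetical standalone sketch; not part of HBase.
final class FlushDelaySketch {
    // Assumption: 5-minute range, as stated in the comment above.
    static final int RANGE_OF_DELAY_MS = 5 * 60 * 1000;
    // Assumption: no minimum delay; the real constant may differ.
    static final int MIN_DELAY_TIME_MS = 0;

    /** Uniform delay in [MIN_DELAY_TIME_MS, MIN_DELAY_TIME_MS + RANGE_OF_DELAY_MS). */
    static long randomDelay(Random random) {
        return random.nextInt(RANGE_OF_DELAY_MS) + MIN_DELAY_TIME_MS;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int samples = 100_000;
        long sum = 0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int i = 0; i < samples; i++) {
            long delay = randomDelay(random);
            sum += delay;
            min = Math.min(min, delay);
            max = Math.max(max, delay);
        }
        // The mean should land near the middle of the range.
        System.out.printf("min=%dms max=%dms mean=%.2f min%n",
            min, max, sum / (double) samples / 60_000.0);
    }
}
```

Under these assumptions every delay falls inside a single 5-minute window, and the sample mean sits close to the 2.5 minutes mentioned above.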
> All Regions are flushed at about same time when MEMSTORE_PERIODIC_FLUSH is
> on, causing flush spike
> --------------------------------------------------------------------------------------------------
>
> Key: HBASE-16030
> URL: https://issues.apache.org/jira/browse/HBASE-16030
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.2.1
> Reporter: Tianying Chang
> Assignee: Tianying Chang
> Fix For: 1.2.1
>
> Attachments: hbase-16030.patch
>
>
> In our production cluster, we observed a memstore flush spike every hour
> across all regions/RS (we use the default memstore periodic flush time of 1
> hour).
> This happens when two conditions are met:
> 1. the memstore does not accumulate enough data to be flushed before the 1
> hour limit is reached;
> 2. all regions are opened at around the same time (e.g. all RS are started at
> the same time when starting a cluster).
> With these two conditions, all the regions are flushed at around the same
> time, startTime+1hour-delay, again and again.
> We added a flush jitter time to randomize the flush time of each region,
> so that they don't get flushed at around the same time. We had this feature
> running in our 94.7 and 94.26 clusters. Recently we upgraded to 1.2 and found
> the issue is still there in 1.2, so we are porting this to the 1.2 branch.
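
The two conditions in the description can be illustrated with a deliberately simplified model, sketched below. It is not HBase's actual scheduling and not the attached patch: the class FlushSpikeSketch, the bucket size, and the rule "flush time = open time + 1 hour + random delay" are all assumptions for illustration, with the 1-hour period taken from the description and a 5-minute delay range assumed.

```java
import java.util.Random;

// Hypothetical model of the reported spike; not HBase code.
final class FlushSpikeSketch {
    static final long PERIOD_MS = 60L * 60 * 1000;    // 1-hour periodic flush (from the description)
    static final int DELAY_RANGE_MS = 5 * 60 * 1000;  // assumed 5-minute random delay range
    static final long BUCKET_MS = 5L * 60 * 1000;     // histogram bucket width: 5 minutes

    /** Counts flushes per 5-minute bucket, measured from the shared open time. */
    static int[] flushHistogram(int regions, long openTimeMs, Random random) {
        int[] buckets = new int[(int) (2 * PERIOD_MS / BUCKET_MS)]; // covers two hours
        for (int i = 0; i < regions; i++) {
            // Every region was opened at the same time and never fills its memstore,
            // so only the periodic flush (plus a small delay) triggers.
            long flushAt = openTimeMs + PERIOD_MS + random.nextInt(DELAY_RANGE_MS);
            buckets[(int) ((flushAt - openTimeMs) / BUCKET_MS)]++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] histogram = flushHistogram(1000, 0L, new Random());
        for (int b = 0; b < histogram.length; b++) {
            if (histogram[b] > 0) {
                System.out.println("minutes " + (b * 5) + "-" + (b * 5 + 5)
                    + ": " + histogram[b] + " flushes");
            }
        }
    }
}
```

In this model all 1000 flushes land in the single bucket right after the one-hour mark, which is the kind of spike the description reports; spreading them out would need a jitter range comparable to the flush period itself.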