[
https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089923#comment-14089923
]
Lars Hofhansl commented on HBASE-11695:
---------------------------------------
I can't easily try this on the production site right now.
This was a dud anyway, in the sense that it does not cause the issue; it's just
an oddity we observed. The real problem was that flushes took a very long time
(hours); not sure why yet, but probably due to a networking issue. Hence all
flushes were waiting, and after one hour all the waiting regions became eligible
for the periodic flusher.
So the problem here is only cosmetic. Because the flusher's wake interval (10s)
is shorter than the jitter (20s), in most cases we'll see each region
requesting a flush twice.
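
To make the arithmetic concrete, here is a minimal sketch of a simplified
model, not the actual HBase code. It assumes the flusher wakes at a fixed
interval, that a region keeps looking eligible until its delayed flush starts,
and that the flush itself takes no time. The function name and parameters are
hypothetical.

```python
def flush_requests(wake_period_s: float, delay_s: float) -> int:
    """Count the wake-ups at which a region is still waiting for its flush.

    Simplified model (an assumption, not HBase's implementation): the first
    request is made at t=0 with a random delay of delay_s seconds, and the
    region looks eligible at every later wake-up until that delay expires.
    """
    requests = 0
    wake_time = 0.0
    while True:
        requests += 1                # region is eligible at this wake-up
        wake_time += wake_period_s   # time of the next wake-up
        if wake_time >= delay_s:     # flush has started before the next wake-up
            return requests


# Delay of 13449 ms taken from the log in the issue description.
print(flush_requests(wake_period_s=10.0, delay_s=13.449))  # wake interval < jitter
print(flush_requests(wake_period_s=20.0, delay_s=13.449))  # wake interval >= jitter
```

With a 10s wake interval the region is requested twice, matching the two log
entries 10 seconds apart; once the wake interval is at least as large as the
jitter, the model yields a single request per region.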
> PeriodicFlusher and WakeFrequency issues
> ----------------------------------------
>
> Key: HBASE-11695
> URL: https://issues.apache.org/jira/browse/HBASE-11695
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.21
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Critical
> Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>
> Attachments: 11695-trunk.txt
>
>
> We just ran into a flush storm caused by the PeriodicFlusher.
> Many memstores became eligible for flushing at exactly the same time. The
> effect we've seen is that the exact same region was flushed multiple times,
> because the flusher wakes up too often (every 10s). The jitter of 20s is
> larger than that, and it takes some time to actually flush the memstore.
> Here's one example. We've seen hundreds of these, monopolizing the flush queue
> and preventing "important" flushes from happening.
> {code}
> 06-Aug-2014 20:11:56 [regionserver60020.periodicFlusher] INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer[1397]--
> regionserver60020.periodicFlusher requesting flush for region
> tsdb,\x00\x00\x0AO\xCF*
> \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2.
> after a delay of 13449
> 06-Aug-2014 20:12:06 [regionserver60020.periodicFlusher] INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer[1397]--
> regionserver60020.periodicFlusher requesting flush for region
> tsdb,\x00\x00\x0AO\xCF*
> \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2.
> after a delay of 14060
> {code}
> So we need to increase the period of the PeriodicFlusher to at least the
> random jitter, and also increase the default random jitter (20s does not
> help with many regions).