[jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues

Lars Hofhansl (JIRA) Thu, 07 Aug 2014 15:07:43 -0700

    [ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089923#comment-14089923 ]


Lars Hofhansl commented on HBASE-11695:
---------------------------------------

Can't easily try this on the production site right now.
This turned out to be a red herring anyway, in the sense that it does not
cause the issue; it's just an oddity we observed. The real problem was that
flushes took a very long time (hours). We're not sure why yet, but it is
probably due to a networking issue. Hence all flushes were waiting, and after
one hour all the waiting regions became eligible for the periodic flusher.

The problem here, then, is only cosmetic. Because the wake wait time (10s) is
less than the jitter (20s), in most cases we'll see each region request a
flush twice.
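To make the timing concrete, here is a minimal Java model of the chore. It is a hypothetical sketch, not the HBase implementation, and `PeriodicFlushSim` and `requestsBeforeFlush` are illustrative names. It assumes the chore runs every `wakeMs`, that a region stays eligible until its first delayed flush has finished, and that a repeat request for an already-queued region does not change when that flush runs. With the 10s wake period and the 13449ms delay from the first log line in the issue, the region is requested twice; with a period of at least the 20s jitter, only once.

```java
/**
 * Hypothetical timeline model of the periodic flusher (not HBase source).
 * Assumptions: the chore runs at t = 0, wakeMs, 2*wakeMs, ...; the region is
 * eligible from t = 0 until its first delayed flush finishes; repeated
 * requests for an already-queued region do not reschedule that flush.
 */
public final class PeriodicFlushSim {

    /**
     * Number of chore runs that log a flush request for the region before
     * it becomes clean again.
     *
     * @param wakeMs          period of the flusher chore
     * @param firstDelayMs    random delay chosen on the first request
     * @param flushDurationMs time the flush itself takes once it starts
     */
    static int requestsBeforeFlush(long wakeMs, long firstDelayMs, long flushDurationMs) {
        if (wakeMs <= 0) {
            throw new IllegalArgumentException("wakeMs must be positive");
        }
        long cleanAt = firstDelayMs + flushDurationMs; // region is clean only after this
        int requests = 1; // the run at t = 0 always requests the flush
        for (long t = wakeMs; t < cleanAt; t += wakeMs) {
            requests++; // region still dirty: this run logs another request
        }
        return requests;
    }

    public static void main(String[] args) {
        // Delay from the first log line in the issue; flush time assumed ~0.
        System.out.println(requestsBeforeFlush(10_000, 13_449, 0)); // 2
        // Same delay, but a chore period no shorter than the 20s jitter.
        System.out.println(requestsBeforeFlush(20_000, 13_449, 0)); // 1
        // Hypothetical flush that takes two hours once it starts.
        System.out.println(requestsBeforeFlush(10_000, 13_449, 2 * 60 * 60 * 1000L)); // 722
    }
}
```

The last call shows why slow flushes make the log so noisy under this model: while a flush takes hours, every 10s chore run logs yet another request for the same region.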

> PeriodicFlusher and WakeFrequency issues
> ----------------------------------------
>
>                 Key: HBASE-11695
>                 URL: https://issues.apache.org/jira/browse/HBASE-11695
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.21
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>
>         Attachments: 11695-trunk.txt
>
>
> We just ran into a flush storm caused by the PeriodicFlusher.
> Many memstores became eligible for flushing at exactly the same time. The
> effect we've seen is that the exact same region was flushed multiple times,
> because the flusher wakes up too often (every 10s). The jitter of 20s is
> larger than that, and it takes some time to actually flush the memstore.
> Here's one example. We've seen hundreds of these, monopolizing the flush
> queue and preventing "important" flushes from happening.
> {code}
> 06-Aug-2014 20:11:56  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449
> 06-Aug-2014 20:12:06  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060
> {code}
> So we need to increase the period of the PeriodicFlusher to at least the
> random jitter, and also increase the default random jitter (a 20s spread
> does not help with many regions).
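Both proposed changes can be checked with a little arithmetic. The sketch below is hypothetical: `FlusherSettingsCheck` and its methods are made-up names, and it assumes negligible flush time and delays drawn uniformly from [0, jitter). The first method gives the worst-case number of chore runs that request the same region before its delayed flush fires; the second gives the average rate at which simultaneously eligible regions hit the flush queue. The region count of 1000 and the 200s jitter are illustrative values, not numbers from the issue.

```java
/**
 * Hypothetical back-of-the-envelope check (not HBase code) for the two
 * changes proposed in the issue. Assumes flush time is negligible and each
 * request's delay is drawn uniformly from [0, jitterMs).
 */
public final class FlusherSettingsCheck {

    /**
     * Worst-case chore runs that request the same region before its delayed
     * flush fires. The largest possible delay is jitterMs - 1, so this is
     * ceil((jitterMs - 1) / wakeMs), but always at least one request.
     */
    static long worstCaseRequests(long wakeMs, long jitterMs) {
        if (wakeMs <= 0 || jitterMs <= 0) {
            throw new IllegalArgumentException("wakeMs and jitterMs must be positive");
        }
        long maxDelay = jitterMs - 1;
        return Math.max(1, (maxDelay + wakeMs - 1) / wakeMs); // integer ceiling
    }

    /**
     * Average flush requests per second when regionCount regions become
     * eligible at the same moment and their delays spread over jitterMs.
     */
    static double requestsPerSecond(int regionCount, long jitterMs) {
        return regionCount * 1000.0 / jitterMs;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseRequests(10_000, 20_000)); // 2 (settings from the issue)
        System.out.println(worstCaseRequests(20_000, 20_000)); // 1 (period >= jitter)
        System.out.println(requestsPerSecond(1_000, 20_000));  // 50.0 (hypothetical 1000 regions)
        System.out.println(requestsPerSecond(1_000, 200_000)); // 5.0 (hypothetical wider jitter)
    }
}
```

Raising the period to the jitter removes the duplicate requests; widening the jitter lowers the request rate, at the cost of regions waiting longer before their periodic flush.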



--
This message was sent by Atlassian JIRA
(v6.2#6252)
