[
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407425#comment-16407425
]
stack edited comment on HBASE-20090 at 3/21/18 3:56 AM:
--------------------------------------------------------
Now I get it. Thanks. Let me backport to branch-2.0... After it goes in
elsewhere.
was (Author: stack):
Now I get it. Thanks. Let me backport...
> Properly handle Preconditions check failure in
> MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20090-server-61260-01-000007.log, 20090.v10.txt,
> 20090.v10.txt, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt
>
>
> Copied the following from a comment since this was better description of the
> race condition.
> The original description was merged to the beginning of my first comment
> below.
> Observed the following in region server log (running on hadoop3 cluster):
> {code}
> 2018-02-26 16:06:50,044 WARN
> [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020]
> regionserver.MemStoreFlusher: Memstore is above high water mark and block
> 135ms
> 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Cache flusher failed for entry
> org.apache.hadoop.hbase.regionserver.
> MemStoreFlusher$WakeupFlushThread@2adfadd7
> java.lang.IllegalStateException
> at
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Here is the Precondition from MemStoreFlusher#flushOneForGlobalPressure() :
> {code}
> Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> bestRegionReplicaSize > 0);
> {code}
> When the Preconditions check fails, IllegalStateException would be raised.
> With more debug logging, we can see the scenario where the exception was
> triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit:
> Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085.,
> compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020]
> regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because
> info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO
> [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020]
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Flush thread woke up because memory above low
> water=381.5 M
> 2018-03-02 17:28:30,297 INFO
> [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020]
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: region
> TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: region
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Flush of region
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global
> heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap
> size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Nothing to flush for
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Excluding unflushable region
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to
> find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from else branch can be seen in
> 20090-server-61260-01-000007.log :
> {code}
> synchronized (writestate) {
> if (!writestate.flushing && writestate.writesEnabled) {
> this.writestate.flushing = true;
> } else {
> if (LOG.isDebugEnabled()) {
> LOG.debug("NOT flushing memstore for region " + this
> + ", flushing=" + writestate.flushing + ", writesEnabled="
> + writestate.writesEnabled);
> }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving
> memory pressure at high level.
> When MemStoreFlusher ran to the following call, the region was no longer a
> flush candidate:
> {code}
> HRegion bestFlushableRegion =
> getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region,
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. , was examined
> next. Since the region was not receiving write, the (current) Precondition
> check failed.
> The proposed fix is to convert the Precondition to normal return.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)