[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389046#comment-16389046
 ] 

Ted Yu commented on HBASE-20090:
--------------------------------

w.r.t. getBiggestMemStoreRegion(), there is a third parameter:
{code}
      boolean checkStoreFileCount) {
{code}
whose value differs between the two invocations.
I don't think the first invocation can add the region to excludedRegions simply 
because the region doesn't pass the check in the current call.
It seems more refactoring is needed to achieve what Ram suggested above.
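
To illustrate the concern, here is a minimal sketch. It is a simplified, hypothetical model, not the actual MemStoreFlusher code: Region, tooManyStoreFiles and biggest() are stand-ins invented for this example. It shows that a region skipped only because of checkStoreFileCount=true is still a valid candidate for a later call with checkStoreFileCount=false, so excluding it in the first call would hide it from the second.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the selection loop; not the real HBase classes.
final class FlushCandidateSketch {

    static final class Region {
        final String name;
        final boolean tooManyStoreFiles;

        Region(String name, boolean tooManyStoreFiles) {
            this.name = name;
            this.tooManyStoreFiles = tooManyStoreFiles;
        }
    }

    // regionsBySizeDesc is assumed to be ordered from largest memstore to smallest.
    // Returns the first region that is not excluded and passes the optional check,
    // or null when no region qualifies.
    static Region biggest(List<Region> regionsBySizeDesc, Set<Region> excluded,
            boolean checkStoreFileCount) {
        for (Region r : regionsBySizeDesc) {
            if (excluded.contains(r)) {
                continue;
            }
            if (checkStoreFileCount && r.tooManyStoreFiles) {
                continue; // skipped for this call only, not excluded
            }
            return r;
        }
        return null;
    }

    public static void main(String[] args) {
        Region big = new Region("big", true);
        Region small = new Region("small", false);
        List<Region> regions = Arrays.asList(big, small);
        Set<Region> excluded = new HashSet<>();

        // First invocation: the store file check filters out "big".
        System.out.println(biggest(regions, excluded, true).name);
        // Second invocation: without the check, "big" is still eligible.
        System.out.println(biggest(regions, excluded, false).name);

        // If the first invocation had excluded "big", the second would lose it.
        excluded.add(big);
        System.out.println(biggest(regions, excluded, false).name);
    }
}
```

Running main prints small, big, small: the last line is the behavior change that excluding the region from the first invocation would introduce.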

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>         Attachments: 20090-server-61260-01-000007.log, 20090.v6.txt
>
>
> Copied the following from a comment since it is a better description of the 
> race condition.
> The original description was merged into the beginning of my first comment 
> below.
> With more debug logging, we can see the scenario where the exception was 
> triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: 
> Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., 
> compaction_queue=(0:0),     split_queue=1
> 2018-03-02 17:28:30,098 DEBUG 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because 
> info      size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Flush thread woke up because memory above low 
> water=381.5 M
> 2018-03-02 17:28:30,297 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: region 
> TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Flush of region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global    
>  heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap 
> size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Nothing to flush for 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Excluding unflushable region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -    trying to 
> find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 
> 20090-server-61260-01-000007.log:
> {code}
>       synchronized (writestate) {
>         if (!writestate.flushing && writestate.writesEnabled) {
>           this.writestate.flushing = true;
>         } else {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("NOT flushing memstore for region " + this
>                 + ", flushing=" + writestate.flushing + ", writesEnabled="
>                 + writestate.writesEnabled);
>           }
> {code}
> This means region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving 
> memory pressure at a high level.
> When MemStoreFlusher reached the following call, the region was no longer a 
> flush candidate:
> {code}
>       HRegion bestFlushableRegion =
>           getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined 
> next. Since that region was not receiving writes, the (current) Preconditions 
> check failed.
> The proposed fix is to convert the Preconditions check to a normal return.
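
A minimal sketch of the idea behind the proposed fix, under stated assumptions: flushStrict and flushLenient are hypothetical names, and the size check stands in for whatever condition the real Preconditions call guards. The strict variant mimics Guava's Preconditions.checkState by throwing IllegalStateException, while the lenient variant reports "nothing to flush" to the caller with a normal return.

```java
// Hypothetical illustration; this is not the code in MemStoreFlusher.
final class PreconditionToReturnSketch {

    // Before: an empty memstore is treated as a programming error.
    // Throws IllegalStateException, as Guava's Preconditions.checkState would.
    static boolean flushStrict(long regionMemStoreSize) {
        if (regionMemStoreSize <= 0) {
            throw new IllegalStateException("region has nothing to flush");
        }
        return true; // pretend the flush was requested
    }

    // After: an empty memstore is an expected outcome of the race,
    // so the method returns false and the caller can retry later.
    static boolean flushLenient(long regionMemStoreSize) {
        if (regionMemStoreSize <= 0) {
            return false;
        }
        return true; // pretend the flush was requested
    }

    public static void main(String[] args) {
        System.out.println(flushLenient(0L));
        System.out.println(flushLenient(400432696L));
        try {
            flushStrict(0L);
        } catch (IllegalStateException e) {
            System.out.println("strict variant threw: " + e.getMessage());
        }
    }
}
```

With the lenient variant, the flusher thread in this model is not interrupted by an exception when the only remaining candidate has a memstore size of 0.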



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
