[ 
https://issues.apache.org/jira/browse/HBASE-26281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417209#comment-17417209
 ] 

Hudson commented on HBASE-26281:
--------------------------------

Results for branch branch-2
        [build #347 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/]:
 (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> DBB got from BucketCache would be freed unexpectedly before RPC completed
> -------------------------------------------------------------------------
>
>                 Key: HBASE-26281
>                 URL: https://issues.apache.org/jira/browse/HBASE-26281
>             Project: HBase
>          Issue Type: Bug
>          Components: BucketCache
>    Affects Versions: 3.0.0-alpha-1, 2.4.6
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Critical
>             Fix For: 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.7
>
>
> {{DBB}} got from BucketCache would be freed unexpectedly before RPC completed 
> for following two cases:
> * Case 1: Replacing Block and getting Block execute concurrently.
>         Because of splitting or stream read,two or more read threads may try 
> to cache the same {{Block}} concurrently and one thread read all of the next 
> block header could replace  the another one didn't(see HBASE-20447).  When 
> replacing {{Block}},the following 
> {{BucketCache.WriterThread.putIntoBackingMap}} method invokes 
> {{BucketCache.blockEvicted}} in line 916,which free the {{BucketEntry}} 
> directly in line 544, not through the {{RefCount}}.
> {code:java}
> 912  private void putIntoBackingMap(BlockCacheKey key, BucketEntry 
> bucketEntry) {
> 913      BucketEntry previousEntry = backingMap.put(key, bucketEntry);
> 914      if (previousEntry != null && previousEntry != bucketEntry) {
> 915        previousEntry.withWriteLock(offsetLock, () -> {
> 916          blockEvicted(key, previousEntry, false);
> 917          return null;
> 918        });
> 919      }
> 920    }
> 543  void blockEvicted(BlockCacheKey cacheKey, BucketEntry bucketEntry, 
> boolean decrementBlockNumber) {
> 544     bucketAllocator.freeBlock(bucketEntry.offset());
> 545     realCacheSize.add(-1 * bucketEntry.getLength());
> 546     blocksByHFile.remove(cacheKey);
> 547     if (decrementBlockNumber) {
> 548         this.blockNumber.decrement();
> 549      }
> 550  }
>   
> {code}
> Freeing the {{BucketEntry}} directly may cause {{HFileBlock}} returned from 
> {{BucketCache.getBlock}} be freed unexpectedly before RPC 
> completed.HBASE-26155 fixed this problem to some extent, but seems not 
> completely. The  {{BucketCache.getBlock}} might be invoked just before 
> {{BucketCache.cacheBlockWithWaitInternal}} and after the 
> {{BucketEntry.isRpcRef}} checking is done. Considering the following sequence:
>      1. Block1 was cached successfully,the {{RefCnt}} of Block1 is 1.
>      2. Thread1 caching the same {{BlockCacheKey}} with Block2 satisfied 
> {{BlockCacheUtil.shouldReplaceExistingCacheBlock}}, so Block2 would replace 
> Block1, but thread1 stopping before {{BucketCache.cacheBlockWithWaitInternal}}
>      3. Thread2 invoking {{BucketCache.getBlock}} with the  same 
> {{BlockCacheKey}}, which returned Block1, the {{RefCnt}} of Block1 is 2.
>      4. Thread1 continues caching Block2, in  
> {{BucketCache.WriterThread.putIntoBackingMap}} , the old Block1 is freed 
> directly which {{RefCnt}}  is 2, but the Block1 is still used by Thread2 and 
> the content of Block1 would be overwitten  after it is freed, which may cause 
> a serious error.
> * Case 2:  Evicting Block , caching Block and getting Block execute 
> concurrently.
>       Considering the following sequence:
>       1. Thread1 caching Block1, stopping after 
> {{WriterThread.putIntoBackingMap}}, the {{RefCnt}} of Block1 is 1.
>       2. Thread2 invoking {{BucketCache.evictBlock}} with the same 
> {{BlockCacheKey}} , but stopping after {{BucketCache.removeFromRamCache}}.
>       3. Thread3 invoking {{BucketCache.getBlock}} with the  same 
> {{BlockCacheKey}}, which returned Block1, the {{RefCnt}} of Block1 is 2.
>       4. Thread1 continues caching block1,but finding that 
> {{BucketCache.RAMCache.remove}} returning false, so invoking 
> {{BucketCache.blockEvicted}}  to free the the Block1 directly 
>          which {{RefCnt}} is 2 and the Block1 is still used by Thread3. 
>       Case 2 is caused by following line 1014 in {{WriteThread.doDrain}}, 
> which also free the {{BucketEntry}} directly,just like the Case1.
>      {code:java}
>      1004   boolean existed = ramCache.remove(key, re -> {
>      1005    if (re != null) {
>      1006       heapSize.add(-1 * re.getData().heapSize());
>      1007     }
>      1008  });
>      1009  if (!existed && bucketEntries[i] != null) {
>      1010     // Block should have already been evicted. Remove it and free 
> space.
>      1011     final BucketEntry bucketEntry = bucketEntries[i];
>      1012    bucketEntry.withWriteLock(offsetLock, () -> {
>      1013       if (backingMap.remove(key, bucketEntry)) {
>      1014         blockEvicted(key, bucketEntry, false);
>      1015       }
>      1016       return null;
>      1017     });
>      1018   }
>      1019  }
>      {code}
> My Fix in the PR is as following:
> # We never free the {{BucketEntry}} directly, freeing the  {{BucketEntry}}  
> must through {{RefCnt}}.
> # {{RefCnt#recycler}} just freeing the corresponding  {{BucketEntry}}  , not 
> including removing the Block from {{BucketCache.backingMap}}  or 
> {{BucketCache.ramCache}}.
>    Removing the Block from {{BucketCache.backingMap}}  or 
> {{BucketCache.ramCache}} is immediately done in {{BucketCache.evictBlock}},  
> and in fact this behavior is the same as  
>    {{BucketCache.evictBlock}} in branch-1, except that freeing 
> {{BucketEntry}} is delayed until  {{RefCnt}} becoming 0.
>    In current branch-2 implementation, removing the Block from 
> {{BucketCache.backingMap}}  or {{BucketCache.ramCache}} is done in 
> {{RefCnt#recycler}}, which is something 
>    unreasonable:  we decreasing the  {{RefCnt}}  because we removing it from  
> {{BucketCache.backingMap}}, not in reverse that because the  {{RefCnt}} 
> decreased to zero we  
>    removing it from  {{BucketCache.backingMap}}, which may causes the newly 
> added Block with the same {{BlockCacheKey}} be incorrectly evicted from 
> {{BucketCache.backingMap}}
>    or  {{BucketCache.ramCache}}.
>   
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to