[
https://issues.apache.org/jira/browse/HBASE-26281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417209#comment-17417209
]
Hudson commented on HBASE-26281:
--------------------------------
Results for branch branch-2
[build #347 on
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/347/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> DBB got from BucketCache would be freed unexpectedly before RPC completed
> -------------------------------------------------------------------------
>
> Key: HBASE-26281
> URL: https://issues.apache.org/jira/browse/HBASE-26281
> Project: HBase
> Issue Type: Bug
> Components: BucketCache
> Affects Versions: 3.0.0-alpha-1, 2.4.6
> Reporter: chenglei
> Assignee: chenglei
> Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.7
>
>
> {{DBB}} got from BucketCache would be freed unexpectedly before RPC completed
> for following two cases:
> * Case 1: Replacing Block and getting Block execute concurrently.
> Because of splitting or stream read,two or more read threads may try
> to cache the same {{Block}} concurrently and one thread read all of the next
> block header could replace the another one didn't(see HBASE-20447). When
> replacing {{Block}},the following
> {{BucketCache.WriterThread.putIntoBackingMap}} method invokes
> {{BucketCache.blockEvicted}} in line 916,which free the {{BucketEntry}}
> directly in line 544, not through the {{RefCount}}.
> {code:java}
> 912 private void putIntoBackingMap(BlockCacheKey key, BucketEntry
> bucketEntry) {
> 913 BucketEntry previousEntry = backingMap.put(key, bucketEntry);
> 914 if (previousEntry != null && previousEntry != bucketEntry) {
> 915 previousEntry.withWriteLock(offsetLock, () -> {
> 916 blockEvicted(key, previousEntry, false);
> 917 return null;
> 918 });
> 919 }
> 920 }
> 543 void blockEvicted(BlockCacheKey cacheKey, BucketEntry bucketEntry,
> boolean decrementBlockNumber) {
> 544 bucketAllocator.freeBlock(bucketEntry.offset());
> 545 realCacheSize.add(-1 * bucketEntry.getLength());
> 546 blocksByHFile.remove(cacheKey);
> 547 if (decrementBlockNumber) {
> 548 this.blockNumber.decrement();
> 549 }
> 550 }
>
> {code}
> Freeing the {{BucketEntry}} directly may cause {{HFileBlock}} returned from
> {{BucketCache.getBlock}} be freed unexpectedly before RPC
> completed.HBASE-26155 fixed this problem to some extent, but seems not
> completely. The {{BucketCache.getBlock}} might be invoked just before
> {{BucketCache.cacheBlockWithWaitInternal}} and after the
> {{BucketEntry.isRpcRef}} checking is done. Considering the following sequence:
> 1. Block1 was cached successfully,the {{RefCnt}} of Block1 is 1.
> 2. Thread1 caching the same {{BlockCacheKey}} with Block2 satisfied
> {{BlockCacheUtil.shouldReplaceExistingCacheBlock}}, so Block2 would replace
> Block1, but thread1 stopping before {{BucketCache.cacheBlockWithWaitInternal}}
> 3. Thread2 invoking {{BucketCache.getBlock}} with the same
> {{BlockCacheKey}}, which returned Block1, the {{RefCnt}} of Block1 is 2.
> 4. Thread1 continues caching Block2, in
> {{BucketCache.WriterThread.putIntoBackingMap}} , the old Block1 is freed
> directly which {{RefCnt}} is 2, but the Block1 is still used by Thread2 and
> the content of Block1 would be overwitten after it is freed, which may cause
> a serious error.
> * Case 2: Evicting Block , caching Block and getting Block execute
> concurrently.
> Considering the following sequence:
> 1. Thread1 caching Block1, stopping after
> {{WriterThread.putIntoBackingMap}}, the {{RefCnt}} of Block1 is 1.
> 2. Thread2 invoking {{BucketCache.evictBlock}} with the same
> {{BlockCacheKey}} , but stopping after {{BucketCache.removeFromRamCache}}.
> 3. Thread3 invoking {{BucketCache.getBlock}} with the same
> {{BlockCacheKey}}, which returned Block1, the {{RefCnt}} of Block1 is 2.
> 4. Thread1 continues caching block1,but finding that
> {{BucketCache.RAMCache.remove}} returning false, so invoking
> {{BucketCache.blockEvicted}} to free the the Block1 directly
> which {{RefCnt}} is 2 and the Block1 is still used by Thread3.
> Case 2 is caused by following line 1014 in {{WriteThread.doDrain}},
> which also free the {{BucketEntry}} directly,just like the Case1.
> {code:java}
> 1004 boolean existed = ramCache.remove(key, re -> {
> 1005 if (re != null) {
> 1006 heapSize.add(-1 * re.getData().heapSize());
> 1007 }
> 1008 });
> 1009 if (!existed && bucketEntries[i] != null) {
> 1010 // Block should have already been evicted. Remove it and free
> space.
> 1011 final BucketEntry bucketEntry = bucketEntries[i];
> 1012 bucketEntry.withWriteLock(offsetLock, () -> {
> 1013 if (backingMap.remove(key, bucketEntry)) {
> 1014 blockEvicted(key, bucketEntry, false);
> 1015 }
> 1016 return null;
> 1017 });
> 1018 }
> 1019 }
> {code}
> My Fix in the PR is as following:
> # We never free the {{BucketEntry}} directly, freeing the {{BucketEntry}}
> must through {{RefCnt}}.
> # {{RefCnt#recycler}} just freeing the corresponding {{BucketEntry}} , not
> including removing the Block from {{BucketCache.backingMap}} or
> {{BucketCache.ramCache}}.
> Removing the Block from {{BucketCache.backingMap}} or
> {{BucketCache.ramCache}} is immediately done in {{BucketCache.evictBlock}},
> and in fact this behavior is the same as
> {{BucketCache.evictBlock}} in branch-1, except that freeing
> {{BucketEntry}} is delayed until {{RefCnt}} becoming 0.
> In current branch-2 implementation, removing the Block from
> {{BucketCache.backingMap}} or {{BucketCache.ramCache}} is done in
> {{RefCnt#recycler}}, which is something
> unreasonable: we decreasing the {{RefCnt}} because we removing it from
> {{BucketCache.backingMap}}, not in reverse that because the {{RefCnt}}
> decreased to zero we
> removing it from {{BucketCache.backingMap}}, which may causes the newly
> added Block with the same {{BlockCacheKey}} be incorrectly evicted from
> {{BucketCache.backingMap}}
> or {{BucketCache.ramCache}}.
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)