[
https://issues.apache.org/jira/browse/HBASE-30240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090376#comment-18090376
]
Duo Zhang commented on HBASE-30240:
-----------------------------------
OK, I think the problem is in the implementation of
BucketCache.notifyFileCachingCompleted.
I've changed the code like this
{code}
// BucketCache would only have data blocks
if (dataBlockCount == count) {
LOG.debug("File {} has now been fully cached.", fileName);
fileCacheCompleted(fileName, size);
} else {
LOG.debug(
"Prefetch executor completed for {}, but only {} data blocks were
cached. "
+ "Total data blocks for file: {}. "
+ "Checking for blocks pending cache in cache writer queue.",
fileName, count, dataBlockCount);
if (ramCache.hasBlocksForFile(fileName.getName())) {
for (ReentrantReadWriteLock lock : locks) {
lock.readLock().unlock();
}
locks.clear();
LOG.debug("There are still blocks pending caching for file {}. Will
sleep 100ms "
+ "and try the verification again.", fileName);
Thread.sleep(100);
notifyFileCachingCompleted(fileName, totalBlockCount, dataBlockCount,
size);
} else {
LOG.info(
"The total block count was {}. We found only {} data blocks cached
from "
+ "a total of {} data blocks for file {}, "
+ "but no blocks pending caching. Maybe cache is full or
evictions "
+ "happened concurrently to cache prefetch.",
totalBlockCount, count, dataBlockCount, fileName);
Thread.sleep(1000);
LOG.info("count===========" + result.stream()
.filter(e -> backingMap.containsKey(e) &&
e.getBlockType().isData()).count());
}
}
{code}
This means when we reached the block count mismatch, I will sleep one second
and then count again.
This is what I got from the output
{noformat}
2026-06-21T00:03:32,789 DEBUG [hfile-prefetch-1781971412716]
bucket.BucketCache(2445): Prefetch executor completed for
/home/zhangduo/hbase/code/hbase-server/target/test-data/3fb83829-b233-64c4-48a3-0a0e703dcc3a/TestPrefetch0/ec3cb3debd014bd192a5bcd4ca20d501,
but only 55 data blocks were cached. Total data blocks for file: 56. Checking
for blocks pending cache in cache writer queue.
2026-06-21T00:03:32,789 INFO [hfile-prefetch-1781971412716]
bucket.BucketCache(2460): The total block count was 57. We found only 55 data
blocks cached from a total of 56 data blocks for file
/home/zhangduo/hbase/code/hbase-server/target/test-data/3fb83829-b233-64c4-48a3-0a0e703dcc3a/TestPrefetch0/ec3cb3debd014bd192a5bcd4ca20d501,
but no blocks pending caching. Maybe cache is full or evictions happened
concurrently to cache prefetch.
2026-06-21T00:03:33,272 DEBUG [bucket-cache-persister]
bucket.BucketCachePersister(47): Cache is inconsistent, persisting to disk
2026-06-21T00:03:33,273 DEBUG [bucket-cache-persister]
bucket.BucketCache(1554): Thread bucket-cache-persister started persisting
bucket cache to file
2026-06-21T00:03:33,273 DEBUG [bucket-cache-persister]
bucket.BucketCache(1561): Persist in new chunked persistence format.
2026-06-21T00:03:33,273 DEBUG [bucket-cache-persister]
bucket.BucketCache(1744): persistToFile: before persisting backing map size:
57, fullycachedFiles size: 0, chunkSize: 10000
2026-06-21T00:03:33,351 DEBUG [bucket-cache-persister]
bucket.BucketCache(1751): persistToFile: after persisting backing map size: 57,
fullycachedFiles size: 0
2026-06-21T00:03:33,351 DEBUG [bucket-cache-persister]
bucket.BucketCache(1565): PersistToFile: after persisting backing map size: 57,
fullycachedFiles size: 0, file name: bucket.persistence1781971413273
2026-06-21T00:03:33,351 DEBUG [bucket-cache-persister]
bucket.BucketCache(1576): Thread bucket-cache-persister finished persisting
bucket cache to file, renaming
2026-06-21T00:03:33,790 INFO [hfile-prefetch-1781971412716]
bucket.BucketCache(2467): count===========56
2026-06-21T00:03:33,790 DEBUG [hfile-prefetch-1781971412716]
hfile.PrefetchExecutor(121): Prefetch completed for
ec3cb3debd014bd192a5bcd4ca20d501
{noformat}
Obviously, after sleeping, we get the correct data block number 56...
I think the problem is very simple. We first check backingMap, if the data
block is not enough, we check ramCache, but at the same time, we are still
moving data from ramCache to backingMap, so it could happen that when checking
backingMap, the key does not exist but when we check ramCache, the key has
already been moved to backingMap...
> TestPrefetchPersistence fails because of a file is not fully cached
> -------------------------------------------------------------------
>
> Key: HBASE-30240
> URL: https://issues.apache.org/jira/browse/HBASE-30240
> Project: HBase
> Issue Type: Sub-task
> Components: BucketCache, test
> Reporter: Duo Zhang
> Priority: Major
> Attachments: out
>
>
> {noformat}
> 2026-06-20T23:44:49,592 DEBUG [hfile-prefetch-1781970289521]
> bucket.BucketCache(2445): Prefetch executor completed for
> /home/zhangduo/hbase/code/hbase-server/target/test-data/93a585a9-e5a9-d554-3df6-1ac14eabc2ff/TestPrefetch0/6de8b2415ccc430db423ae697b3727bf,
> but only 53 data blocks were cached. Total data blocks for file: 55.
> Checking for blocks pending cache in cache writer queue.
> 2026-06-20T23:44:49,592 INFO [hfile-prefetch-1781970289521]
> bucket.BucketCache(2460): The total block count was 56. We found only 53 data
> blocks cached from a total of 55 data blocks for file
> /home/zhangduo/hbase/code/hbase-server/target/test-data/93a585a9-e5a9-d554-3df6-1ac14eabc2ff/TestPrefetch0/6de8b2415ccc430db423ae697b3727bf,
> but no blocks pending caching. Maybe cache is full or evictions happened
> concurrently to cache prefetch.
> 2026-06-20T23:44:49,593 DEBUG [hfile-prefetch-1781970289521]
> hfile.PrefetchExecutor(121): Prefetch completed for
> 6de8b2415ccc430db423ae697b3727bf
> {noformat}
> Found this in the test output, this means the file is not fully cached, we
> missed some data block in BucketCache, which fails the test.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)