[ 
https://issues.apache.org/jira/browse/HBASE-30240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090376#comment-18090376
 ] 

Duo Zhang commented on HBASE-30240:
-----------------------------------

OK, I think the problem is in the implementation of 
BucketCache.notifyFileCachingCompleted.

I've changed the code like this

{code}
// BucketCache would only have data blocks
      if (dataBlockCount == count) {
        LOG.debug("File {} has now been fully cached.", fileName);
        fileCacheCompleted(fileName, size);
      } else {
        LOG.debug(
          "Prefetch executor completed for {}, but only {} data blocks were 
cached. "
            + "Total data blocks for file: {}. "
            + "Checking for blocks pending cache in cache writer queue.",
          fileName, count, dataBlockCount);
        if (ramCache.hasBlocksForFile(fileName.getName())) {
          for (ReentrantReadWriteLock lock : locks) {
            lock.readLock().unlock();
          }
          locks.clear();
          LOG.debug("There are still blocks pending caching for file {}. Will 
sleep 100ms "
            + "and try the verification again.", fileName);
          Thread.sleep(100);
          notifyFileCachingCompleted(fileName, totalBlockCount, dataBlockCount, 
size);
        } else {
          LOG.info(
            "The total block count was {}. We found only {} data blocks cached 
from "
              + "a total of {} data blocks for file {}, "
              + "but no blocks pending caching. Maybe cache is full or 
evictions "
              + "happened concurrently to cache prefetch.",
            totalBlockCount, count, dataBlockCount, fileName);
          Thread.sleep(1000);
          LOG.info("count===========" + result.stream()
            .filter(e -> backingMap.containsKey(e) && 
e.getBlockType().isData()).count());
        }
      }
{code}

This means when we reached the block count mismatch, I will sleep one second 
and then count again.

This is what I got from the output

{noformat}
2026-06-21T00:03:32,789 DEBUG [hfile-prefetch-1781971412716] 
bucket.BucketCache(2445): Prefetch executor completed for 
/home/zhangduo/hbase/code/hbase-server/target/test-data/3fb83829-b233-64c4-48a3-0a0e703dcc3a/TestPrefetch0/ec3cb3debd014bd192a5bcd4ca20d501,
 but only 55 data blocks were cached. Total data blocks for file: 56. Checking 
for blocks pending cache in cache writer queue.
2026-06-21T00:03:32,789 INFO  [hfile-prefetch-1781971412716] 
bucket.BucketCache(2460): The total block count was 57. We found only 55 data 
blocks cached from a total of 56 data blocks for file 
/home/zhangduo/hbase/code/hbase-server/target/test-data/3fb83829-b233-64c4-48a3-0a0e703dcc3a/TestPrefetch0/ec3cb3debd014bd192a5bcd4ca20d501,
 but no blocks pending caching. Maybe cache is full or evictions happened 
concurrently to cache prefetch.
2026-06-21T00:03:33,272 DEBUG [bucket-cache-persister] 
bucket.BucketCachePersister(47): Cache is inconsistent, persisting to disk
2026-06-21T00:03:33,273 DEBUG [bucket-cache-persister] 
bucket.BucketCache(1554): Thread bucket-cache-persister started persisting 
bucket cache to file
2026-06-21T00:03:33,273 DEBUG [bucket-cache-persister] 
bucket.BucketCache(1561): Persist in new chunked persistence format.
2026-06-21T00:03:33,273 DEBUG [bucket-cache-persister] 
bucket.BucketCache(1744): persistToFile: before persisting backing map size: 
57, fullycachedFiles size: 0, chunkSize: 10000
2026-06-21T00:03:33,351 DEBUG [bucket-cache-persister] 
bucket.BucketCache(1751): persistToFile: after persisting backing map size: 57, 
fullycachedFiles size: 0
2026-06-21T00:03:33,351 DEBUG [bucket-cache-persister] 
bucket.BucketCache(1565): PersistToFile: after persisting backing map size: 57, 
fullycachedFiles size: 0, file name: bucket.persistence1781971413273
2026-06-21T00:03:33,351 DEBUG [bucket-cache-persister] 
bucket.BucketCache(1576): Thread bucket-cache-persister finished persisting 
bucket cache to file, renaming
2026-06-21T00:03:33,790 INFO  [hfile-prefetch-1781971412716] 
bucket.BucketCache(2467): count===========56
2026-06-21T00:03:33,790 DEBUG [hfile-prefetch-1781971412716] 
hfile.PrefetchExecutor(121): Prefetch completed for 
ec3cb3debd014bd192a5bcd4ca20d501
{noformat}

Obviously, after sleeping, we get the correct data block number 56...

I think the problem is very simple. We first check backingMap, if the data 
block is not enough, we check ramCache, but at the same time, we are still 
moving data from ramCache to backingMap, so it could happen that when checking 
backingMap, the key does not exist but when we check ramCache, the key has 
already been moved to backingMap...

> TestPrefetchPersistence fails because of a file is not fully cached
> -------------------------------------------------------------------
>
>                 Key: HBASE-30240
>                 URL: https://issues.apache.org/jira/browse/HBASE-30240
>             Project: HBase
>          Issue Type: Sub-task
>          Components: BucketCache, test
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: out
>
>
> {noformat}
> 2026-06-20T23:44:49,592 DEBUG [hfile-prefetch-1781970289521] 
> bucket.BucketCache(2445): Prefetch executor completed for 
> /home/zhangduo/hbase/code/hbase-server/target/test-data/93a585a9-e5a9-d554-3df6-1ac14eabc2ff/TestPrefetch0/6de8b2415ccc430db423ae697b3727bf,
>  but only 53 data blocks were cached. Total data blocks for file: 55. 
> Checking for blocks pending cache in cache writer queue.
> 2026-06-20T23:44:49,592 INFO  [hfile-prefetch-1781970289521] 
> bucket.BucketCache(2460): The total block count was 56. We found only 53 data 
> blocks cached from a total of 55 data blocks for file 
> /home/zhangduo/hbase/code/hbase-server/target/test-data/93a585a9-e5a9-d554-3df6-1ac14eabc2ff/TestPrefetch0/6de8b2415ccc430db423ae697b3727bf,
>  but no blocks pending caching. Maybe cache is full or evictions happened 
> concurrently to cache prefetch.
> 2026-06-20T23:44:49,593 DEBUG [hfile-prefetch-1781970289521] 
> hfile.PrefetchExecutor(121): Prefetch completed for 
> 6de8b2415ccc430db423ae697b3727bf
> {noformat}
> Found this in the test output, this means the file is not fully cached, we 
> missed some data block in BucketCache, which fails the test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to