[
https://issues.apache.org/jira/browse/HBASE-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil updated HBASE-28170:
-----------------------------------------
Summary: Put the cached time at the beginning of the block; run cache
validation in the background when retrieving the persistent cache (was: Put
the cached time at the beginning of the block run cache validation in the
background when retrieving the persistent cache)
> Put the cached time at the beginning of the block; run cache validation in
> the background when retrieving the persistent cache
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-28170
> URL: https://issues.apache.org/jira/browse/HBASE-28170
> Project: HBase
> Issue Type: Sub-task
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
>
> In HBASE-28004, we added a "cached time" long at the end of each block in the
> bucket cache. We also record the cached time in the backing map we persist to
> disk periodically, in order to recover the cache after crashes/restarts. The
> persisted backing map includes the last modification time of the cache itself.
> On restarts, once we read the backing map from the persisted file, we compare
> the last modification time of the cache recorded there against the actual last
> modification time of the cache. If those differ, it means the cache has been
> updated after the backing map was persisted, so the backing map might
> not be accurate. We then iterate through the backing map entries and compare
> each entry's cached time against the related block in the cache, and if those
> differ, we remove the entry from the map.
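The validation pass described above can be sketched roughly as follows. This is a minimal illustration and not the actual BucketCache code: the `Entry` class, the `CachedTimeReader` interface and the `validate` method are hypothetical names introduced only for this example.

```java
import java.util.Iterator;
import java.util.Map;

final class BackingMapValidationSketch {
  /** Hypothetical persisted entry: where the block lives and when it was cached. */
  static final class Entry {
    final int offset;
    final long cachedTime;

    Entry(int offset, long cachedTime) {
      this.offset = offset;
      this.cachedTime = cachedTime;
    }
  }

  /** Hypothetical reader returning the cached time stored with the block at an offset. */
  interface CachedTimeReader {
    long readCachedTime(int offset);
  }

  /**
   * Removes entries whose recorded cached time no longer matches the cache.
   * Returns the number of entries removed.
   */
  static int validate(Map<String, Entry> backingMap, long persistedCacheMtime,
      long actualCacheMtime, CachedTimeReader reader) {
    // Same modification time: the cache was not touched after the map was persisted.
    if (persistedCacheMtime == actualCacheMtime) {
      return 0;
    }
    int removed = 0;
    Iterator<Map.Entry<String, Entry>> it = backingMap.entrySet().iterator();
    while (it.hasNext()) {
      Entry e = it.next().getValue();
      if (reader.readCachedTime(e.offset) != e.cachedTime) {
        it.remove(); // the block at this offset is not the one the map refers to
        removed++;
      }
    }
    return removed;
  }
}
```

With tens of millions of entries, this loop touches the cache once per entry, which is why running it synchronously at startup is expensive.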
> Currently this validation is done at RS initialisation time, but with caches
> as large as 1.6TB/30M+ blocks, it can take up to an hour, meaning the RS is
> unusable during that time. This PR changes this validation to be performed in
> the background, whilst direct accesses to a block in the cache also
> perform the "cached time" comparison.
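One possible shape for the background validation combined with the on-access check is sketched below. It is a simplified assumption rather than the code in the PR: the backing map is reduced to offset -> cached time, and `BackgroundValidationSketch`, `cachedTimeAt`, `startBackgroundValidation` and `isUsable` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongUnaryOperator;

final class BackgroundValidationSketch {
  // Simplified backing map: block offset -> cached time recorded when the map was persisted.
  private final Map<Long, Long> backingMap = new ConcurrentHashMap<>();
  // Hypothetical accessor: reads the cached time stored with the block at an offset.
  private final LongUnaryOperator cachedTimeAt;

  BackgroundValidationSketch(LongUnaryOperator cachedTimeAt) {
    this.cachedTimeAt = cachedTimeAt;
  }

  void put(long offset, long cachedTime) {
    backingMap.put(offset, cachedTime);
  }

  int size() {
    return backingMap.size();
  }

  /** Validates every entry off the startup path; the caller does not have to wait. */
  CompletableFuture<Void> startBackgroundValidation() {
    return CompletableFuture.runAsync(() -> backingMap.forEach((offset, time) -> {
      if (cachedTimeAt.applyAsLong(offset) != time) {
        backingMap.remove(offset, time);
      }
    }));
  }

  /** On-access check: an entry not yet reached by the background pass is verified here. */
  boolean isUsable(long offset) {
    Long expected = backingMap.get(offset);
    if (expected == null) {
      return false;
    }
    if (cachedTimeAt.applyAsLong(offset) != expected) {
      backingMap.remove(offset, expected);
      return false;
    }
    return true;
  }
}
```

In this sketch a read that arrives before the background pass reaches its entry still gets a correct answer, because `isUsable` performs the same comparison for that single block.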
> This PR also moves the "cached time" to the beginning of the block in the
> cache, instead of the end. We noticed that with the "cached time" at the end
> we can fail to ensure consistency under some conditions. Consider the following:
> 1) A block B1 of size S gets allocated at offset 0 with cached time T1;
> 2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
> 3) B1 is evicted. Its offset in the cache is now free, however its contents
> are still there, including the cached time T1 at its end;
> 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
> 5) RS crashes before the backing map gets saved, so the persisted backing map
> still has only the reference to B1, but not B2;
> 6) At restart, we run the validation. Because B2 was half the size of B1, we
> haven't overwritten B1's cached time in the cache, so we will successfully
> validate B1, although its content is now half overwritten by B2.
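The scenario above can be reproduced with a small simulation. It is only a sketch under assumed sizes (a 64-byte cache, S = 32 bytes), and `CachedTimePlacementSketch` and its methods are hypothetical names, not HBase classes. It writes B1, then B2 at the same offset, and checks whether the stale entry for B1 still passes validation with the cached time stored at the end versus at the beginning of the block.

```java
import java.nio.ByteBuffer;

final class CachedTimePlacementSketch {
  /** Position of the 8-byte cached time for a block of the given size at the given offset. */
  static int timePosition(int offset, int size, boolean timeFirst) {
    return timeFirst ? offset : offset + size - Long.BYTES;
  }

  /** Writes a block: payload bytes plus an 8-byte cached time, leading or trailing. */
  static void write(ByteBuffer cache, int offset, int size, long cachedTime, byte fill,
      boolean timeFirst) {
    int payloadPos = timeFirst ? offset + Long.BYTES : offset;
    for (int i = 0; i < size - Long.BYTES; i++) {
      cache.put(payloadPos + i, fill);
    }
    cache.putLong(timePosition(offset, size, timeFirst), cachedTime);
  }

  /** Validation as described: compare the expected cached time with the one in the cache. */
  static boolean validates(ByteBuffer cache, int offset, int size, long expected,
      boolean timeFirst) {
    return cache.getLong(timePosition(offset, size, timeFirst)) == expected;
  }

  /** Runs steps 1 to 6 and reports whether the stale B1 entry is wrongly accepted. */
  static boolean staleEntryPassesValidation(boolean timeFirst) {
    ByteBuffer cache = ByteBuffer.allocate(64);
    int s = 32;                      // assumed size S of B1
    long t1 = 1000L;
    long t2 = 2000L;
    write(cache, 0, s, t1, (byte) 1, timeFirst);      // 1) B1 cached at offset 0
    // 2) the persisted backing map remembers (offset 0, size S, T1)
    // 3) B1 is evicted, but its bytes stay in the cache
    write(cache, 0, s / 2, t2, (byte) 2, timeFirst);  // 4) B2 of size S/2 at offset 0
    // 5) crash before the map is saved; 6) validate the stale B1 entry at restart
    return validates(cache, 0, s, t1, timeFirst);
  }
}
```

With the cached time at the end, `staleEntryPassesValidation(false)` returns true, since B2 never reaches the last 8 bytes of B1. With the cached time at the beginning, `staleEntryPassesValidation(true)` returns false, because any block written at the same offset overwrites that field regardless of its size.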