[ https://issues.apache.org/jira/browse/HBASE-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779219#comment-17779219 ]
Hudson commented on HBASE-28170:
--------------------------------
Results for branch master
[build #930 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/930/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/930/General_20Nightly_20Build_20Report/]
(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/930/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/930/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> Put the cached time at the beginning of the block; run cache validation in
> the background when retrieving the persistent cache
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-28170
> URL: https://issues.apache.org/jira/browse/HBASE-28170
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> In HBASE-28004, we added a "cached time" long at the end of each block on the
> bucket cache. We also record the cached time in the backing map we persist to
> disk periodically, in order to retrieve the cache upon crashes/restarts. The
> persisted backing map includes the last modification time of the cache itself.
> On restarts, once we read the backing map from the persisted file, we compare
> the last modification time of the cache recorded there against the actual last
> modification time of the cache. If those differ, it means the cache has been
> updated after the backing map was persisted, so the backing map might
> not be accurate. We then iterate through the backing map entries and compare
> each entry's cached time against the related block in the cache, and if those
> differ, we remove the entry from the map.
> Currently this validation is done at RS initialisation time, but with caches
> as large as 1.6TB/30M+ blocks, it can last up to an hour, meaning the RS is
> unusable during that time. This PR changes this validation to be performed in
> the background, whilst direct accesses to a block in the cache would also
> perform the "cached time" comparison.
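
The two validation paths described above (a full scan in the background, plus a check on each direct access) could be sketched roughly as follows. This is only an illustration, not the BucketCache code: the class, the `String` keys, and the two maps standing in for the persisted backing map and for the cached time stored in each block are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BackgroundCacheValidation {
    // Hypothetical stand-in: block key -> cached time recorded in the persisted backing map.
    final Map<String, Long> backingMap = new ConcurrentHashMap<>();
    // Hypothetical stand-in: block key -> cached time actually stored with the block in the cache.
    final Map<String, Long> cachedTimeInCache = new ConcurrentHashMap<>();

    // Checks one entry and drops it from the backing map if the cached times differ.
    boolean validateEntry(String key) {
        Long recorded = backingMap.get(key);
        if (recorded == null) {
            return false; // no entry, nothing to serve
        }
        if (!recorded.equals(cachedTimeInCache.get(key))) {
            backingMap.remove(key, recorded); // stale entry
            return false;
        }
        return true;
    }

    // Full scan, submitted to a pool so that startup does not wait for it.
    Future<?> startBackgroundValidation(ExecutorService pool) {
        return pool.submit(() -> backingMap.keySet().forEach(this::validateEntry));
    }

    // Read path: an entry the scan has not reached yet is still checked on access.
    boolean isUsable(String key) {
        return validateEntry(key);
    }

    public static void main(String[] args) throws Exception {
        BackgroundCacheValidation v = new BackgroundCacheValidation();
        v.backingMap.put("b1", 1000L);
        v.cachedTimeInCache.put("b1", 1000L);
        v.backingMap.put("b2", 1000L);
        v.cachedTimeInCache.put("b2", 2000L); // cache changed after the map was persisted

        ExecutorService pool = Executors.newSingleThreadExecutor();
        v.startBackgroundValidation(pool).get(); // waiting here only for the demo
        pool.shutdown();
        System.out.println("entries kept: " + v.backingMap.keySet());
    }
}
```

Because both paths go through the same per-entry check, a read that arrives before the scan reaches its block cannot be served from a stale entry.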
> This PR also moves the "cached time" to the beginning of the block in the
> cache, instead of the end. We noticed that with the "cached time" at the end
> we can fail to ensure consistency under some conditions. Consider the following:
> 1) A block B1 of size S gets allocated at offset 0 with cached time T1;
> 2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
> 3) B1 is evicted. Its offset in the cache is now free, but its contents
> are still there, including the cached time T1 at its end;
> 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
> 5) RS crashes before the backing map gets saved, so the persisted backing map
> still has only the reference to B1, but not B2;
> 6) At restart, we run the validation. Because B2 was half the size of B1, we
> haven't overwritten B1's cached time in the cache, so we will successfully
> validate B1, although its content is now half overwritten by B2.
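
The six steps above can be replayed on a plain byte buffer to compare the two placements. This is a minimal sketch under assumptions: the class and method names are hypothetical, only the 8-byte cached time is written (block payloads are omitted), and the sizes and timestamps are arbitrary.

```java
import java.nio.ByteBuffer;

class CachedTimePlacement {
    static final int TS = Long.BYTES; // the cached time is a long, as in the description

    // Position of the cached time for a block at `offset` with length `size`.
    static int timePos(int offset, int size, boolean timeFirst) {
        return timeFirst ? offset : offset + size - TS;
    }

    // "Caches" a block by stamping its cached time at the start or the end.
    static void write(ByteBuffer cache, int offset, int size, long cachedTime, boolean timeFirst) {
        cache.putLong(timePos(offset, size, timeFirst), cachedTime);
    }

    // Validation as described: compare the persisted cached time with the one in the cache.
    static boolean validate(ByteBuffer cache, int offset, int size, long expected, boolean timeFirst) {
        return cache.getLong(timePos(offset, size, timeFirst)) == expected;
    }

    // Returns true if the stale entry for B1 still passes validation after B2 reused its offset.
    static boolean staleEntryPasses(boolean timeFirst) {
        ByteBuffer cache = ByteBuffer.allocate(1024);
        int s = 64;
        long t1 = 1000L;
        long t2 = 2000L;
        write(cache, 0, s, t1, timeFirst);     // 1) B1 of size S at offset 0, cached time T1
        // 2) the persisted backing map records (offset 0, size S, T1)
        // 3) B1 is evicted; its bytes are left in place
        write(cache, 0, s / 2, t2, timeFirst); // 4) B2 of size S/2 at offset 0, cached time T2
        // 5) crash before the backing map is saved again
        return validate(cache, 0, s, t1, timeFirst); // 6) validate the persisted entry for B1
    }

    public static void main(String[] args) {
        System.out.println("cached time at end, stale B1 passes:   " + staleEntryPasses(false));
        System.out.println("cached time at start, stale B1 passes: " + staleEntryPasses(true));
    }
}
```

With the cached time at the end, B2's timestamp lands in the middle of B1's old region and T1 survives at B1's tail, so the stale entry validates. With the cached time at the start, any block allocated at the same offset overwrites it regardless of size.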
--
This message was sent by Atlassian Jira
(v8.20.10#820010)