[jira] [Commented] (HBASE-28004) Persistent cache map can get corrupt if crash happens midway through the write

Hudson (Jira) Wed, 23 Aug 2023 22:58:10 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758369#comment-17758369
 ]


Hudson commented on HBASE-28004:
--------------------------------

Results for branch branch-2
        [build #868 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/868/]: 
(x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/868/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/868/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/868/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/868/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Persistent cache map can get corrupt if crash happens midway through the write
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-28004
>                 URL: https://issues.apache.org/jira/browse/HBASE-28004
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> HBASE-27686 added a background thread that periodically saves the cache index 
> map, together with a list of completely cached files, so that we can recover 
> the cache state after a crash or restart. The problem is that the cache index 
> can grow to a few GB (in a sample case, 1.6TB of used bucket cache mapped to 
> between 8GB and 10GB of indexes), and these writes take a few seconds to 
> complete, so an RS crash is very likely to leave a corrupt index file that 
> can't be recovered when the RS starts again. Worse, since we store the list 
> of cached files in a separate file, this also leads to cache inconsistencies: 
> files in the list of cached files are never cached again once the RS is 
> restarted, even though we have no cache index for them, and every read ends 
> up going to the FS.
> This task aims to refactor the cache persistence as follows: 
> 1) Write both the list of completely cached files and the cache indexes in a 
> single file, so that we can have this synced atomically;
> 2) When writing the persistent cache file, use a temp name first, then once 
> the write has finished successfully, rename it to the actual name. This way, 
> if a crash happens whilst the persistent cache is still being written, the 
> temp file would be corrupt, but we could still recover from the last 
> successful sync, and we would only lose the caching ops since the last sync.
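
The two steps proposed in the description can be illustrated with a minimal Java sketch. This is not the HBase implementation: the class name, method names, and the simple length-prefixed record layout are all hypothetical and chosen only for illustration. It writes the list of cached files and the index into a single temp file, forces it to disk, and only then renames it over the real file.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and file layout are assumptions, not HBase code.
public class PersistentCacheSketch {

  // Writes both the cached-file list and the index to ONE file, via a temp name.
  public static void persist(Path target, List<String> cachedFiles,
      Map<String, Long> index) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(Channels.newOutputStream(ch)))) {
      out.writeInt(cachedFiles.size());
      for (String file : cachedFiles) {
        out.writeUTF(file);
      }
      out.writeInt(index.size());
      for (Map.Entry<String, Long> e : index.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue());
      }
      out.flush();
      ch.force(true); // make sure the bytes are on disk before the rename
    }
    // A crash before this line leaves only a stale or corrupt temp file;
    // the previously persisted target is untouched. On POSIX file systems an
    // atomic move replaces an existing target; elsewhere the behaviour is
    // implementation specific.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
  }

  // Reads back the cached-file list from the last successfully renamed file.
  public static List<String> loadCachedFiles(Path target) throws IOException {
    try (DataInputStream in = new DataInputStream(Files.newInputStream(target))) {
      int count = in.readInt();
      List<String> files = new ArrayList<>(count);
      for (int i = 0; i < count; i++) {
        files.add(in.readUTF());
      }
      return files;
    }
  }
}
```

Because the rename happens only after a complete, synced write, a reader of the target file sees either the previous snapshot or the new one, never a partially written mix of the two.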



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
