[jira] [Updated] (HBASE-28004) Persistent cache map can get corrupt if crash happens midway through the write

Wellington Chevreuil (Jira) Tue, 01 Aug 2023 08:37:20 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wellington Chevreuil updated HBASE-28004:
-----------------------------------------
    Description: 
HBASE-27686 added a background thread that periodically saves the cache index 
map, together with a list of completely cached files, so that we can recover 
the cache state after a crash or restart. The problem is that the cache index 
can grow to a few GB (in a sample case, 1.6TB of used bucket cache mapped to 
between 8GB and 10GB of indexes), and these writes take a few seconds to 
complete. Any RS crash during such a write is very likely to leave a corrupt 
index file that can't be recovered when the RS starts again. Worse, since we 
store the list of cached files in a separate file, this also leads to cache 
inconsistencies: files in the list of cached files are never cached again once 
the RS is restarted, even though we have no cache index for them, so every 
read of those files ends up going to the FS.

This task aims to refactor the cache persistence as follows: 
1) Write both the list of completely cached files and the cache indexes to a 
single file, so that they are synced atomically;
2) When writing the persistent cache file, use a temp name first; once the 
write has finished successfully, rename it to the actual name. This way, if a 
crash happens while the persistent cache is still being written, only the temp 
file would be corrupt. We could still recover from the last successful sync, 
and we would only lose the caching ops since that sync.
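
The two steps above can be sketched roughly as follows. This is only an 
illustration of the write-to-temp-then-rename pattern on a local filesystem, 
not the actual HBase change: the class name, method names, the ".tmp" suffix 
and the on-disk layout (file list followed by index entries) are all 
assumptions made for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the HBase code base.
class PersistentCacheSketch {

    // Writes the cached-file list and the index into ONE temp file,
    // then renames it over the real persistence file.
    static void persist(Path target, List<String> cachedFiles,
                        Map<String, Long> index) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp.toFile());
             DataOutputStream out =
                 new DataOutputStream(new BufferedOutputStream(fos))) {
            out.writeInt(cachedFiles.size());
            for (String f : cachedFiles) {
                out.writeUTF(f);
            }
            out.writeInt(index.size());
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
            out.flush();
            fos.getFD().sync(); // force the bytes to disk before the rename
        }
        // Whether an existing target is replaced under ATOMIC_MOVE is
        // implementation specific; on POSIX filesystems it is replaced.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reads back the last successfully renamed file. A leftover temp file
    // from an interrupted write is never looked at.
    static Map<String, Long> load(Path target, List<String> cachedFilesOut)
            throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(target)))) {
            int fileCount = in.readInt();
            for (int i = 0; i < fileCount; i++) {
                cachedFilesOut.add(in.readUTF());
            }
            int entryCount = in.readInt();
            for (int i = 0; i < entryCount; i++) {
                index.put(in.readUTF(), in.readLong());
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache-sketch");
        Path target = dir.resolve("bucket.persistence");
        Map<String, Long> index = new HashMap<>();
        index.put("hfile-1:0", 4096L);
        persist(target, List.of("hfile-1"), index);

        List<String> files = new ArrayList<>();
        Map<String, Long> recovered = load(target, files);
        System.out.println(files + " " + recovered);
    }
}
```

Because the rename is the last step, a reader either sees the previous 
complete file or the new complete one. A crash mid-write leaves at most a 
partial temp file next to an intact file from the last successful sync.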

> Persistent cache map can get corrupt if crash happens midway through the write
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-28004
>                 URL: https://issues.apache.org/jira/browse/HBASE-28004
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
