[
https://issues.apache.org/jira/browse/HBASE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil updated HBASE-28004:
-----------------------------------------
Description:
HBASE-27686 added a background thread that periodically saves the cache index
map, together with a list of completely cached files, so that we can recover the
cache state after a crash or restart. The problem is that the cache index can
grow to a few GB in size (a sample case with 1.6TB of used bucket cache would
map to between 8GB and 10GB of indexes), and these writes take a few seconds to
complete, so any RS crash is very likely to leave a corrupt index file that
can't be recovered when the RS starts again. Worse, since we store the list of
cached files in a separate file, this also leads to cache inconsistencies: files
in the list of cached files are never cached again once the RS is restarted,
even though we have no cache index for them, so every read ends up going to the
FS.
This task aims to refactor the cache persistence as follows:
1) Write both the list of completely cached files and the cache indexes to a
single file, so that they are synced atomically;
2) When writing the persistent cache file, use a temp name first, then, once the
write has finished successfully, rename it to the actual name. This way, if a
crash happens while the persistent cache is still being written, the temp file
would be corrupt, but we could still recover from the last successful sync, and
we would only lose the caching ops since the last sync.
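
The two steps above can be sketched roughly as follows. This is only an
illustration on a local filesystem using java.nio, not the actual BucketCache
code: the class name CacheSnapshotWriter, the ".tmp" suffix and the on-disk
layout (file list first, then the raw index bytes) are all assumptions made for
the example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; names and file layout are assumptions.
class CacheSnapshotWriter {

    // Writes the cached-file list and the index into ONE file, via a temp name.
    static void persist(Path target, List<String> cachedFiles, byte[] index)
            throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp.toFile());
             DataOutputStream out = new DataOutputStream(fos)) {
            // 1) list of completely cached files
            out.writeInt(cachedFiles.size());
            for (String name : cachedFiles) {
                out.writeUTF(name);
            }
            // 2) cache index, kept in the same file as the list
            out.writeInt(index.length);
            out.write(index);
            out.flush();
            // Ask the OS to push the bytes to disk before the rename.
            fos.getFD().sync();
        }
        // Only a fully written temp file ever takes the real name. Whether an
        // existing target is replaced is platform specific for ATOMIC_MOVE
        // (POSIX rename semantics replace it).
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reads back only the file list, to show the layout is recoverable.
    static List<String> readCachedFiles(Path target) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(Files.newInputStream(target))) {
            int count = in.readInt();
            List<String> names = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                names.add(in.readUTF());
            }
            return names;
        }
    }
}
```

If the process dies inside persist() before the move, only the ".tmp" file is
affected and the previously renamed file is still the last successful sync. A
stale ".tmp" left by a crash is simply overwritten by the next write.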
> Persistent cache map can get corrupt if crash happens midway through the write
> ------------------------------------------------------------------------------
>
> Key: HBASE-28004
> URL: https://issues.apache.org/jira/browse/HBASE-28004
> Project: HBase
> Issue Type: Sub-task
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major