[
https://issues.apache.org/jira/browse/IMPALA-11904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on IMPALA-11904 started by Ye Zihao.
-----------------------------------------
> Data cache should support dumping metadata for reloading
> --------------------------------------------------------
>
> Key: IMPALA-11904
> URL: https://issues.apache.org/jira/browse/IMPALA-11904
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Ye Zihao
> Assignee: Ye Zihao
> Priority: Major
>
> The data cache consists of cache metadata and cache files. The cache files
> are located on disk and store the cached data content, while the cache
> metadata is held in memory and maps each cache key to its location in the
> cache files.
> Currently, if the impalad process exits, the cache metadata is lost.
> After the impalad process restarts, we cannot reuse the cache files even
> though they are still on disk, because there is no corresponding cache
> metadata to index them.
> If we support dumping the cache metadata to disk when the process exits,
> then the next time the process starts the metadata can be reloaded into
> memory and the previous cache files can be reused. This would be helpful in
> a real production environment, where cache data often exceeds a terabyte in
> size (per process), and where a cache lost to a restart for a configuration
> change or version upgrade can take days to rebuild.
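
The dump-and-reload idea described above can be illustrated with a minimal
sketch. It is written in Python for brevity and is not Impala's
implementation or any proposed patch: the names CacheEntry, dump_metadata and
load_metadata, the JSON dump format, and the (file path, offset, length)
layout of an entry are all assumptions made for illustration. The index is
written to a temporary file and renamed into place so that a crash mid-dump
does not leave a truncated file, and on reload any entry whose backing cache
file is missing or too short is dropped.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class CacheEntry:
    """Hypothetical metadata record: where one cached range lives on disk."""
    file_path: str
    offset: int
    length: int


def dump_metadata(index, dump_path):
    """Write the in-memory index (cache key -> CacheEntry) to dump_path."""
    payload = {key: asdict(entry) for key, entry in index.items()}
    dump_dir = os.path.dirname(os.path.abspath(dump_path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so readers never observe a partially written dump.
    fd, tmp_path = tempfile.mkstemp(dir=dump_dir)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, dump_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_metadata(dump_path):
    """Rebuild the index from dump_path, keeping only entries still on disk."""
    if not os.path.exists(dump_path):
        return {}  # no previous dump: start with an empty cache
    with open(dump_path) as f:
        payload = json.load(f)
    index = {}
    for key, fields in payload.items():
        entry = CacheEntry(**fields)
        try:
            size = os.path.getsize(entry.file_path)
        except OSError:
            continue  # backing cache file is gone
        # Keep the entry only if the cached range fits inside the file.
        if entry.offset + entry.length <= size:
            index[key] = entry
    return index
```

In this sketch dump_metadata would be called during process shutdown and
load_metadata during startup. A real implementation would also have to
confirm that the cache files were not modified between the dump and the
reload, which this sketch does not attempt.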