Ye Zihao created IMPALA-11904:
---------------------------------
Summary: Data cache should support dumping metadata for reloading
Key: IMPALA-11904
URL: https://issues.apache.org/jira/browse/IMPALA-11904
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Ye Zihao
Assignee: Ye Zihao
The data cache consists mainly of cache metadata and cache files. The cache
files are located on disk and are responsible for storing the cached data
content, while the cache metadata is located in memory and is responsible for
indexing into the cache files by cache key.
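
To make the split between on-disk content and in-memory metadata concrete, here
is a minimal Python sketch. It is not Impala's implementation (the real data
cache is part of the C++ backend); the names TinyDataCache and CacheEntry, the
single backing file, and the (offset, length) layout are all assumptions made
for illustration.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical metadata record: where one cached value lives on disk."""
    offset: int
    length: int


class TinyDataCache:
    """Illustrative only: one backing file on disk plus an in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # cache key -> CacheEntry, held in memory only
        open(path, "ab").close()  # create the backing file if it is missing

    def store(self, key, data):
        # Append the bytes to the cache file and remember where they landed.
        offset = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            f.write(data)
        self.index[key] = CacheEntry(offset, len(data))

    def lookup(self, key):
        # Without a metadata entry, the bytes on disk cannot be located.
        entry = self.index.get(key)
        if entry is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(entry.offset)
            return f.read(entry.length)
```

Creating a second TinyDataCache over the same file models a process restart:
the bytes are still on disk, but every lookup misses because the new instance
starts with an empty index.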
Currently, if the impalad process exits, the cache metadata is lost. After the
impalad process restarts, we cannot reuse the cache files even though they are
still on disk, because there is no corresponding cache metadata to index them.
If we support dumping the cache metadata to disk when the process exits, then
the next time the process starts the metadata can be reloaded into memory and
the previous cache files can be reused. This would be helpful in a real
production environment, where cache data often exceeds a terabyte per process
and recovering from a loss of cache data due to a configuration change or
version upgrade can take days.
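
A rough Python sketch of the dump-and-reload idea follows. It is not the
proposed Impala implementation; the JSON format, the function names dump_index
and load_index, and the {key: (offset, length)} index layout are assumptions
chosen for illustration. The sketch writes the dump to a temporary file and
renames it into place so that a crash mid-dump does not leave a truncated
metadata file, and on reload it drops entries that point past the end of the
cache file.

```python
import json
import os
import tempfile


def dump_index(index, meta_path):
    """Write the index {key: (offset, length)} to meta_path atomically."""
    tmp_path = meta_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({k: list(v) for k, v in index.items()}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp_path, meta_path)  # atomic rename over any older dump


def load_index(meta_path, cache_path):
    """Reload a dumped index, keeping only entries the cache file can serve."""
    if not (os.path.exists(meta_path) and os.path.exists(cache_path)):
        return {}  # nothing to reuse: start with an empty cache
    try:
        with open(meta_path) as f:
            raw = json.load(f)
    except (OSError, ValueError):
        return {}  # unreadable or corrupt dump: fall back to an empty cache
    size = os.path.getsize(cache_path)
    index = {}
    for key, (offset, length) in raw.items():
        # Skip entries whose byte range is not fully inside the cache file.
        if offset >= 0 and offset + length <= size:
            index[key] = (offset, length)
    return index
```

A process would call dump_index during shutdown and load_index during startup.
A real implementation would likely need stronger checks than a size
comparison, for example confirming that the cache file is the same one the
dump was taken against.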