[ https://issues.apache.org/jira/browse/IMPALA-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061901#comment-18061901 ]
ASF subversion and git services commented on IMPALA-14739:
----------------------------------------------------------
Commit a51b8a2fb17def86a9c2e09869d20fbcd941e260 in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a51b8a2fb ]
IMPALA-14739: Harden CatalogdMetaProvider.Weigher for edge cases
Handle two cases more safely:
1. > 2GB cache entries
Before the patch, the weight of these entries was truncated to 2GB to
fit into int32, and this underreporting meant that the cache could grow
beyond its intended limit. This is improved by using byte_size / 16 as
the weight, which allows sizes up to 32GB, a size that seems
unrealistically large to me.
2. eviction of currently loaded entries
The "piggy-backing" mechanism uses CompletableFuture as values while
loading objects. If such an entry is evicted before loading finishes,
the loaded object is not written back to the cache, because it is
assumed to have been invalidated. The patch protects against
weight-based eviction by weighing these entries as 0, which makes
weight-based eviction ignore them. This is the recommended way to
"lock" entries in weight-bounded guava / caffeine caches.
Time-based eviction can still remove entries while loading
(1 hour by default). Both time-based and weight-based eviction of a
loading entry should be rare: the former needs a loading time of more
than 1 hour, the latter needs many new entries added while loading to
push the entry out of the LRU cache.
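
The two rules described above can be illustrated with a minimal sketch.
This is not the code from the patch: the class name WeigherSketch, the
method weigh and the constant BYTES_PER_WEIGHT_UNIT are hypothetical,
and the byte size is assumed to be computed elsewhere and passed in.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative only; names are hypothetical and not taken from Impala.
public class WeigherSketch {
  // Assumed scale factor: one weight unit stands for 16 bytes, so an
  // int32 weight can represent roughly 32GB instead of 2GB.
  static final int BYTES_PER_WEIGHT_UNIT = 16;

  // Returns the cache weight for a value whose estimated size is byteSize.
  static int weigh(Object value, long byteSize) {
    // An entry that is still loading is represented by a future. Weight 0
    // keeps it out of weight-based eviction until the real value is stored.
    if (value instanceof CompletableFuture) return 0;
    long scaled = byteSize / BYTES_PER_WEIGHT_UNIT;
    // Clamp instead of overflowing if the scaled size still exceeds int32.
    return (int) Math.min(scaled, (long) Integer.MAX_VALUE);
  }
}
```

In this sketch a 3GB entry gets weight 3GB / 16 and is no longer
underreported, while sizes below 16 bytes round down to 0. How the
actual patch treats such small entries is not stated here.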
Change-Id: Id525b9b0578fb7f9cb3e0f8f4fa32f6fdae313b9
Reviewed-on: http://gerrit.cloudera.org:8080/24037
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Local catalog cache mem usage is hard to diagnose
> -------------------------------------------------
>
> Key: IMPALA-14739
> URL: https://issues.apache.org/jira/browse/IMPALA-14739
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Frontend
> Reporter: Csaba Ringhofer
> Priority: Critical
>
> Huge objects like Iceberg tables with many files can dominate cache memory
> usage, but it is hard to decipher this information from metrics. There is
> also no trace of the very problematic case of truncating weight of >2GB
> objects:
> https://github.com/apache/impala/blob/3be15fd3598071eaeddd9b4d29e0883b95fdd14a/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L2353