[ https://issues.apache.org/jira/browse/IMPALA-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615027#comment-16615027 ]
ASF subversion and git services commented on IMPALA-7448:
---------------------------------------------------------
Commit 49095c7e8b8ba1f8e69a68f15a322cc1ead13b7e in impala's branch
refs/heads/master from [~tianyiwang]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=49095c7 ]
IMPALA-7448: Invalidate recently unused tables from catalogd
This patch implements an automatic invalidation mechanism in catalogd.
There are two invalidation strategies:
1. Periodically, HDFS tables that have not been used within the configured
period "invalidate_tables_timeout_s" are invalidated from catalogd.
2. If the old GC generation is almost full, a certain percentage of the
least recently used (LRU) tables is invalidated. This can be enabled with
the backend flag "invalidate_tables_on_memory_pressure".
The table usage is reported by impalad to catalogd when the tables are
used during planning.
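
The usage reporting and the two strategies above can be sketched roughly as
follows. This is an illustration only, not the code from the patch: the
class TableUsageTracker, its methods, and the "fraction" parameter are
hypothetical names, and invalidation is reduced to forgetting the table.

```python
from typing import Dict, Iterable, List


class TableUsageTracker:
    """catalogd side: last time each table was reported as used (hypothetical)."""

    def __init__(self) -> None:
        self.last_used: Dict[str, float] = {}

    def report_used(self, table_names: Iterable[str], now: float) -> None:
        # Called when an impalad reports the tables it touched during planning.
        for name in table_names:
            self.last_used[name] = now

    def unused_for(self, timeout_s: float, now: float) -> List[str]:
        # Strategy 1: tables not used within the last timeout_s seconds.
        return sorted(n for n, ts in self.last_used.items() if now - ts > timeout_s)

    def least_recently_used(self, fraction: float) -> List[str]:
        # Strategy 2: the oldest `fraction` of tables, least recently used first.
        count = int(len(self.last_used) * fraction)
        by_age = sorted(self.last_used, key=lambda n: self.last_used[n])
        return by_age[:count]

    def invalidate(self, table_names: Iterable[str]) -> None:
        # Stand-in for invalidation: forget the table until it is used again.
        for name in table_names:
            self.last_used.pop(name, None)


tracker = TableUsageTracker()
tracker.report_used(["db.a", "db.d"], now=50.0)
tracker.report_used(["db.a"], now=100.0)
tracker.report_used(["db.c"], now=500.0)
tracker.report_used(["db.b"], now=900.0)

stale = tracker.unused_for(timeout_s=600, now=1000.0)
lru = tracker.least_recently_used(fraction=0.5)
tracker.invalidate(stale)
remaining = sorted(tracker.last_used)
```

In this sketch the timeout plays the role of "invalidate_tables_timeout_s",
and least_recently_used() would only be consulted when the memory-pressure
flag is enabled and the old generation is nearly full.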
Tests for time-based invalidation are added. It was manually verified that
the GC callback is invoked when catalogd is stuffed with random strings.
Change-Id: Ib549717abefcffb14d9a3814ee8cf0de8bd49e89
Reviewed-on: http://gerrit.cloudera.org:8080/11224
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Tianyi Wang <[email protected]>
> Periodically evict recently unused table from catalogd
> ------------------------------------------------------
>
> Key: IMPALA-7448
> URL: https://issues.apache.org/jira/browse/IMPALA-7448
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 3.1.0
> Reporter: Tianyi Wang
> Assignee: Tianyi Wang
> Priority: Major
>
> To limit the memory consumption of the catalog, we should experiment with a
> mechanism that automatically evicts recently unused tables from catalogd.
> Initial design:
> - impalad to report periodically/asynchronously the set of catalog objects
> that were accessed
> - catalogd to record some kind of last access time
> - catalogd to have some facility to scan over all catalog objects, collect
> some number of not-recently-used ones (e.g. enough to reach a target amount
> of evicted memory), and issue invalidate commands to itself
> - no need to have exact LRU behavior -- to simplify, we probably shouldn't
> try to do a classical LRU linked list between all catalog objects.
> - the initial patch will probably just be triggered manually. We discussed
> either running this on a schedule or running it based on JMX GC
> notifications, when we see that catalogd has finished an old-gen GC and the
> old gen is more than some target percentage full.
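
The scan-and-collect step from the design above might look roughly like
this. It is a sketch under assumptions, not the proposed implementation:
CatalogObject, its size_bytes field, and both function names are
hypothetical, and a single sort per scan stands in for "not exact LRU"
(no linked list is maintained between catalog objects).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CatalogObject:
    # Hypothetical record; the real catalog objects carry far more state.
    name: str
    last_access: float
    size_bytes: int


def old_gen_over_target(used_bytes: int, max_bytes: int, target_fraction: float) -> bool:
    # Trigger check to run after an old-gen GC has finished.
    return max_bytes > 0 and used_bytes / max_bytes > target_fraction


def collect_eviction_candidates(objects: List[CatalogObject], target_bytes: int) -> List[str]:
    # Walk objects from least to most recently accessed and stop once the
    # estimated freed memory reaches the target.
    chosen: List[str] = []
    freed = 0
    for obj in sorted(objects, key=lambda o: o.last_access):
        if freed >= target_bytes:
            break
        chosen.append(obj.name)
        freed += obj.size_bytes
    return chosen


objects = [
    CatalogObject("db.a", last_access=300.0, size_bytes=40),
    CatalogObject("db.b", last_access=100.0, size_bytes=30),
    CatalogObject("db.c", last_access=200.0, size_bytes=50),
]
candidates = collect_eviction_candidates(objects, target_bytes=60)
```

With these made-up sizes, the two least recently accessed objects are
enough to reach the 60-byte target, so the most recently used one is kept.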
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)