[ 
https://issues.apache.org/jira/browse/IMPALA-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615027#comment-16615027
 ] 

ASF subversion and git services commented on IMPALA-7448:
---------------------------------------------------------

Commit 49095c7e8b8ba1f8e69a68f15a322cc1ead13b7e in impala's branch 
refs/heads/master from [~tianyiwang]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=49095c7 ]

IMPALA-7448: Invalidate recently unused tables from catalogd

This patch implements an automatic invalidation mechanism in catalogd.
There are two invalidation strategies:
1. Periodically, HDFS tables that have not been used within a configured
   period ("invalidate_tables_timeout_s") are invalidated from catalogd.
2. If the old GC generation is almost full, a certain percentage of LRU
   tables are invalidated. This can be enabled by the backend flag
   "invalidate_tables_on_memory_pressure".
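
The two strategies above can be sketched roughly as follows. This is a minimal illustration, not the catalogd implementation: the flag names come from the commit message, while the `Table` record, the 0.6 old-generation threshold, and the 10% eviction fraction are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    last_used_s: float  # time of the last reported use, in seconds


def expired_tables(tables, now_s, invalidate_tables_timeout_s):
    """Strategy 1: names of tables unused for longer than the timeout."""
    return [t.name for t in tables
            if now_s - t.last_used_s > invalidate_tables_timeout_s]


def lru_tables_under_pressure(tables, old_gen_used, old_gen_max,
                              full_threshold=0.6, evict_fraction=0.1):
    """Strategy 2: if the old GC generation is nearly full, pick the
    least recently used fraction of tables.

    full_threshold and evict_fraction are hypothetical defaults.
    """
    if old_gen_max <= 0 or old_gen_used / old_gen_max < full_threshold:
        return []
    if not tables:
        return []
    # Always evict at least one table once the threshold is crossed.
    count = max(1, int(len(tables) * evict_fraction))
    by_age = sorted(tables, key=lambda t: t.last_used_s)
    return [t.name for t in by_age[:count]]
```

Both functions only select candidates; actually invalidating them is left to the caller, which keeps the selection logic easy to test in isolation.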

Table usage is reported by impalad to catalogd when tables are used
during planning.
Tests for time-based invalidation are added. It was manually verified
that the GC callback is invoked when random strings are stuffed into
catalogd.
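
The usage reporting described above might look something like this sketch. The class and method names (`UsageReporter`, `CatalogUsageTracker`, `apply_report`) are hypothetical and do not correspond to Impala's actual classes or RPCs; the sketch only shows the idea of buffering table names seen during planning and stamping them with a last-used time on the catalog side.

```python
import time


class UsageReporter:
    """Hypothetical impalad-side buffer of tables touched during planning."""

    def __init__(self):
        self._pending = set()

    def record_planned_tables(self, table_names):
        # A set deduplicates tables referenced by several queries.
        self._pending.update(table_names)

    def flush(self):
        """Return the buffered names and clear the buffer."""
        names, self._pending = self._pending, set()
        return names


class CatalogUsageTracker:
    """Hypothetical catalogd-side map of table name to last-used time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.last_used_s = {}

    def apply_report(self, table_names):
        # Every table in one report gets the same timestamp.
        now = self._clock()
        for name in table_names:
            self.last_used_s[name] = now
```

Injecting the clock makes the time-based behaviour deterministic in tests, which matters when the timeout under test is measured in seconds.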

Change-Id: Ib549717abefcffb14d9a3814ee8cf0de8bd49e89
Reviewed-on: http://gerrit.cloudera.org:8080/11224
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Tianyi Wang <[email protected]>


> Periodically evict recently unused table from catalogd
> ------------------------------------------------------
>
>                 Key: IMPALA-7448
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7448
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.1.0
>            Reporter: Tianyi Wang
>            Assignee: Tianyi Wang
>            Priority: Major
>
> To limit the memory consumption of the catalog, we should experiment with a
> mechanism that automatically evicts recently unused tables from catalogd.
> Initial design:
> - impalad reports periodically/asynchronously the set of catalog objects
> that were accessed
> - catalogd records some kind of last access time
> - catalogd has some facility to scan over all catalog objects, collect
> some number of not-recently-used ones (e.g. to reach a target amount of
> evicted memory), and issue invalidate commands to itself
> - no need for exact LRU behavior -- to simplify, we probably shouldn't
> try to maintain a classical LRU linked list across all catalog objects.
> - the initial patch is probably just triggered manually. We discussed either
> running this on a schedule or running it based on JMX GC notifications if we
> see that catalogd finished an old-gen GC and the old gen is more than some
> target percentage full.
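
The scan-and-collect step in the initial design can be illustrated with a short sketch. It assumes each catalog object exposes a last access time and an estimated memory size; the tuple layout and the function name `pick_eviction_victims` are made up for this example. A full sort stands in for the scan, which gives approximate LRU ordering without maintaining a linked list.

```python
def pick_eviction_victims(objects, target_bytes):
    """Pick not-recently-used objects until target_bytes would be freed.

    objects: iterable of (name, last_access_s, estimated_bytes) tuples.
    Returns (victim_names, total_estimated_bytes_freed).
    """
    victims, freed = [], 0
    # Oldest access time first.
    for name, _last_access_s, size in sorted(objects, key=lambda o: o[1]):
        if freed >= target_bytes:
            break
        victims.append(name)
        freed += size
    return victims, freed
```

For example, with objects `("a", 30, 100)`, `("b", 10, 200)`, and `("c", 20, 50)` and a target of 220 bytes, the function selects `b` and then `c`, for 250 estimated bytes.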



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

