[ https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar reassigned IMPALA-7954:
-------------------------------------------

    Assignee: Vihang Karajgaonkar

> Support automatic invalidates using metastore notification events
> -----------------------------------------------------------------
>
>                 Key: IMPALA-7954
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7954
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>
> Currently, Impala offers multiple ways to invalidate or refresh the metadata
> stored in the Catalog for tables. Objects in the Catalog can be invalidated
> either with a usage-based approach (invalidate_tables_timeout_s) or when there
> is GC pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448.
> However, most users issue invalidate commands when they want to sync to the
> latest information from HDFS or HMS. Unfortunately, when data is modified or
> new data is added outside Impala (e.g., by Hive) or by a different Impala
> cluster, users have no clear way to tell whether they need to issue an
> invalidate. To be on the safe side, users issue invalidate commands more
> often than necessary, which causes both performance and stability issues.
> Hive Metastore provides a simple API to get incremental updates to the
> metadata stored in its database. Each API call that performs an
> add/alter/drop operation in the metastore generates one or more events, which
> can be fetched using the {{get_next_notification}} API. Each event has a
> unique, increasing event_id. The current notification event id can be
> fetched using the {{get_current_notificationEventId}} API.
> This JIRA proposes using these metastore events to proactively invalidate or
> refresh information in catalogd. When configured, catalogd could poll for
> such events and act on them (e.g., add/drop/refresh partitions,
> add/drop/invalidate tables and databases). This way, catalogd state is
> refreshed automatically from events, which would greatly help use cases
> where users want to see the latest information (within a configurable
> delay) without flooding the system with invalidate requests.
> I will attach a design doc to this JIRA and create subtasks for the work.
> Feel free to comment on the JIRA or suggest improvements to the design.
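A minimal Python sketch of the proposed polling idea, under stated assumptions. `FakeMetastore` is a hypothetical in-memory stand-in whose two methods are loosely modeled on the `get_current_notificationEventId` and `get_next_notification` calls described above (the real Thrift calls take request structs and return richer event objects; these signatures are simplified). `EventPoller` and its event-type-to-action mapping are illustrative only, not Impala's implementation. The poller remembers the last synced event_id, fetches newer events in batches, and maps each event type to a catalog action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NotificationEvent:
    # Simplified stand-in for an HMS notification event (not the Thrift struct).
    event_id: int
    event_type: str
    db_name: str
    table_name: str = ""


class FakeMetastore:
    """Hypothetical in-memory stand-in for the metastore notification API."""

    def __init__(self) -> None:
        self._events: List[NotificationEvent] = []

    def add_event(self, event_type: str, db: str, table: str = "") -> None:
        next_id = len(self._events) + 1  # ids are unique and increasing
        self._events.append(NotificationEvent(next_id, event_type, db, table))

    def get_current_notification_event_id(self) -> int:
        return self._events[-1].event_id if self._events else 0

    def get_next_notification(
        self, last_event_id: int, max_events: int
    ) -> List[NotificationEvent]:
        # Return at most max_events events newer than last_event_id, oldest first.
        newer = [e for e in self._events if e.event_id > last_event_id]
        return newer[:max_events]


class EventPoller:
    """Hypothetical catalogd-side poller; the action strings are illustrative."""

    def __init__(self, hms: FakeMetastore, batch_size: int = 100) -> None:
        self.hms = hms
        self.batch_size = batch_size
        # Start from the current id so only changes made after startup apply.
        self.last_synced_event_id = hms.get_current_notification_event_id()
        self.actions: List[str] = []
        self._handlers: Dict[str, Callable[[NotificationEvent], str]] = {
            "CREATE_TABLE": lambda e: f"add table {e.db_name}.{e.table_name}",
            "DROP_TABLE": lambda e: f"drop table {e.db_name}.{e.table_name}",
            "ALTER_TABLE": lambda e: f"invalidate table {e.db_name}.{e.table_name}",
            "ADD_PARTITION": lambda e: f"refresh partitions of {e.db_name}.{e.table_name}",
            "CREATE_DATABASE": lambda e: f"add database {e.db_name}",
            "DROP_DATABASE": lambda e: f"drop database {e.db_name}",
        }

    def poll_once(self) -> int:
        """Process every event newer than the last synced id; return the count."""
        processed = 0
        while True:
            batch = self.hms.get_next_notification(
                self.last_synced_event_id, self.batch_size
            )
            if not batch:
                return processed
            for event in batch:
                handler = self._handlers.get(event.event_type)
                if handler is not None:
                    self.actions.append(handler(event))
                # Advance the high-water mark even for event types we ignore.
                self.last_synced_event_id = event.event_id
                processed += 1


hms = FakeMetastore()
hms.add_event("CREATE_DATABASE", "sales")  # happens before the poller starts
poller = EventPoller(hms, batch_size=2)
hms.add_event("CREATE_TABLE", "sales", "orders")
hms.add_event("ADD_PARTITION", "sales", "orders")
hms.add_event("DROP_TABLE", "sales", "tmp")
print(poller.poll_once())  # 3
print(poller.actions)
# ['add table sales.orders', 'refresh partitions of sales.orders', 'drop table sales.tmp']
```

Advancing `last_synced_event_id` after every event, including ones with no handler, keeps the poller from re-reading the same events on the next poll. In a real deployment `poll_once` would run on a timer, which is where the "configurable interval of time delay" above would come from.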



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
