Vihang Karajgaonkar created IMPALA-7954:
-------------------------------------------

             Summary: Support automatic invalidates using metastore 
notification events
                 Key: IMPALA-7954
                 URL: https://issues.apache.org/jira/browse/IMPALA-7954
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
            Reporter: Vihang Karajgaonkar


Currently, Impala offers multiple ways to invalidate or refresh the table 
metadata stored in the Catalog. Objects in the Catalog can be invalidated 
either with a usage-based approach (invalidate_tables_timeout_s) or when there 
is GC pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448. 
However, most users issue invalidate commands when they want to sync to the 
latest information from HDFS or HMS. Unfortunately, when data is modified or 
new data is added outside Impala (e.g. by Hive or a different Impala cluster), 
users don't have a clear idea of whether they have to issue an invalidate or 
not. To be on the safe side, users issue invalidate commands more often than 
necessary, which causes performance as well as stability issues.

Hive Metastore provides a simple API to get incremental updates to the metadata 
information stored in its database. Each API call that performs an 
add/alter/drop operation in the metastore generates one or more events, which 
can be fetched using the {{get_next_notification}} API. Each event has a unique, 
increasing event_id. The current notification event id can be fetched using the 
{{get_current_notificationEventId}} API.
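
To illustrate the event-id semantics described above, here is a minimal sketch 
in Python. It does not call the real metastore client: 
InMemoryNotificationLog, NotificationEvent and their method signatures are 
hypothetical stand-ins that only mimic "fetch everything after the last event 
id I have seen", and the event type strings are used purely as labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NotificationEvent:
    event_id: int      # unique and strictly increasing
    event_type: str    # illustrative label, e.g. "ADD_PARTITION"
    db_name: str
    table_name: str


class InMemoryNotificationLog:
    """Hypothetical stand-in for the metastore notification log (not the HMS client)."""

    def __init__(self) -> None:
        self._events: List[NotificationEvent] = []

    def append(self, event_type: str, db_name: str, table_name: str) -> int:
        # Assign the next id in a monotonically increasing sequence.
        event_id = len(self._events) + 1
        self._events.append(NotificationEvent(event_id, event_type, db_name, table_name))
        return event_id

    def get_current_notification_event_id(self) -> int:
        # Id of the most recent event, or 0 when nothing has been logged yet.
        return self._events[-1].event_id if self._events else 0

    def get_next_notification(self, last_event_id: int, max_events: int) -> List[NotificationEvent]:
        # Return up to max_events events whose id is greater than last_event_id.
        newer = [e for e in self._events if e.event_id > last_event_id]
        return newer[:max_events]


log = InMemoryNotificationLog()
log.append("CREATE_TABLE", "sales", "orders")
log.append("ADD_PARTITION", "sales", "orders")
log.append("DROP_TABLE", "sales", "tmp")

# A reader that last saw event 1 fetches only what came after it,
# then remembers the highest id it has processed.
batch = log.get_next_notification(last_event_id=1, max_events=10)
last_seen = batch[-1].event_id if batch else 1
```

Because event ids only increase, a consumer needs to persist just one number 
(the last id it processed) to resume without missing or repeating events.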

This JIRA proposes to make use of such events from the metastore to proactively 
either invalidate or refresh information in the catalogD. When configured, 
CatalogD could poll for such events and take action (like add/drop/refresh 
partition, add/drop/invalidate tables and databases) based on the events. This 
way we can automatically refresh the catalogD state using events, which would 
greatly help the use cases where users want to see the latest information 
(within a configurable time delay) without flooding the system with invalidate 
requests.
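
The polling idea can be sketched roughly as follows. This is only an 
illustration in Python, not the proposed implementation: EventPoller, 
ACTION_FOR_EVENT, the action names, and the fallback of invalidating the table 
for unrecognized event types are all assumptions made for the example, and the 
fetch function stands in for whatever call retrieves events newer than a given 
event id.

```python
from typing import Callable, Dict, List, Tuple

# (event_id, event_type, db_name, table_name); table_name is "" for database events.
Event = Tuple[int, str, str, str]

# Assumed mapping from event type to catalog action; the actual design may differ.
ACTION_FOR_EVENT: Dict[str, str] = {
    "CREATE_DATABASE": "add_database",
    "DROP_DATABASE": "drop_database",
    "CREATE_TABLE": "add_table",
    "DROP_TABLE": "drop_table",
    "ALTER_TABLE": "invalidate_table",
    "ADD_PARTITION": "add_partition",
    "DROP_PARTITION": "drop_partition",
    "ALTER_PARTITION": "refresh_partition",
}


class EventPoller:
    """Hypothetical poller that turns metastore events into catalog actions."""

    def __init__(self, fetch: Callable[[int, int], List[Event]], start_event_id: int) -> None:
        self._fetch = fetch
        self.last_synced_event_id = start_event_id
        self.applied: List[Tuple[str, str]] = []  # (action, target), recorded for inspection

    def poll_once(self, max_events: int = 100) -> int:
        # Fetch events newer than the last one processed and apply them in order.
        events = self._fetch(self.last_synced_event_id, max_events)
        for event_id, event_type, db_name, table_name in events:
            # Assumption: unknown event types fall back to invalidating the table.
            action = ACTION_FOR_EVENT.get(event_type, "invalidate_table")
            target = f"{db_name}.{table_name}" if table_name else db_name
            self.applied.append((action, target))
            self.last_synced_event_id = event_id
        return len(events)


# Fake event source used only for this example.
EVENTS: List[Event] = [
    (11, "CREATE_TABLE", "sales", "orders"),
    (12, "ADD_PARTITION", "sales", "orders"),
    (13, "ALTER_TABLE", "sales", "orders"),
    (14, "DROP_DATABASE", "tmp", ""),
]


def fetch_after(last_event_id: int, max_events: int) -> List[Event]:
    return [e for e in EVENTS if e[0] > last_event_id][:max_events]


poller = EventPoller(fetch_after, start_event_id=11)
processed = poller.poll_once()
```

In a real deployment poll_once would run on a timer at the configured 
interval, so the staleness of the catalog is bounded by that interval.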

I will attach a design doc to this JIRA and create subtasks for the work. 
Feel free to comment on the JIRA or suggest improvements to the design.



