bharath v updated IMPALA-7954:
------------------------------
Affects Version/s: Impala 3.1.0
> Support automatic invalidates using metastore notification events
> -----------------------------------------------------------------
>
> Key: IMPALA-7954
> URL: https://issues.apache.org/jira/browse/IMPALA-7954
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 3.1.0
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
>
> Currently, Impala offers multiple ways to invalidate or refresh the table
> metadata stored in the Catalog. Catalog objects can be invalidated either
> based on usage (invalidate_tables_timeout_s) or under GC pressure
> (invalidate_tables_on_memory_pressure), as added in IMPALA-7448.
> However, most users issue invalidate commands when they want to sync to the
> latest information from HDFS or HMS. Unfortunately, when data is modified or
> added outside Impala (e.g., by Hive) or by a different Impala cluster,
> users have no clear way to tell whether they need to issue an invalidate.
> To be on the safe side, users issue invalidate commands more often than
> necessary, which causes performance as well as stability issues.
>
> Hive Metastore provides a simple API to get incremental updates to the
> metadata stored in its database. Each API call that performs an
> add/alter/drop operation in the metastore generates one or more events,
> which can be fetched using the {{get_next_notification}} API. Each event has
> a unique, increasing event_id. The current notification event id can be
> fetched using the {{get_current_notificationEventId}} API.
>
> This JIRA proposes using these metastore events to proactively invalidate
> or refresh information in CatalogD. When configured, CatalogD could poll
> for such events and take action (such as add/drop/refresh partition, or
> add/drop/invalidate tables and databases) based on each event. This way the
> CatalogD state is refreshed automatically from events, which would greatly
> help use cases where users want to see the latest information (within a
> configurable delay) without flooding the system with invalidate requests.
>
> I will attach a design doc to this JIRA and create subtasks for the work.
> Feel free to comment on the JIRA or suggest improvements to the design.
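
For illustration only, the polling idea described above might be sketched roughly as follows. This is not the proposed design and not Impala or HMS code: FakeMetastore is a hypothetical in-memory stand-in for the metastore notification log, next_events and current_event_id only mimic the roles of get_next_notification and get_current_notificationEventId, and the event type names and the event-to-action mapping are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    event_id: int    # unique and increasing, as described in the issue
    event_type: str  # illustrative type name, e.g. "CREATE_TABLE"
    db: str
    table: str = ""


class FakeMetastore:
    """Hypothetical in-memory stand-in for the HMS notification log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def log(self, event_type: str, db: str, table: str = "") -> None:
        # Assign the next event id; ids start at 1 and never repeat.
        self._events.append(Event(len(self._events) + 1, event_type, db, table))

    def current_event_id(self) -> int:
        # Plays the role of get_current_notificationEventId.
        return self._events[-1].event_id if self._events else 0

    def next_events(self, last_event_id: int, max_events: int = 100) -> List[Event]:
        # Plays the role of get_next_notification: events newer than the given id.
        newer = [e for e in self._events if e.event_id > last_event_id]
        return newer[:max_events]


# Assumed mapping from event type to catalog action (illustrative only).
ACTIONS = {
    "CREATE_DATABASE": "add_database",
    "DROP_DATABASE": "drop_database",
    "CREATE_TABLE": "add_table",
    "DROP_TABLE": "drop_table",
    "ALTER_TABLE": "invalidate_table",
    "ADD_PARTITION": "refresh_table",
    "DROP_PARTITION": "refresh_table",
}


class EventPoller:
    """Polls for new events and records the catalog action taken for each."""

    def __init__(self, metastore: FakeMetastore) -> None:
        self.metastore = metastore
        # Start from the current id so only later changes are processed.
        self.last_synced_id = metastore.current_event_id()
        self.applied: List[Tuple[str, str, str]] = []

    def poll_once(self) -> int:
        events = self.metastore.next_events(self.last_synced_id)
        for event in events:
            # Unknown event types fall back to the conservative action.
            action = ACTIONS.get(event.event_type, "invalidate_table")
            self.applied.append((action, event.db, event.table))
            self.last_synced_id = event.event_id
        return len(events)
```

In a real deployment poll_once would presumably run on a timer with a configurable interval, and each action would modify the catalog rather than append to a list. Tracking the last processed event id is what lets each poll fetch only the changes made since the previous one.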