Quanlong Huang created IMPALA-12933:
---------------------------------------
Summary: Catalogd should set eventTypeSkipList when fetching
specific events for a table
Key: IMPALA-12933
URL: https://issues.apache.org/jira/browse/IMPALA-12933
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
There are several places where catalogd fetches all events of a specific type
on a table. E.g. in TableLoader#load(), if the table has an old createEventId,
catalogd fetches all CREATE_TABLE events after that createEventId on the
table.
Fetching the list of events is expensive since the filtering is done on the
client side, i.e. catalogd fetches all events and filters them locally based on
the event type and table name:
[https://github.com/apache/impala/blob/148888e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102]
[https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336]
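To illustrate the cost of the client-side approach, here is a minimal sketch. It does not reproduce the Impala code linked above: the Event class and the filterLocally helper are hypothetical stand-ins, and the only point is that every event has already been transferred from HMS before any filter runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClientSideFilterSketch {
  // Hypothetical stand-in for an HMS notification event; not the real Hive class.
  static final class Event {
    final long id;
    final String type;
    final String db;
    final String table;

    Event(long id, String type, String db, String table) {
      this.id = id;
      this.type = type;
      this.db = db;
      this.table = table;
    }
  }

  // Keeps only events of the wanted type on the wanted table with id > afterEventId.
  // Every element of 'fetched' has already crossed the wire by the time this runs,
  // so the work grows with the total number of events, not the number of matches.
  static List<Event> filterLocally(List<Event> fetched, long afterEventId,
      String wantedType, String db, String table) {
    List<Event> result = new ArrayList<>();
    for (Event e : fetched) {
      if (e.id > afterEventId
          && wantedType.equals(e.type)
          && db.equalsIgnoreCase(e.db)
          && table.equalsIgnoreCase(e.table)) {
        result.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Event> fetched = Arrays.asList(
        new Event(11, "ALTER_TABLE", "db1", "t1"),
        new Event(12, "INSERT", "db1", "t2"),
        new Event(13, "CREATE_TABLE", "db1", "t1"),
        new Event(14, "CREATE_TABLE", "db1", "t2"));
    List<Event> kept = filterLocally(fetched, 10, "CREATE_TABLE", "db1", "t1");
    System.out.println("fetched=" + fetched.size() + " kept=" + kept.size());
  }
}
```

In this toy input only one of the four fetched events is useful. With around 1M events in HMS the ratio is far worse.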
This could take hours if there are lots of events (e.g. 1M) in HMS. In fact,
NotificationEventRequest can specify an eventTypeSkipList, so catalogd can do
the event type filtering on the HMS side. On higher Hive versions that have
HIVE-27499, catalogd can also specify the table name in the request
(IMPALA-12607).
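Because the request field is a skip list rather than an include list, asking for a single event type means listing every other type. The following is a rough sketch of that inversion, not the actual patch: KNOWN_EVENT_TYPES is an illustrative subset of event type names and buildSkipList is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EventTypeSkipListSketch {
  // Illustrative subset of HMS event type names; an assumption, not an exhaustive list.
  static final List<String> KNOWN_EVENT_TYPES = Arrays.asList(
      "CREATE_DATABASE", "ALTER_DATABASE", "DROP_DATABASE",
      "CREATE_TABLE", "ALTER_TABLE", "DROP_TABLE",
      "ADD_PARTITION", "ALTER_PARTITION", "DROP_PARTITION", "INSERT");

  // Hypothetical helper: every known type except the wanted one goes in the skip list.
  static List<String> buildSkipList(String wantedType) {
    List<String> skip = new ArrayList<>();
    for (String type : KNOWN_EVENT_TYPES) {
      if (!type.equals(wantedType)) {
        skip.add(type);
      }
    }
    return skip;
  }

  public static void main(String[] args) {
    List<String> skip = buildSkipList("CREATE_TABLE");
    System.out.println(skip);
    // With the Hive metastore client on the classpath, the list would presumably be
    // attached through the Thrift-generated setter (assumed, not compiled here):
    //   NotificationEventRequest req = new NotificationEventRequest(createEventId);
    //   req.setEventTypeSkipList(skip);
  }
}
```

A skip list built this way cannot exclude event types it does not know about, so a local check on the event type is still worth keeping as a safety net after the fetch.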
This Jira focuses on specifying the eventTypeSkipList when fetching events of a
particular type on a table.