Quanlong Huang created IMPALA-12933:
---------------------------------------

             Summary: Catalogd should set eventTypeSkipList when fetching 
specific events for a table
                 Key: IMPALA-12933
                 URL: https://issues.apache.org/jira/browse/IMPALA-12933
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


There are several places where catalogd fetches all events of a specific type 
on a table. E.g. in TableLoader#load(), if the table has an old createEventId, 
catalogd will fetch all CREATE_TABLE events after that createEventId on the 
table.

Fetching the list of events is expensive since the filtering is done on the 
client side, i.e. catalogd fetches all events and filters them locally based on 
the event type and table name:
[https://github.com/apache/impala/blob/148888e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102]
[https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336]

This could take hours if there are lots of events (e.g. 1M) in HMS. However, 
NotificationEventRequest can specify an eventTypeSkipList, so catalogd can 
filter by event type on the HMS side. On newer Hive versions that include 
HIVE-27499, catalogd can also specify the table name in the request 
(IMPALA-12607).
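
Since eventTypeSkipList names the event types to exclude, fetching only one 
type means listing every other type in the skip list. A minimal sketch of 
building such a list follows. The class name, the helper buildSkipList() and 
the event type list are hypothetical and not taken from the Impala code base. 
The list of type names is illustrative and not exhaustive, and the setter 
mentioned in the last comment is assumed to be the Thrift-generated one for 
the eventTypeSkipList field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipListSketch {
  // Assumption: an illustrative, non-exhaustive list of HMS event type names.
  static final List<String> KNOWN_EVENT_TYPES = Arrays.asList(
      "CREATE_TABLE", "ALTER_TABLE", "DROP_TABLE",
      "ADD_PARTITION", "ALTER_PARTITION", "DROP_PARTITION",
      "INSERT", "CREATE_DATABASE", "ALTER_DATABASE", "DROP_DATABASE");

  // Returns every known event type except the one the caller wants to fetch.
  static List<String> buildSkipList(String wantedType) {
    List<String> skipList = new ArrayList<>();
    for (String type : KNOWN_EVENT_TYPES) {
      if (!type.equals(wantedType)) {
        skipList.add(type);
      }
    }
    return skipList;
  }

  public static void main(String[] args) {
    List<String> skipList = buildSkipList("CREATE_TABLE");
    System.out.println("Event types to skip: " + skipList);
    // With the Hive client on the classpath, the list would then be attached
    // to the request, e.g. request.setEventTypeSkipList(skipList) (assumed
    // Thrift-generated setter), so that HMS drops the other types itself.
  }
}
```

Any event type missing from the skip list would still be returned by HMS, so 
a local check on the event type remains useful as a safety net.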



This Jira focuses on specifying the eventTypeSkipList when fetching events of a 
particular type on a table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)