Csaba Ringhofer created IMPALA-12463:
----------------------------------------
Summary: Allow batching of non consecutive metastore events
Key: IMPALA-12463
URL: https://issues.apache.org/jira/browse/IMPALA-12463
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Csaba Ringhofer
Currently Impala batches events such as partition insert/creation only if all of the following hold:
1. the next event is for the same table as the previous one
2. the next event's id is the previous one's id + 1
3. the next event has the same type as the previous one
(Condition 2 can be stricter than condition 1 if some events were filtered out between the two.)
See
https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L315
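
To illustrate the three conditions above, here is a minimal sketch. It is not the code in MetastoreEvents.java: SketchEvent, canBatch and batch are hypothetical names made up for this example, and an event is reduced to an id, a table name and a type string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a metastore event (not Impala's class).
final class SketchEvent {
  final long id;
  final String table; // fully qualified table name
  final String type;  // e.g. "ADD_PARTITION"

  SketchEvent(long id, String table, String type) {
    this.id = id;
    this.table = table;
    this.type = type;
  }
}

final class ConsecutiveBatchingSketch {
  // True if 'next' may join a batch whose last event is 'prev':
  // same table, id exactly one higher, same event type.
  static boolean canBatch(SketchEvent prev, SketchEvent next) {
    return next.table.equals(prev.table)
        && next.id == prev.id + 1
        && next.type.equals(prev.type);
  }

  // Splits an ordered event list into batches using canBatch().
  static List<List<SketchEvent>> batch(List<SketchEvent> events) {
    List<List<SketchEvent>> batches = new ArrayList<>();
    for (SketchEvent e : events) {
      if (!batches.isEmpty()) {
        List<SketchEvent> last = batches.get(batches.size() - 1);
        if (canBatch(last.get(last.size() - 1), e)) {
          last.add(e);
          continue;
        }
      }
      // Start a new batch when any of the three conditions fails.
      List<SketchEvent> fresh = new ArrayList<>();
      fresh.add(e);
      batches.add(fresh);
    }
    return batches;
  }
}
```

In this sketch, events 1, 2 and 4 on the same table with the same type end up in two batches, because the gap left by a filtered event 3 breaks the id condition.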
Another limitation is that only events fetched from HMS in the same batch can be merged.
Currently 1000 events are polled at a time:
https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L218
Making this limit configurable could also be useful.
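
One possible shape of the relaxed batching, as a rough sketch only. All names here (PolledEvent, RelaxedBatchingSketch, maxPollSize) are hypothetical and not existing Impala code or flags. The sketch assumes that events of different tables are independent of each other, which may not hold for every event type, and it still closes a table's batch when an event of a different type arrives for that table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a metastore event (not Impala's class).
final class PolledEvent {
  final long id;
  final String table;
  final String type;

  PolledEvent(long id, String table, String type) {
    this.id = id;
    this.table = table;
    this.type = type;
  }
}

final class RelaxedBatchingSketch {
  // Batches events of one poll. An event joins the open batch of its table if
  // the type matches, even when ids are not adjacent or events of other tables
  // arrived in between. A different type on the same table opens a new batch.
  static List<List<PolledEvent>> batch(List<PolledEvent> polled) {
    List<List<PolledEvent>> batches = new ArrayList<>();
    Map<String, List<PolledEvent>> openByTable = new HashMap<>();
    for (PolledEvent e : polled) {
      List<PolledEvent> open = openByTable.get(e.table);
      if (open != null && open.get(0).type.equals(e.type)) {
        open.add(e);
      } else {
        List<PolledEvent> fresh = new ArrayList<>();
        fresh.add(e);
        batches.add(fresh);
        openByTable.put(e.table, fresh);
      }
    }
    return batches;
  }

  // Mimics the per-poll limit: events are merged only within chunks of at most
  // maxPollSize events (assumption: this stands in for the value 1000 above).
  static List<List<PolledEvent>> batchWithPollSize(
      List<PolledEvent> all, int maxPollSize) {
    if (maxPollSize <= 0) {
      throw new IllegalArgumentException("maxPollSize must be positive");
    }
    List<List<PolledEvent>> result = new ArrayList<>();
    for (int start = 0; start < all.size(); start += maxPollSize) {
      int end = Math.min(start + maxPollSize, all.size());
      result.addAll(batch(all.subList(start, end)));
    }
    return result;
  }
}
```

Batches are emitted in the order of their first event, so an event of another table can be processed after later events of the batched table. Whether that reordering is acceptable is exactly the kind of question this improvement would need to settle.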