[ https://issues.apache.org/jira/browse/IMPALA-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812087#comment-17812087 ]
ASF subversion and git services commented on IMPALA-12463:
----------------------------------------------------------

Commit eaa35b02507a834edd0d219343fd4bd075f21762 in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=eaa35b025 ]

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be consecutive. When there are multiple tables being modified, events can be interleaved such that each batch is very small. If the batching criteria can be relaxed, the non-consecutive events could be batched and processed more efficiently.

This implements batching for non-consecutive events by keeping state on each table individually. Different tables can continue to accumulate batchable events independently unless they hit a condition that cuts the batch. The batching can ignore some events on unrelated tables, but the same rules apply about the batching of events on an individual table. For example, for a particular table, any non-INSERT event between two INSERT events on that table continues to cut the batching.

In addition, there are certain cross-table events that need to cut batches across multiple tables:
1. Drop database / alter database cuts any batches on tables in the affected database.
2. Alter table rename cuts any batches on the source or destination table.

This emits events in monotonically increasing order by maintaining the resulting events in a sorted map. All non-batchable events will be emitted in the original order. Batchable events are emitted based on the ending Event ID of the batch. This means that batchable events can move later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
- MetastoreEventsProcessorTest has new tests for interleaved events on two tables as well as tests for events that cut batches across tables (alter table, drop database, alter database).
- A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Reviewed-on: http://gerrit.cloudera.org:8080/20533
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Allow batching of non consecutive metastore events
> --------------------------------------------------
>
>                 Key: IMPALA-12463
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12463
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Assignee: Joe McDonnell
>            Priority: Major
>         Attachments: concurrent_metadata_load.py
>
>
> Currently Impala tries to batch events like partition insert/creation only if:
> 1. the next event is for the same table as the previous one
> 2. the next event's id is the previous one's + 1
> 3. the next event has the same type as the previous one
> (2 can be stricter than 1 if some events were filtered between the two)
> See
> https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L315
> Another limit is that only events in the same batch from HMS can be merged.
> Currently 1000 events are polled at the same time:
> https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L218
> Making this configurable could be also useful.
> Event batching could be improved by batching all events to the current one if
> they modify the same table, unless they are "cut" by:
> a. an event on the same table but with a different type
> b. a rename table event where the original or the new name is the same as the
> current event
> If such an event occurs, the events after that can be only merged to a newer
> event.
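
The batching rules in the commit message can be illustrated with a small sketch. This is not the Impala implementation (which lives in the Java frontend); it is a simplified Python model written for illustration. The `Event` class, the event-type names, the `batch_events` function, and the assumption that a rename stays within one database are all hypothetical. It keeps one open batch per table, cuts batches on the conditions listed in the commit message, and emits results keyed by the ending event ID so the output stays in increasing order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical event-type names chosen for this sketch only.
BATCHABLE = {"INSERT", "ALTER_PARTITION"}
DB_WIDE = {"DROP_DATABASE", "ALTER_DATABASE"}
RENAME = "ALTER_TABLE_RENAME"


@dataclass(frozen=True)
class Event:
    event_id: int
    db: str
    table: Optional[str]              # None for database-level events
    event_type: str
    new_table: Optional[str] = None   # rename destination (assumed same db)


def batch_events(events: List[Event]) -> List[List[Event]]:
    """Group events into batches, ordered by each batch's ending event ID."""
    pending: Dict[Tuple[str, Optional[str]], List[Event]] = {}  # open batch per table
    emitted: Dict[int, List[Event]] = {}                        # ending ID -> batch

    def cut(key: Tuple[str, Optional[str]]) -> None:
        # Close the open batch for this table, if any, keyed by its last event ID.
        batch = pending.pop(key, None)
        if batch:
            emitted[batch[-1].event_id] = batch

    for ev in events:
        if ev.event_type in BATCHABLE:
            key = (ev.db, ev.table)
            batch = pending.get(key)
            if batch and batch[-1].event_type != ev.event_type:
                cut(key)  # a different batchable type on the same table cuts the batch
            pending.setdefault(key, []).append(ev)
            continue

        # Non-batchable event: work out which open batches it cuts.
        if ev.event_type in DB_WIDE:
            affected = [k for k in pending if k[0] == ev.db]
        elif ev.event_type == RENAME:
            affected = [(ev.db, ev.table), (ev.db, ev.new_table)]
        else:
            affected = [(ev.db, ev.table)]
        for key in affected:
            cut(key)
        emitted[ev.event_id] = [ev]  # non-batchable events keep their own ID

    for key in list(pending):  # flush whatever is still open at the end
        cut(key)
    return [emitted[k] for k in sorted(emitted)]
```

In this model, interleaved INSERT events on two tables (IDs 1 and 3 on one table, 2 and 4 on another) come out as two batches ending at IDs 3 and 4, rather than four single-event batches. A batch can only move later in the sequence, because its key is the ID of its last event.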