[ 
https://issues.apache.org/jira/browse/IMPALA-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812087#comment-17812087
 ] 

ASF subversion and git services commented on IMPALA-12463:
----------------------------------------------------------

Commit eaa35b02507a834edd0d219343fd4bd075f21762 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=eaa35b025 ]

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
    in the affected database.
 2. Alter table rename cuts any batches on the source or destination
    table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Reviewed-on: http://gerrit.cloudera.org:8080/20533
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Allow batching of non consecutive metastore events
> --------------------------------------------------
>
>                 Key: IMPALA-12463
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12463
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Assignee: Joe McDonnell
>            Priority: Major
>         Attachments: concurrent_metadata_load.py
>
>
> Currently Impala tries to batch events like partition insert/creation only if:
> 1. the next event is for the same table as the previous one
> 2. the next event's id is the previous one's + 1
> 3. the next event has the same type as the previous one
> (2 can be stricter than 1 if some events were filtered between the two)
> See 
> https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L315
> Another limit is that only events in the same batch from HMS can be merged. 
> Currently 1000 events are polled at the same time: 
> https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L218
> Making this configurable could be also useful.
> Event batching could be improved by batching all events to the current one if 
> they modify the same table, unless they are "cut" by:
> a. an event on the same table but with a different type
> b. a rename table event where the original or the new name is the same as the 
> current event
> If such an event occurs, the events after that can be only merged to a newer 
> event.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to