[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833161#comment-17833161
 ] 

Venugopal Reddy K edited comment on IMPALA-12709 at 4/2/24 12:18 PM:
---------------------------------------------------------------------

[~maxwellguo] I am currently measuring and comparing the time taken with the 
base and modified versions. I am also tuning the configuration parameters 
added with the gerrit change to see how they affect the event processing time. 
Since there are no existing tests that measure event processing time, I am 
adding some. 
[https://gerrit.cloudera.org/#/c/21031/8/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorPerfTest.java]

has a test that creates 10 databases with 10 non-transactional tables in each, 
inserts data into all 100 tables, and then drops the tables and databases. All 
of these operations are run from Hive; only the event processing happens on 
Impala. The results show that with Hierarchical Processing enabled, processing 
of insert-into-table events (each insert generates ALTER and INSERT events) is 
much faster (nearly 5 times). But event processing for create database, create 
table, drop table and drop database is not. I am checking it. I also plan to 
add more perf tests covering partitioned tables, transactional tables, 
different possible sequences of events, events generated from the Impala side, 
etc.

Test output for 2 runs:
{noformat}
I0402 17:04:40.532240 712808 EventsProcessorPerfTest.java:131] [Performance] 
With Hierarchical Processing: false
I0402 17:04:40.705119 712808 EventsProcessorPerfTest.java:140] [Performance] 
Time taken to process create database events: 75.11 ms
I0402 17:04:43.643066 712808 EventsProcessorPerfTest.java:153] [Performance] 
Time taken to process create table events: 136.4 ms
I0402 17:04:44.130368 712808 EventsProcessorPerfTest.java:181] [Performance] 
Time taken to load table: 486.9 ms
I0402 17:05:15.474153 712808 EventsProcessorPerfTest.java:194] [Performance] 
Time taken to process insert events : 1.955 s
I0402 17:05:24.419824 712808 EventsProcessorPerfTest.java:206] [Performance] 
Time taken to process drop table events : 97.01 ms
I0402 17:05:24.684505 712808 EventsProcessorPerfTest.java:216] [Performance] 
Time taken to process database events : 26.55 ms
I0402 17:05:25.107113 712808 EventsProcessorPerfTest.java:131] [Performance] 
With Hierarchical Processing: true
I0402 17:05:25.196743 712808 EventsProcessorPerfTest.java:140] [Performance] 
Time taken to process create database events: 15.21 ms
I0402 17:05:28.118330 712808 EventsProcessorPerfTest.java:153] [Performance] 
Time taken to process create table events: 50.12 ms
I0402 17:05:28.473388 712808 EventsProcessorPerfTest.java:181] [Performance] 
Time taken to load table: 354.8 ms
I0402 17:05:52.529421 712808 EventsProcessorPerfTest.java:194] [Performance] 
Time taken to process insert events : 402.1 ms
I0402 17:06:01.460664 712808 EventsProcessorPerfTest.java:206] [Performance] 
Time taken to process drop table events : 132.2 ms
I0402 17:06:01.848369 712808 EventsProcessorPerfTest.java:216] [Performance] 
Time taken to process database events : 27.53 ms
I0402 17:06:02.227852 712808 EventsProcessorPerfTest.java:131] [Performance] 
With Hierarchical Processing: false
I0402 17:06:02.435050 712808 EventsProcessorPerfTest.java:140] [Performance] 
Time taken to process create database events: 18.10 ms
I0402 17:06:05.132701 712808 EventsProcessorPerfTest.java:153] [Performance] 
Time taken to process create table events: 110.8 ms
I0402 17:06:05.726616 712808 EventsProcessorPerfTest.java:181] [Performance] 
Time taken to load table: 593.7 ms
I0402 17:06:30.767912 712808 EventsProcessorPerfTest.java:194] [Performance] 
Time taken to process insert events : 2.246 s
I0402 17:06:40.019438 712808 EventsProcessorPerfTest.java:206] [Performance] 
Time taken to process drop table events : 122.7 ms
I0402 17:06:40.383190 712808 EventsProcessorPerfTest.java:216] [Performance] 
Time taken to process database events : 22.18 ms
I0402 17:06:40.801436 712808 EventsProcessorPerfTest.java:131] [Performance] 
With Hierarchical Processing: true
I0402 17:06:41.036427 712808 EventsProcessorPerfTest.java:140] [Performance] 
Time taken to process create database events: 21.29 ms
I0402 17:06:43.558152 712808 EventsProcessorPerfTest.java:153] [Performance] 
Time taken to process create table events: 101.3 ms
I0402 17:06:43.942732 712808 EventsProcessorPerfTest.java:181] [Performance] 
Time taken to load table: 384.1 ms
I0402 17:07:08.202667 712808 EventsProcessorPerfTest.java:194] [Performance] 
Time taken to process insert events : 465.2 ms
I0402 17:07:17.037060 712808 EventsProcessorPerfTest.java:206] [Performance] 
Time taken to process drop table events : 137.3 ms
I0402 17:07:17.377442 712808 EventsProcessorPerfTest.java:216] [Performance] 
Time taken to process database events : 20.56 ms
{noformat}
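
As a rough check of the "nearly 5 times" figure, the insert-event timings from the two runs above can be compared directly. The snippet below is only an illustrative sketch: the class and method names (InsertSpeedup, toMillis, speedup) are made up for this example and are not part of EventsProcessorPerfTest.java; the four timings are copied from the log output.

```java
import java.util.Locale;

class InsertSpeedup {
    /** Converts a duration as printed in the log ("402.1 ms", "1.955 s") to milliseconds. */
    static double toMillis(String duration) {
        String[] parts = duration.trim().split("\\s+");
        double value = Double.parseDouble(parts[0]);
        switch (parts[1]) {
            case "s":
                return value * 1000.0;
            case "ms":
                return value;
            default:
                throw new IllegalArgumentException("Unexpected unit: " + parts[1]);
        }
    }

    /** Ratio of the baseline time to the time with Hierarchical Processing enabled. */
    static double speedup(String baseline, String hierarchical) {
        return toMillis(baseline) / toMillis(hierarchical);
    }

    public static void main(String[] args) {
        // "Time taken to process insert events" for (false, true) in each run.
        String[][] runs = {{"1.955 s", "402.1 ms"}, {"2.246 s", "465.2 ms"}};
        for (int i = 0; i < runs.length; i++) {
            System.out.printf(Locale.ROOT, "run %d: insert events %.2fx faster%n",
                i + 1, speedup(runs[i][0], runs[i][1]));
        }
    }
}
```

This prints 4.86x for the first run and 4.83x for the second. The same comparison applied to the drop table timings (97.01 ms vs 132.2 ms, 122.7 ms vs 137.3 ms) gives a ratio below 1, which matches the observation that those events are not faster.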
 



> Hierarchical metastore event processing
> ---------------------------------------
>
>                 Key: IMPALA-12709
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12709
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Venugopal Reddy K
>            Assignee: Venugopal Reddy K
>            Priority: Major
>         Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification 
> events are processed sequentially, with at most 1000 events fetched and 
> processed in a single batch. Multiple locks are used to address the 
> concurrency issues that may arise when catalog DDL operation processing and 
> metastore event processing try to access or update the catalog objects 
> concurrently. Waiting for a lock or for a table's file metadata to load can 
> slow down event processing and delay the events that follow, even though 
> those events may not depend on the earlier one. Altogether, it takes a very 
> long time to synchronize all the HMS events.
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event 
> processing. The idea is to segregate the events based on their dependencies, 
> maintain the order in which events occur within each dependency, and process 
> them independently as much as possible:
>  # All the events of a table are processed in the order in which they 
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all events relating to that database (i.e., 
> for all its tables) that occur after the alter database event are processed 
> only after the alter database event has been processed, preserving the order.
> I have attached an initial proposal design document:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
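
The three ordering rules in the proposal quoted above can be sketched with chained futures. This is a minimal illustration under stated assumptions, not the implementation in the gerrit change: HierarchicalDispatcher, submitTableEvent and submitDbEvent are hypothetical names, event handlers are plain Runnables, and error handling is omitted. It also assumes that a database event waits for the table events of that database that were submitted before it, which the proposal text does not spell out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class HierarchicalDispatcher {
    private static final CompletableFuture<Void> DONE = CompletableFuture.completedFuture(null);

    private final ExecutorService pool;
    // db -> table -> future of the last event submitted for that table.
    private final Map<String, Map<String, CompletableFuture<Void>>> tableTails = new HashMap<>();
    // db -> future of the last database-level event submitted for that db.
    private final Map<String, CompletableFuture<Void>> dbBarriers = new HashMap<>();

    HierarchicalDispatcher(ExecutorService pool) {
        this.pool = pool;
    }

    /** Rules 1 and 2: events of one table run in order; different tables run in parallel. */
    synchronized CompletableFuture<Void> submitTableEvent(String db, String table, Runnable handler) {
        Map<String, CompletableFuture<Void>> tails = tableTails.computeIfAbsent(db, k -> new HashMap<>());
        // Chain after the table's previous event, or after the db barrier if there is none.
        CompletableFuture<Void> prev = tails.get(table);
        if (prev == null) {
            prev = dbBarriers.getOrDefault(db, DONE);
        }
        CompletableFuture<Void> next = prev.thenRunAsync(handler, pool);
        tails.put(table, next);
        return next;
    }

    /** Rule 3: later events of the database run only after the database event. */
    synchronized CompletableFuture<Void> submitDbEvent(String db, Runnable handler) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        pending.add(dbBarriers.getOrDefault(db, DONE));
        // Assumption: the db event also waits for earlier table events of this db.
        Map<String, CompletableFuture<Void>> tails = tableTails.remove(db);
        if (tails != null) {
            pending.addAll(tails.values());
        }
        CompletableFuture<Void> barrier = CompletableFuture
            .allOf(pending.toArray(new CompletableFuture[0]))
            .thenRunAsync(handler, pool);
        dbBarriers.put(db, barrier);
        return barrier;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        HierarchicalDispatcher dispatcher = new HierarchicalDispatcher(pool);
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        dispatcher.submitTableEvent("db1", "t1", () -> processed.add("t1 insert 1"));
        dispatcher.submitTableEvent("db1", "t1", () -> processed.add("t1 insert 2"));
        dispatcher.submitTableEvent("db1", "t2", () -> processed.add("t2 insert"));
        dispatcher.submitDbEvent("db1", () -> processed.add("alter db1"));
        dispatcher.submitTableEvent("db1", "t1", () -> processed.add("t1 insert 3")).join();
        pool.shutdown();
        System.out.println(processed);
    }
}
```

In this sketch the relative order of the t1 and t2 events before the alter database event is not fixed, since they belong to different tables, while "alter db1" always appears fourth and "t1 insert 3" always appears last.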


