[
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Venugopal Reddy K updated IMPALA-12709:
---------------------------------------
Description:
*Current Issue:*
At present, the metastore event processor is single-threaded. Notification
events are processed sequentially, with at most 1000 events fetched and
processed in a single batch. Multiple locks are used to address the concurrency
issues that may arise when catalog DDL operation processing and metastore event
processing try to access or update catalog objects concurrently. Waiting for a
lock, or for a table's file metadata to load, can slow event processing and
delay the events that follow, even though those events may not depend on the
earlier event. Altogether, it takes a very long time to synchronize all the
HMS events.
*Proposal:*
Existing metastore event processing can be turned into multi-level event
processing. The idea is to segregate events by their dependencies, preserve the
order in which events occurred within each dependency group, and process
independent events in parallel as much as possible:
# All events of a table are processed in the order in which they actually
occurred.
# Events of different tables are processed in parallel.
# When a database is altered, all events relating to that database (i.e., for
all its tables) that occur after the alter database event are processed only
after the alter database event has been processed, which preserves the order.
I have attached an initial design proposal document:
https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
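As an illustration only, the three ordering rules above could be sketched in
Java roughly as follows. The names HierarchicalEventDispatcher, Event and
submit are hypothetical; they are not taken from the Impala code base or from
the design document. The sketch also assumes that a database-level event waits
for the earlier table events of the same database, which the proposal does not
state explicitly, and it omits error handling and cleanup of finished entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical sketch: chains events so that only dependent ones are serialized. */
public class HierarchicalEventDispatcher {
  /** Hypothetical event shape; tableName is null for database-level events. */
  public static final class Event {
    public final long id;
    public final String dbName;
    public final String tableName;

    public Event(long id, String dbName, String tableName) {
      this.id = id;
      this.dbName = dbName;
      this.tableName = tableName;
    }
  }

  private final Executor pool;
  private final Consumer<Event> handler;
  // Completion of the last database-level event submitted for each database.
  private final Map<String, CompletableFuture<Void>> dbBarrier = new HashMap<>();
  // Completion of the last event submitted for each table, grouped by database.
  private final Map<String, Map<String, CompletableFuture<Void>>> tableTails =
      new HashMap<>();

  public HierarchicalEventDispatcher(Executor pool, Consumer<Event> handler) {
    this.pool = pool;
    this.handler = handler;
  }

  /**
   * Schedules one event. Assumption: called from a single fetcher thread, in
   * the order the events occurred, so the maps need no synchronization.
   */
  public CompletableFuture<Void> submit(Event e) {
    Map<String, CompletableFuture<Void>> tails =
        tableTails.computeIfAbsent(e.dbName, k -> new HashMap<>());
    List<CompletableFuture<Void>> deps = new ArrayList<>();
    // Rule 3: everything waits for the last database-level event of its database.
    CompletableFuture<Void> barrier = dbBarrier.get(e.dbName);
    if (barrier != null) deps.add(barrier);
    if (e.tableName == null) {
      // Assumption: a database-level event waits for earlier events of its tables.
      deps.addAll(tails.values());
    } else {
      // Rule 1: events of one table run in the order they were submitted.
      CompletableFuture<Void> tail = tails.get(e.tableName);
      if (tail != null) deps.add(tail);
    }
    // Rule 2: events with no shared dependency run in parallel on the pool.
    // Note: if the handler throws, dependent events are skipped (no recovery here).
    CompletableFuture<Void> done =
        CompletableFuture.allOf(deps.toArray(new CompletableFuture[0]))
            .thenRunAsync(() -> handler.accept(e), pool);
    if (e.tableName == null) {
      dbBarrier.put(e.dbName, done);
      tails.clear(); // The new barrier already covers all earlier table events.
    } else {
      tails.put(e.tableName, done);
    }
    return done;
  }
}
```

In this sketch the dependency chain is built with CompletableFuture rather than
with per-table queues, so a slow table (for example, one waiting on a lock or
on file metadata loading) only holds back events that are chained behind it.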
was:
*Current Issue:*
At present, metastore event processor is single threaded. Notification events
are processed sequentially with a maximum limit of 1000 events fetched and
processed in a single batch. Multiple locks are used to address the concurrency
issues that may arise when catalog DDL operation processing and metastore event
processing tries to access/update the catalog objects concurrently. Waiting for
a lock or file metadata loading of a table can slow the event processing and
can affect the processing of other events following it. Those events may not be
dependent on the previous event. Altogether it takes a very long time to
synchronize all the HMS events.
*Proposal:*
Existing metastore event processing can be turned into multi-level event
processing. Idea is to segregate the events based on their dependency, maintain
the order of events as they occur within the dependency and process them
independently as much as possible:
# All the events of a table are processed in the same order they have actually
occurred.
# Events of different tables are processed in parallel.
# When a database is altered, all the events relating to the database(i.e.,
for all its tables) occurring after the alter db event are processed only after
the alter database event is processed ensuring the order.A
> Hierarchical metastore event processing
> ---------------------------------------
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Venugopal Reddy K
> Priority: Major
> Attachments: Hierarchical metastore event processing.docx