[
https://issues.apache.org/jira/browse/IMPALA-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sai Hemanth Gantasala updated IMPALA-10976:
-------------------------------------------
Description:
This is a follow-up to IMPALA-10926. The idea is that when any DDL operation
is performed from Impala shell, catalogd also syncs the db/table to its latest
event ID as per HMS. This way, updates to a db/table are applied in the same
order as they appear in the HMS notification log, which ensures consistency.
Currently catalogd applies any updates received from Impala shell in place.
Instead, it should perform the HMS operation first and then replay all the HMS
events since the last synced event.
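A minimal sketch of that flow, in Python for brevity. This is not catalogd
code: FakeHms, CachedTable and run_ddl_and_sync are hypothetical names, and the
notification log is modelled as a plain in-memory list. It only illustrates
"run the operation in HMS first, then replay every event newer than the last
synced event ID, in log order".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeHms:
    """Toy stand-in for HMS: an append-only notification log (assumption)."""
    log: List[Tuple[int, str, str]] = field(default_factory=list)

    def execute(self, table: str, change: str) -> int:
        # Every operation appends one event with a monotonically increasing id.
        event_id = len(self.log) + 1
        self.log.append((event_id, table, change))
        return event_id

    def events_after(self, table: str, last_event_id: int):
        # Events for this table that are newer than the given id, in log order.
        return [e for e in self.log if e[1] == table and e[0] > last_event_id]


@dataclass
class CachedTable:
    """Toy stand-in for a cached table (hypothetical)."""
    last_synced_event_id: int = 0
    applied: List[str] = field(default_factory=list)


def run_ddl_and_sync(hms: FakeHms, cache: Dict[str, CachedTable],
                     table: str, change: str) -> CachedTable:
    # 1. Perform the operation in HMS first instead of updating the cache in place.
    hms.execute(table, change)
    # 2. Replay everything since the last synced event, including our own event.
    cached = cache.setdefault(table, CachedTable())
    for event_id, _, evt_change in hms.events_after(table, cached.last_synced_event_id):
        cached.applied.append(evt_change)
        cached.last_synced_event_id = event_id
    return cached
```

In this sketch, a change made outside Impala before the shell DDL is replayed
first, so the cache sees both changes in notification-log order.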
However, there are subtle differences between how Impala processes DDLs issued
via the shell and how it processes HMS events. These are:
* When processing an alter table event, catalogd currently does a full table
reload. This has a performance impact because a table reload is time consuming.
In contrast, the in-place alter table DDL operation in catalogOpExecutor (via
Impala shell) is faster since it detects whether to reload the table schema,
the file metadata, or both. The alter table event processing logic needs to be
improved to detect whether to reload the file metadata or not. --> This is
addressed by IMPALA-11534
* A similar improvement is required in processing the alter partition event. As
of now, when processing an AlterPartition HMS event, catalogd always reloads
file metadata, but when doing the same from the shell, it reloads metadata only
when it is required.
* Impala shell already caches Hive functions in the catalog db object. But
catalogd does *not* process CREATE/DROP function HMS events.
* When creating a db/table from Impala shell, if the operation fails because
the db/table already exists, then there is no reliable way in catalogd to
determine the create event ID for that db/table. The create event ID is
required so that for any subsequent DDL operations, catalogd can process HMS
events starting from the create event ID.
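The alter table point in the list above (decide per event whether the schema,
the file metadata, or both need reloading) could look roughly like the
following. This is an illustrative sketch, not the logic from IMPALA-11534:
TableSnapshot, ReloadPlan and plan_reload are hypothetical names, and the rule
"only a location change requires a file metadata reload" is an assumption made
for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical before/after view of a table carried by an alter event."""
    columns: Tuple[Tuple[str, str], ...]      # (name, type) pairs
    location: str
    params: Tuple[Tuple[str, str], ...] = ()  # table properties


@dataclass(frozen=True)
class ReloadPlan:
    reload_schema: bool
    reload_file_metadata: bool


def plan_reload(before: TableSnapshot, after: TableSnapshot) -> ReloadPlan:
    # Column or property changes only need the table-level metadata refreshed.
    schema_changed = (before.columns != after.columns
                      or before.params != after.params)
    # Assumption: files can only differ if the table now points elsewhere.
    files_may_have_changed = before.location != after.location
    return ReloadPlan(schema_changed, files_may_have_changed)
```

Under these assumptions, an event that only changes a table property yields
ReloadPlan(reload_schema=True, reload_file_metadata=False), which avoids the
expensive file listing that a full table reload would do.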
was:
This is a follow up from IMPALA-10926. The idea is that when any DDL operation
is performed from Impala shell, it also syncs the db/table to its latest event
ID as per HMS. This way updates to a db/table's are applied in the same order
as they appear in the Notification log in HMS which ensures consistency.
Currently catalogD applies any updates received from Impala shell in place.
Instead it should perform an HMS operation first and then replay all the HMS
events since the last synced event.
However there are subtle differences in how Impala processes DDLs via shell vs
how it processes HMS events These are:
* When processing an alter table event, currently catalogD does a full table
reload. This has a performance impact as table reload is time consuming.
Whereas in place alter table DDL operation in catalogOpExecutor (via Impala
shell) is faster since detects when to reload table schema or file metadata or
both. Need some improvements in Alter table event processing logic to detect
whether to reload the file metadata or not.
* Similar improvement is required in processing alter partition event. As of
now, when processing AlterPartition HMS event, catalogd always reloads file
metadata but when doing the same from shell, it reloads metadata only when it
is required.
* Impala shell already caches hive fns in catalog db’s object. But catalogD
does *not* process CREATE/DROP Fns HMS event
* When creating a db/table from Impala shell, if the operation fails because
the db/table already exists, then there is no reliable way in catalogd to
determine create event id for that db/table. The create event is required so
that for any subsequent ddl operations, catalogd can process HMS events
starting from createEvent Id.
> Sync db/table in catalogd to latest HMS event id for all DDLs from Impala
> shell
> -------------------------------------------------------------------------------
>
> Key: IMPALA-10976
> URL: https://issues.apache.org/jira/browse/IMPALA-10976
> Project: IMPALA
> Issue Type: Task
> Components: Catalog, Frontend
> Reporter: Sourabh Goyal
> Assignee: Sai Hemanth Gantasala
> Priority: Major