[ https://issues.apache.org/jira/browse/IMPALA-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated IMPALA-10976:
-------------------------------------------
    Description: 
This is a follow-up to IMPALA-10926. The idea is that when any DDL operation 
is performed from Impala shell, the db/table in catalogd is also synced to its 
latest event ID as per HMS. This way, updates to a db/table are applied in the 
same order as they appear in the HMS notification log, which ensures 
consistency. Currently catalogd applies any updates received from Impala shell 
in place. Instead, it should perform the HMS operation first and then replay 
all HMS events since the last synced event.
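
The proposed ordering (HMS operation first, then replay from the last synced 
event) could be sketched roughly as below. This is a toy in-memory model, not 
catalogd code: `Event`, `CachedTable`, `sync_to_latest_event_id` and `run_ddl` 
are hypothetical names, and the notification log is a plain Python list 
standing in for HMS.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    table: str
    change: dict  # hypothetical payload: property updates carried by the event


@dataclass
class CachedTable:
    name: str
    last_synced_event_id: int = 0
    props: dict = field(default_factory=dict)


def sync_to_latest_event_id(cached, notification_log):
    """Replay, in event-id order, every event for this table that is newer
    than the last synced one. Returns the number of events applied."""
    pending = sorted(
        (e for e in notification_log
         if e.table == cached.name and e.event_id > cached.last_synced_event_id),
        key=lambda e: e.event_id,
    )
    for event in pending:
        cached.props.update(event.change)
        cached.last_synced_event_id = event.event_id
    return len(pending)


def run_ddl(cached, notification_log, change):
    """Perform the (simulated) HMS operation first, then catch the cached
    table up from the log instead of mutating it in place."""
    next_id = max((e.event_id for e in notification_log), default=0) + 1
    # Appending to the list stands in for the real HMS call, which would
    # generate the notification event on the metastore side.
    notification_log.append(Event(next_id, cached.name, change))
    return sync_to_latest_event_id(cached, notification_log)
```

In this sketch a DDL never touches the cached table directly, so changes made 
by other writers that appear earlier in the log are applied before the DDL's 
own change.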

 However, there are subtle differences between how Impala processes DDLs issued 
via shell and how it processes HMS events. These are:
 * When processing an alter table event, catalogd currently does a full table 
reload. This hurts performance because a table reload is time consuming. 
By contrast, the in-place alter table DDL operation in catalogOpExecutor (via 
Impala shell) is faster since it detects whether to reload the table schema, 
the file metadata, or both. The alter table event processing logic needs to be 
improved to detect whether the file metadata must be reloaded. This is 
addressed by IMPALA-11534.
 * A similar improvement is required in alter partition event processing. As of 
now, when processing an AlterPartition HMS event, catalogd always reloads file 
metadata, but when doing the same from shell, it reloads metadata only when 
required.
 * Impala shell already caches Hive functions in the catalog's db object, but 
catalogd does *not* process CREATE/DROP function HMS events.
 * When creating a db/table from Impala shell, if the operation fails because 
the db/table already exists, there is no reliable way for catalogd to 
determine the create event ID for that db/table. The create event ID is 
required so that, for any subsequent DDL operations, catalogd can process HMS 
events starting from that create event ID.
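
As an illustration of the first two points, the decision of what to reload for 
an alter event could be made by comparing the table state before and after the 
change. The sketch below is only an assumption about how such a check might 
look; it is not the logic from IMPALA-11534. The field names and the 
`plan_alter_table_reload` helper are hypothetical.

```python
def plan_alter_table_reload(before, after):
    """Return (reload_schema, reload_file_metadata) for an alter event.

    `before` and `after` are plain dicts describing the table; the keys used
    here are illustrative, not the HMS Table object's real field names.
    """
    # Assumption: a change to these fields affects the table schema.
    schema_fields = ("columns", "partition_keys")
    # Assumption: a change to these fields can change which files are read.
    file_fields = ("location", "input_format", "partition_keys")

    reload_schema = any(before.get(f) != after.get(f) for f in schema_fields)
    reload_files = any(before.get(f) != after.get(f) for f in file_fields)
    return reload_schema, reload_files


base = {
    "columns": ["id", "name"],
    "partition_keys": ["day"],
    "location": "/warehouse/t",
    "input_format": "parquet",
    "parameters": {"comment": "old"},
}

# Only a table property changed: neither schema nor file metadata is reloaded.
only_params = dict(base, parameters={"comment": "new"})
# The location changed: file metadata must be reloaded, the schema need not be.
moved = dict(base, location="/warehouse/t_v2")
```

With a check of this kind, an event that only touches table properties would 
avoid the full reload described above.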

  was:
This is a follow-up to IMPALA-10926. The idea is that when any DDL operation 
is performed from Impala shell, the db/table in catalogd is also synced to its 
latest event ID as per HMS. This way, updates to a db/table are applied in the 
same order as they appear in the HMS notification log, which ensures 
consistency. Currently catalogd applies any updates received from Impala shell 
in place. Instead, it should perform the HMS operation first and then replay 
all HMS events since the last synced event.

 However, there are subtle differences between how Impala processes DDLs issued 
via shell and how it processes HMS events. These are:
 * When processing an alter table event, catalogd currently does a full table 
reload. This hurts performance because a table reload is time consuming. 
By contrast, the in-place alter table DDL operation in catalogOpExecutor (via 
Impala shell) is faster since it detects whether to reload the table schema, 
the file metadata, or both. The alter table event processing logic needs to be 
improved to detect whether the file metadata must be reloaded.
 * A similar improvement is required in alter partition event processing. As of 
now, when processing an AlterPartition HMS event, catalogd always reloads file 
metadata, but when doing the same from shell, it reloads metadata only when 
required.
 * Impala shell already caches Hive functions in the catalog's db object, but 
catalogd does *not* process CREATE/DROP function HMS events.
 * When creating a db/table from Impala shell, if the operation fails because 
the db/table already exists, there is no reliable way for catalogd to 
determine the create event ID for that db/table. The create event ID is 
required so that, for any subsequent DDL operations, catalogd can process HMS 
events starting from that create event ID.


> Sync db/table in catalogd to latest HMS event id for all DDLs from Impala 
> shell
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-10976
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10976
>             Project: IMPALA
>          Issue Type: Task
>          Components: Catalog, Frontend
>            Reporter: Sourabh Goyal
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major


