[ 
https://issues.apache.org/jira/browse/IMPALA-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795150#comment-17795150
 ] 

Quanlong Huang commented on IMPALA-12356:
-----------------------------------------

Just revisited this. Here is my understanding of this bug. How it happens:
 # The INSERT statement in Impala creates a new partition. Impala adds the 
catalog version to the inFlightEvents list of the partition. 
 # When the corresponding ADD_PARTITION event arrives, it's skipped since the 
partition already exists. See the code in 
CatalogOpExecutor#filterPartitionsToAddFromEvent():
[https://github.com/apache/impala/blob/d75807a195273cdba29e33f00b7b6f9bee012f62/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4526-L4529]
 # The inFlightEvents list of the partition remains unchanged after the 
ADD_PARTITION event is skipped.
 # Next, when the ALTER_PARTITION event arrives, its hmsPartition object still 
carries the same "impala.events.catalogVersion" parameter. The event is 
therefore treated as a self-event and skipped. The version is then cleared from 
the inFlightEvents list, so follow-up events are no longer affected.
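
The four steps above can be sketched with a simplified model. This is only an 
illustration, not the catalogd code: the class SelfEventSketch and its methods 
are hypothetical names, and the in-flight list is reduced to a plain list of 
catalog versions. The version 1882 is taken from the partition parameters in 
the issue description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the scenario; these are NOT the real catalogd classes.
class SelfEventSketch {
  // Hypothetical stand-in for HdfsPartition: only tracks in-flight versions.
  static class Partition {
    final List<Long> inFlightVersions = new ArrayList<>();
  }

  final Map<String, Partition> partitions = new HashMap<>();

  // Step 1: INSERT creates the partition and records the catalog version.
  void insertCreatesPartition(String name, long catalogVersion) {
    Partition p = new Partition();
    p.inFlightVersions.add(catalogVersion);
    partitions.put(name, p);
  }

  // Steps 2 and 3: the ADD_PARTITION event is skipped because the partition
  // already exists, and the in-flight list is left untouched.
  // Returns true if the event was skipped.
  boolean onAddPartitionEvent(String name) {
    if (partitions.containsKey(name)) return true;
    partitions.put(name, new Partition());
    return false;
  }

  // Step 4: the ALTER_PARTITION event is treated as a self-event if its
  // version parameter matches an in-flight version; the match is removed.
  // Returns true if the event was skipped.
  boolean onAlterPartitionEvent(String name, long versionParam) {
    Partition p = partitions.get(name);
    return p != null && p.inFlightVersions.remove(Long.valueOf(versionParam));
  }

  public static void main(String[] args) {
    SelfEventSketch catalog = new SelfEventSketch();
    catalog.insertCreatesPartition("p=0", 1882L);
    System.out.println("ADD_PARTITION skipped: "
        + catalog.onAddPartitionEvent("p=0"));
    // An ALTER issued from Hive keeps the partition parameters, so the event
    // still carries version 1882 and is wrongly skipped once.
    System.out.println("first ALTER_PARTITION skipped: "
        + catalog.onAlterPartitionEvent("p=0", 1882L));
    System.out.println("second ALTER_PARTITION skipped: "
        + catalog.onAlterPartitionEvent("p=0", 1882L));
  }
}
```

In this model only the first external ALTER_PARTITION event is lost; the second 
one is processed because the stale version has been consumed.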

Some comments on these steps.

#1 is the old self-event detection mechanism from before IMPALA-10502, i.e. 
using the catalog service id and catalog version to detect self-events. It's 
buggy because the inFlightEvents list is part of the HdfsPartition object, so 
the list is dropped together with the partition. If a DropPartition operation 
runs before catalogd processes the ADD_PARTITION event, catalogd can't 
recognize the ADD_PARTITION event as a self-event since the inFlightEvents list 
is gone, and the ADD_PARTITION event incorrectly adds the partition back. To 
fix that issue, IMPALA-10502 uses the createEventId and EventDeleteLog to skip 
ADD_PARTITION and DROP_PARTITION events.
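
A rough sketch of that drop-then-add race, and of how an event-id based check 
avoids it. Everything here is an assumption made for illustration: the class 
DropThenAddSketch, its method names, and the map used as a delete log are 
hypothetical and only approximate the idea behind createEventId and 
EventDeleteLog, not their actual implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model; these are NOT the real catalogd classes.
class DropThenAddSketch {
  final Set<String> partitions = new HashSet<>();
  // Hypothetical delete log: partition name -> event id of the latest drop.
  final Map<String, Long> deleteLog = new HashMap<>();

  void create(String name) {
    partitions.add(name);
  }

  void drop(String name, long dropEventId) {
    partitions.remove(name);
    deleteLog.put(name, dropEventId);
  }

  // Old mechanism: the in-flight list was dropped with the partition, so
  // nothing marks the event as a self-event and the partition is re-added.
  // Returns true if the partition was added.
  boolean applyAddEventOld(String name) {
    return partitions.add(name);
  }

  // Event-id based check: an ADD_PARTITION event older than a recorded drop
  // of the same partition is stale and is skipped.
  // Returns true if the partition was added.
  boolean applyAddEventNew(String name, long eventId) {
    Long droppedAt = deleteLog.get(name);
    if (droppedAt != null && droppedAt > eventId) return false;
    return partitions.add(name);
  }

  public static void main(String[] args) {
    DropThenAddSketch oldWay = new DropThenAddSketch();
    oldWay.create("p=0");     // generates ADD_PARTITION, assumed event id 11
    oldWay.drop("p=0", 12L);  // dropped before event 11 is processed
    System.out.println("old mechanism re-adds partition: "
        + oldWay.applyAddEventOld("p=0"));

    DropThenAddSketch newWay = new DropThenAddSketch();
    newWay.create("p=0");
    newWay.drop("p=0", 12L);
    System.out.println("event-id check re-adds partition: "
        + newWay.applyAddEventNew("p=0", 11L));
  }
}
```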

There are several solutions for the current issue:
 * Fix #3 so that when the ADD_PARTITION event is skipped, the corresponding 
item is also cleared from the inFlightEvents list of that partition. This is 
[~VenuReddy]'s patch set 8: [https://gerrit.cloudera.org/c/20486/8]
 * Fix #2 by adding back the old self-event detection mechanism for 
ADD_PARTITION events (on non-transactional tables), so that the inFlightEvents 
list is also cleared correctly. It's added back as the first guard. The new 
mechanism based on createEventId and EventDeleteLog still works if the first 
guard fails to detect the self-event. This is [~VenuReddy]'s patch set 10: 
[https://gerrit.cloudera.org/c/20486/10]
 * Fix #1 by not adding the catalog version to the inFlightEvents list, since 
it's not needed after IMPALA-10502 (for create/drop self-events). We just need 
to skip this line for non-transactional tables:
[https://github.com/apache/impala/blob/d75807a195273cdba29e33f00b7b6f9bee012f62/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L6904C18-L6904C18]
I.e. when INSERT creates new partitions, don't add the catalogServiceId and 
catalogVersion to the HMS parameters. They are not needed by the new self-event 
detection mechanism for create/drop partition events, but they disrupt the 
self-event detection of alter partition events, which still uses the old 
mechanism.

I lean toward the last solution since it's simpler and we are going to remove 
the old self-event detection mechanism in the future. What do you think? 
[~VenuReddy] , [~hemanth619] 

> Partition created by INSERT will make the next ALTER_PARTITION event on it 
> always treated as self-event
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12356
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12356
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Venugopal Reddy K
>            Priority: Critical
>              Labels: ramp-up
>
> In Impala, create a partitioned table and create one partition in it using 
> {*}INSERT{*}:
> {noformat}
> create table my_part (i int) partitioned by (p int) stored as parquet;
> insert into my_part partition(p=0) values (0),(1),(2);
> show partitions my_part
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+---------------------------------------------------+-----------+
> | p     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                          | EC Policy |
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+---------------------------------------------------+-----------+
> | 0     | -1    | 1      | 358B | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/my_part/p=0 | NONE      |
> | Total | -1    | 1      | 358B | 0B           |                   |         |                   |                                                   |           |
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+---------------------------------------------------+-----------+
> {noformat}
> In Hive, describe the partition. We can see parameters of 
> "impala.events.catalogServiceId" and "impala.events.catalogVersion" added by 
> Impala. This is ok.
> {noformat}
> hive> desc formatted my_part partition(p=0);
> +-----------------------------------+----------------------------------------------------+-----------------------------------+
> |             col_name              |                     data_type                      |              comment              |
> +-----------------------------------+----------------------------------------------------+-----------------------------------+
> | i                                 | int                                                |                                   |
> |                                   | NULL                                               | NULL                              |
> | # Partition Information           | NULL                                               | NULL                              |
> | # col_name                        | data_type                                          | comment                           |
> | p                                 | int                                                |                                   |
> |                                   | NULL                                               | NULL                              |
> | # Detailed Partition Information  | NULL                                               | NULL                              |
> | Partition Value:                  | [0]                                                | NULL                              |
> | Database:                         | default                                            | NULL                              |
> | Table:                            | my_part                                            | NULL                              |
> | CreateTime:                       | Wed Aug 09 15:24:50 CST 2023                       | NULL                              |
> | LastAccessTime:                   | UNKNOWN                                            | NULL                              |
> | Location:                         | hdfs://localhost:20500/test-warehouse/my_part/p=0  | NULL                              |
> | Partition Parameters:             | NULL                                               | NULL                              |
> |                                   | impala.events.catalogServiceId                     | eab33ebb8a14cfd:8b2bdc12df3568df  |
> |                                   | impala.events.catalogVersion                       | 1882                              |
> |                                   | numFiles                                           | 1                                 |
> |                                   | totalSize                                          | 358                               |
> |                                   | transient_lastDdlTime                              | 1691565890                        |
> |                                   | NULL                                               | NULL                              |
> | # Storage Information             | NULL                                               | NULL                              |
> | SerDe Library:                    | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                              |
> | InputFormat:                      | org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | NULL                              |
> | OutputFormat:                     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                              |
> | Compressed:                       | No                                                 | NULL                              |
> | Num Buckets:                      | 0                                                  | NULL                              |
> | Bucket Columns:                   | []                                                 | NULL                              |
> | Sort Columns:                     | []                                                 | NULL                              |
> +-----------------------------------+----------------------------------------------------+-----------------------------------+
> {noformat}
> Now run an ALTER statement on the partition in Hive, e.g. changing the 
> location:
> {code:sql}
> alter table my_part partition(p=0) set location '/tmp';{code}
> Impala will skip the ALTER_PARTITION event since it's considered a 
> self-event. In catalogd logs:
> {noformat}
> I0809 15:30:19.628449 29844 MetastoreEvents.java:628] EventId: 8351549 
> EventType: ALTER_PARTITION Incremented events skipped counter to 12
> I0809 15:30:19.628616 29844 MetastoreEvents.java:628] EventId: 8351549 
> EventType: ALTER_PARTITION Not processing the event as it is a 
> self-event{noformat}


