[jira] [Resolved] (IMPALA-12256) Stale DROP_PARTITION events might not be skipped correctly

Quanlong Huang (Jira) Mon, 17 Jul 2023 17:01:08 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Quanlong Huang resolved IMPALA-12256.
-------------------------------------
    Fix Version/s: Impala 4.3.0
       Resolution: Fixed

Yeah, resolving this.

> Stale DROP_PARTITION events might not be skipped correctly
> ----------------------------------------------------------
>
>                 Key: IMPALA-12256
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12256
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>
> Since IMPALA-10502, we track the create event id of each db/table/partition 
> when it is created. The id is used to skip stale DROP events, i.e. events that 
> were generated before the object was created.
> However, in some DDLs like COMPUTE INCREMENTAL STATS, we lose the create 
> event id when reloading partitions. As a result, stale DROP_PARTITION 
> events are not skipped correctly.
> This can be reproduced with a higher value of "hms_event_polling_interval_s", 
> so that the DROP_PARTITION event is processed after COMPUTE INCREMENTAL STATS 
> finishes.
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=10 
> {code}
> Create a non-transactional partitioned table with one partition:
> {code:sql}
> create table my_part (id int) partitioned by (p int) stored as textfile;
> insert into my_part partition(p=0) values (0);
> {code}
> Put the below commands in a file and run them at once:
> {code:sql}
> alter table my_part drop if exists partition (p=0);
> insert into my_part partition(p=0) values (0),(1),(2),(3);
> compute incremental stats my_part partition(p=0);
> {code}
> In the catalogd logs, we can see the partition being dropped by the 
> DROP_PARTITION event:
> {code:java}
> I0630 13:27:11.840737 17106 CatalogOpExecutor.java:4484] EventId: 8316831 
> Skipping removal of 0/1 partitions since they don't exist or were created 
> later in table default.my_part.
> I0630 13:27:11.841095 17106 MetastoreEvents.java:628] EventId: 8316831 
> EventType: DROP_PARTITION 1 partitions dropped from table default.my_part
> {code}
> This event should be skipped since the partition is recreated after it. 
> Although a follow-up ADD_PARTITION event (generated by the statement that 
> recreates the partition) will add the partition back, there is a period 
> between the two events during which the metadata is incorrect (missing a 
> partition that actually exists).
> The cause is that we lose the create_event_id of the recreated partition when 
> reloading it for COMPUTE INCREMENTAL STATS. Other DDLs can cause the same 
> issue, e.g. ALTER TABLE DROP STATS.
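
The failure mode described above can be illustrated with a small model. This is only a sketch under stated assumptions, not catalogd's actual code: the names Partition, Table, reload_partition, process_drop_event, and the UNKNOWN_EVENT_ID sentinel are all hypothetical, and the event ids 100 and 101 are made up. The only rule taken from the description is that a DROP event is stale when it was generated before the object was created.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical sentinel meaning "create event id is not tracked".
UNKNOWN_EVENT_ID = -1


@dataclass
class Partition:
    name: str
    create_event_id: int = UNKNOWN_EVENT_ID


@dataclass
class Table:
    partitions: Dict[str, Partition] = field(default_factory=dict)

    def reload_partition(self, name: str, keep_create_event_id: bool) -> None:
        """Replace the cached partition object, as a metadata reload would."""
        old = self.partitions[name]
        new = Partition(name)
        if keep_create_event_id:
            # Carry the id over to the reloaded object.
            new.create_event_id = old.create_event_id
        self.partitions[name] = new

    def process_drop_event(self, event_id: int, name: str) -> bool:
        """Apply a DROP_PARTITION event; return True if a partition was removed."""
        part = self.partitions.get(name)
        if part is None:
            return False
        if part.create_event_id > event_id:
            # The partition was created after this event: the event is stale.
            return False
        del self.partitions[name]
        return True


def partition_survives(keep_create_event_id: bool) -> bool:
    """Replay the scenario: drop (event 100), recreate (event 101), reload,
    then the delayed DROP_PARTITION event 100 finally gets processed."""
    table = Table()
    table.partitions["p=0"] = Partition("p=0", create_event_id=101)
    table.reload_partition("p=0", keep_create_event_id)
    table.process_drop_event(100, "p=0")
    return "p=0" in table.partitions


if __name__ == "__main__":
    print("reload loses the id:", partition_survives(False))
    print("reload keeps the id:", partition_survives(True))
```

In this model, a reload that discards the id leaves the staleness check with nothing to compare against, so the delayed DROP_PARTITION event removes a partition that still exists. Carrying the id across the reload lets the same check skip the event.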



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
