[ https://issues.apache.org/jira/browse/IMPALA-12597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Kaszab resolved IMPALA-12597.
-----------------------------------
Fix Version/s: Impala 4.4.0
Resolution: Fixed
> Basic equality delete support
> -----------------------------
>
> Key: IMPALA-12597
> URL: https://issues.apache.org/jira/browse/IMPALA-12597
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend, Frontend
> Reporter: Gabor Kaszab
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.4.0
>
>
> To split up the equality-delete read support task, let's deliver a patch with
> some initial support first. The idea is that Flink (one of the engines that
> can write equality delete files) apparently writes only a subset of the
> equality delete use cases allowed by the Iceberg spec. So as a first step,
> let's deliver the functionality required to read the EQ-deletes written by
> Flink. Flink writes EQ-deletes for tables in upsert mode (a primary key is
> required in this case). To guarantee the uniqueness of the primary key
> fields, for each insert (which is in fact an upsert) Flink writes one delete
> file that removes the previous row with the given PK (even if there wasn't
> one) and then writes data files with the new data.
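The upsert flow described above can be modelled with a small, simplified sketch. It assumes one commit per upsert, which real Flink does not do because it batches writes per checkpoint. It also assumes the Iceberg rule that an equality delete applies only to data files with a strictly lower data sequence number. The class and method names (`UpsertTable`, `upsert`, `scan`) are hypothetical and are not Impala or Flink APIs. PK values are compared as Python tuples, so `None` matches `None`, which mimics null-safe matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataFile:
    seq: int            # data sequence number of the commit that wrote it
    rows: list[dict]


@dataclass
class EqDeleteFile:
    seq: int
    equality_fields: tuple[str, ...]  # the PK columns
    keys: list[tuple]                 # PK values of rows to delete


class UpsertTable:
    """Hypothetical toy model of an Iceberg table written by Flink in upsert mode."""

    def __init__(self, pk: list[str]):
        self.pk = tuple(pk)
        self.data_files: list[DataFile] = []
        self.delete_files: list[EqDeleteFile] = []
        self.next_seq = 1

    def upsert(self, row: dict) -> None:
        key = tuple(row[c] for c in self.pk)
        seq = self.next_seq  # assumption: one commit per upsert
        self.next_seq += 1
        # Remove any previous row with this PK, even if none exists...
        self.delete_files.append(EqDeleteFile(seq, self.pk, [key]))
        # ...then write the new row in the same commit.
        self.data_files.append(DataFile(seq, [row]))

    def scan(self) -> list[dict]:
        return [
            row
            for df in self.data_files
            for row in df.rows
            if not self._is_deleted(row, df.seq)
        ]

    def _is_deleted(self, row: dict, data_seq: int) -> bool:
        for d in self.delete_files:
            # An equality delete only applies to strictly older data, so the
            # row written in the same commit as the delete survives.
            if d.seq <= data_seq:
                continue
            # Tuple equality treats None == None as a match (null-safe).
            if tuple(row[c] for c in d.equality_fields) in d.keys:
                return True
        return False
```

For example, after upserting `{"id": 1, "v": "a"}`, `{"id": 1, "v": "b"}` and `{"id": 2, "v": "c"}`, a scan returns only the latest row for each PK. This happens even though three delete files were written, including one for a PK that had no previous row.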
> How we can narrow down the functionality to be implemented on the Impala side:
> * The set of PK columns is not alterable, so we don't have to handle the case
> where different EQ-delete files have different equality field ID lists.
> * Flink's ALTER TABLE for Iceberg tables doesn't allow partition or schema
> evolution, so we can reject queries on eq-delete tables where there was
> partition or schema evolution.
> * As eq-deletes are written for NOT NULL PKs, we could skip the case where
> there are NULLs in the eq-delete file. (Update: this turned out to be easy to
> solve, so it will be part of this patch.)
> * For partitioned tables, Flink requires the partition columns to be part of
> the PK. As a result, each EQ-delete file contains the partition values too,
> so there is no need for extra logic to check whether the partition spec ID
> and the partition values match between the data and delete files.
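The restrictions listed above could be enforced by a planning-time check along the lines of the following sketch. This is not Impala's actual code. The metadata classes, the exception and `check_eq_delete_support` are hypothetical names. Using "more than one schema or partition spec in the table metadata" as the signal for evolution is also an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteFileMeta:
    path: str
    equality_field_ids: tuple[int, ...]


@dataclass(frozen=True)
class TableMeta:
    num_schemas: int          # assumption: >1 means schema evolution happened
    num_partition_specs: int  # assumption: >1 means partition evolution happened
    delete_files: tuple[DeleteFileMeta, ...]


class UnsupportedEqualityDeletes(Exception):
    """Hypothetical error used to reject out-of-scope eq-delete tables."""


def check_eq_delete_support(table: TableMeta) -> tuple[int, ...] | None:
    """Return the single shared equality field ID list, or raise if out of scope."""
    if not table.delete_files:
        return None  # nothing to merge
    if table.num_schemas > 1 or table.num_partition_specs > 1:
        raise UnsupportedEqualityDeletes("schema or partition evolution")
    # Equality fields act as a set, so compare them order-insensitively.
    id_lists = {tuple(sorted(d.equality_field_ids)) for d in table.delete_files}
    if len(id_lists) > 1:
        raise UnsupportedEqualityDeletes("different equality field ID lists")
    return next(iter(id_lists))
```

This sketch does not compare partition spec IDs or partition values between data and delete files. Per the last point above, Flink's PK already includes the partition columns, so the equality match on the PK covers them.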
--
This message was sent by Atlassian Jira
(v8.20.10#820010)