Gabor Kaszab created IMPALA-12597:
-------------------------------------
Summary: Basic equality delete support
Key: IMPALA-12597
URL: https://issues.apache.org/jira/browse/IMPALA-12597
Project: IMPALA
Issue Type: Sub-task
Components: Backend, Frontend
Reporter: Gabor Kaszab
To split up the Equality-delete read support task, let's deliver a patch for
some initial support first. The idea here is that Flink (one of the engines
that can write equality delete files) apparently writes only a subset of the
equality delete use cases that the Iceberg spec allows.
So as a first step let's deliver the functionality that is required to read the
EQ-deletes written by Flink. The use case: Flink writes EQ-deletes for tables
in upsert mode (a primary key is a must in this case) in order to guarantee the
uniqueness of the primary key fields. For each insert (which is in fact an
upsert) Flink writes one delete file to remove the previous row with the given
PK (even if there hasn't been any) and then writes data files with the new
data.
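The upsert pattern above can be illustrated with a minimal Python sketch. It is
not Impala or Flink code: the names DataRow, EqDelete, upsert and scan are
hypothetical, and it assumes the Iceberg spec rule that an equality delete
applies only to data written with a lower data sequence number than the delete
itself.

```python
from dataclasses import dataclass


@dataclass
class DataRow:
    seq: int      # data sequence number of the commit that wrote the row
    values: dict  # column name -> value


@dataclass
class EqDelete:
    seq: int      # data sequence number of the commit that wrote the delete
    key: dict     # equality (PK) column name -> value to delete


def upsert(table, seq, row, pk_cols):
    """Mimic one upsert: delete the previous row by PK, then write the new row."""
    key = {c: row[c] for c in pk_cols}
    # The delete is written even if no earlier row with this PK exists.
    table["deletes"].append(EqDelete(seq, key))
    table["data"].append(DataRow(seq, dict(row)))


def scan(table):
    """Return the rows that survive all applicable equality deletes."""
    live = []
    for r in table["data"]:
        deleted = any(
            # Assumption (Iceberg spec): only deletes with a higher sequence
            # number than the data row apply to it.
            d.seq > r.seq
            and all(r.values.get(c) == v for c, v in d.key.items())
            for d in table["deletes"]
        )
        if not deleted:
            live.append(r.values)
    return live


table = {"data": [], "deletes": []}
upsert(table, 1, {"id": 1, "name": "a"}, ["id"])
upsert(table, 2, {"id": 2, "name": "b"}, ["id"])
upsert(table, 3, {"id": 1, "name": "c"}, ["id"])  # replaces the row with id=1
result = scan(table)
```

Here the first row with id=1 is removed by the delete written in the third
upsert, while the delete written alongside each new row does not remove that
row itself.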
How we can narrow down the functionality to be implemented on the Impala side:
* The set of PK columns is not alterable, so we don't have to handle the case
where different EQ-delete files have different equality field ID lists.
* Flink's ALTER TABLE for Iceberg tables doesn't allow partition and schema
evolution. We can reject queries on eq-delete tables where there was partition
or schema evolution.
* As eq-deletes are written to NOT NULL PKs we could omit the case where
there are NULLs in the eq-delete file. (Update: this seemed easy to solve, so
it will be part of this patch.)
* For partitioned tables Flink requires the partition columns to be part of
the PK. As a result each EQ-delete file will have the partition values too, so
there is no need to add extra logic to check if the partition spec ID and the
partition values match between the data and delete files.
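The restrictions listed above could be enforced with checks along the following
lines. This is only a sketch in Python, not the actual Impala implementation:
the metadata dictionary layout, check_supported, key_matches and
UnsupportedEqDeleteTable are all hypothetical names made up for illustration.
The NULL handling assumes the Iceberg spec semantics, where a NULL in a delete
column matches rows whose value is NULL.

```python
class UnsupportedEqDeleteTable(Exception):
    """Raised when a table falls outside the initially supported subset."""


def check_supported(meta):
    """Reject tables that the basic equality delete support does not cover.

    `meta` is a hypothetical summary of the table metadata:
      eq_delete_files:    list of {"equality_ids": [...]} entries
      schema_ids:         schema IDs the table has ever had
      partition_spec_ids: partition spec IDs the table has ever had
    """
    id_lists = {tuple(f["equality_ids"]) for f in meta["eq_delete_files"]}
    if not id_lists:
        return  # no equality deletes, nothing to restrict
    if len(id_lists) > 1:
        raise UnsupportedEqDeleteTable(
            "equality delete files have different equality field ID lists")
    if len(meta["schema_ids"]) > 1:
        raise UnsupportedEqDeleteTable("schema evolution is not supported")
    if len(meta["partition_spec_ids"]) > 1:
        raise UnsupportedEqDeleteTable("partition evolution is not supported")


def key_matches(row, delete_key):
    """Compare a data row against one equality delete row.

    Python None stands in for SQL NULL here; None == None is True, which
    mirrors the assumed 'NULL matches NULL' behaviour for delete columns.
    Partition columns are part of the PK, so they are compared like any
    other equality column and no separate partition check is made.
    """
    return all(row.get(col) == val for col, val in delete_key.items())
```

For example, a table whose delete files all carry the same equality field IDs
passes check_supported, while one that mixes, say, [1, 2] and [1] would be
rejected.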