Hi Manu,

The Iceberg Schema defines the `identifierFieldIds` method [1], and Flink uses
those fields as the primary key.
Are you saying there is no way to set them in Spark and Trino?
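
For context, this is roughly how identifier fields can be set through the core
Java API today; a minimal sketch, assuming a HadoopCatalog with a local
warehouse path, a db.tbl table, and an "id" column (all placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

public class SetIdentifierFieldsExample {
  public static void main(String[] args) {
    // Placeholder catalog and table; any catalog implementation works the same way.
    HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "file:///tmp/warehouse");
    Table table = catalog.loadTable(TableIdentifier.of("db", "tbl"));

    // Records "id" as an identifier field (identifier fields must be required,
    // non-null columns); Flink reads it back via Schema#identifierFieldIds()
    // and treats it as the primary key.
    table.updateSchema()
        .setIdentifierFields("id")
        .commit();
  }
}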

Thanks,
Peter

[1]
https://github.com/apache/iceberg/blob/9a00f7477dedac4501fb2de9e1e6d7aa83dc20b7/api/src/main/java/org/apache/iceberg/Schema.java#L280

Manu Zhang <owenzhang1...@gmail.com> wrote (on Thu, Jan 4, 2024, at 16:45):

> Hi all,
>
> Currently, we support upserting into a Flink-created table with Flink SQL,
> where primary keys are required as the equality fields. They are not
> required in the Java API.
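>
> As an illustration, the Java sink accepts equality fields directly; a
> minimal sketch, assuming a table at a placeholder HDFS path with an "id"
> column:
>
> import java.util.Arrays;
> import org.apache.flink.streaming.api.datastream.DataStream;
> import org.apache.flink.table.data.RowData;
> import org.apache.iceberg.flink.TableLoader;
> import org.apache.iceberg.flink.sink.FlinkSink;
>
> public class UpsertSinkExample {
>   static void addUpsertSink(DataStream<RowData> rowDataStream) {
>     // Placeholder table location; any TableLoader works.
>     TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs:///warehouse/db/tbl");
>
>     // Equality fields are passed explicitly, so the table itself does not
>     // need a primary key in its definition.
>     FlinkSink.forRowData(rowDataStream)
>         .tableLoader(tableLoader)
>         .equalityFieldColumns(Arrays.asList("id"))
>         .upsert(true)
>         .append();
>   }
> }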
>
> However, if the table is created by Spark, where there's no primary key,
> we cannot upsert with Flink SQL. Hence, I proposed
> https://github.com/apache/iceberg/pull/8195 to support specifying
> equality columns with Flink SQL write options.
>
> @pvary  <https://github.com/pvary> suggested it would be better to
> support primary keys in Spark, Trino, etc. Since these engines don't have
> primary keys in their table definitions, a workaround is to put primary key
> columns in table properties. Maybe there are other options I've missed.
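>
> A rough sketch of that workaround; the property name below is purely
> hypothetical, not an existing Iceberg property:
>
> import org.apache.iceberg.Table;
>
> public class PrimaryKeyPropertyExample {
>   static void tagPrimaryKeyColumns(Table table) {
>     // Engines whose DDL has no primary key could still record the columns
>     // as a table property that the Flink sink reads back.
>     table.updateProperties()
>         .set("write.upsert.primary-key-columns", "id")  // hypothetical property name
>         .commit();
>   }
> }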
>
> Flink SQL sinking into Spark-created tables for analysis is a typical
> pipeline in our data lake. I'd like to hear your thoughts on how best to
> support this case.
>
> Happy New Year!
> Manu
>
