You can set a table's identifier (primary key) fields in Spark using `ALTER TABLE`:

`ALTER TABLE t SET IDENTIFIER FIELDS id`

Spark SQL doesn't support any primary key syntax, so you have to set the
identifier fields as a separate step after creating the table.
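
For example, a sketch of setting and clearing identifier fields on a
hypothetical table `t` (this assumes the Iceberg Spark SQL extensions are
enabled via `spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`,
and that the chosen columns are already non-nullable, which identifier
fields must be):

```sql
-- Set a compound identifier; the listed columns act as the primary key
ALTER TABLE t SET IDENTIFIER FIELDS id, region;

-- Remove them again if needed
ALTER TABLE t DROP IDENTIFIER FIELDS id, region;
```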

On Thu, Jan 4, 2024 at 8:46 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Hi Manu,
>
> The Iceberg Schema defines `identifierFieldIds` method [1], and Flink uses
> that as the primary key.
> Are you saying there is no way to set it in Spark and Trino?
>
> Thanks,
> Peter
>
> [1]
> https://github.com/apache/iceberg/blob/9a00f7477dedac4501fb2de9e1e6d7aa83dc20b7/api/src/main/java/org/apache/iceberg/Schema.java#L280
>
> Manu Zhang <owenzhang1...@gmail.com> wrote (on Thu, Jan 4, 2024, 16:45):
>
>> Hi all,
>>
>> Currently, Flink SQL supports upserting into a Flink-created table, where
>> primary keys are required and used as the equality fields. They are not
>> required in the Java API.
>>
>> However, if the table was created by Spark, which has no primary key
>> syntax, we cannot upsert it with Flink SQL. Hence, I proposed
>> https://github.com/apache/iceberg/pull/8195 to support specifying
>> equality columns via Flink SQL write options.
>>
>> @pvary <https://github.com/pvary> suggested it would be better to
>> support primary keys in Spark, Trino, etc. Since these engines don't have
>> primary keys in their table definitions, one workaround is to put the
>> primary key columns in table properties. Maybe there are other options
>> I've missed.
>>
>> Flink SQL sinking into Spark-created tables for analysis is a typical
>> pipeline in our data lake. I'd like to hear your thoughts on how best to
>> support this case.
>>
>> Happy New Year!
>> Manu
>>
>

-- 
Ryan Blue
Tabular
