Thanks, xianjin. It's working now. I also created a PR to enhance the documentation: https://github.com/apache/iceberg/pull/9478
Thanks,
Manu

On Thu, Jan 11, 2024 at 11:08 AM xianjin <xian...@apache.org> wrote:

> You can create an Iceberg table with a required field, for example:
>
> create table test_table (id bigint not null, data string) using iceberg
>
> However, you cannot change an optional field to required after creation.
> See this issue for more details:
> https://github.com/apache/iceberg/issues/3617
>
> Manu Zhang <owenzhang1...@gmail.com> wrote on Thu, Jan 11, 2024 at 10:08:
>
>> It looks like there's no way to explicitly add a required column in DDL.
>> Any suggestions?
>>
>> Much appreciated,
>> Manu
>>
>> On Tue, Jan 9, 2024 at 3:37 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>
>>> Thanks, Peter and Ryan, for the info.
>>>
>>> As identifier fields need to be "required", how can I alter an optional
>>> column to be required in Spark SQL?
>>>
>>> Thanks,
>>> Manu
>>>
>>> On Fri, Jan 5, 2024 at 12:50 AM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> You can set the primary key fields in Spark using `ALTER TABLE`:
>>>>
>>>> `ALTER TABLE t SET IDENTIFIER FIELDS id`
>>>>
>>>> Spark doesn't support any primary key syntax, so you have to do this
>>>> as a separate step.
>>>>
>>>> On Thu, Jan 4, 2024 at 8:46 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>
>>>>> Hi Manu,
>>>>>
>>>>> The Iceberg Schema defines the `identifierFieldIds` method [1], and
>>>>> Flink uses that as the primary key. Are you saying there is no way
>>>>> to set it in Spark and Trino?
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> [1] https://github.com/apache/iceberg/blob/9a00f7477dedac4501fb2de9e1e6d7aa83dc20b7/api/src/main/java/org/apache/iceberg/Schema.java#L280
>>>>>
>>>>> Manu Zhang <owenzhang1...@gmail.com> wrote on Thu, Jan 4, 2024 at 16:45:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Currently, we support upserting a Flink-created table with Flink SQL,
>>>>>> where primary keys are required as equality fields. They are not
>>>>>> required in the Java API.
>>>>>>
>>>>>> However, if the table is created by Spark, where there's no primary
>>>>>> key, we cannot upsert with Flink SQL. Hence, I proposed
>>>>>> https://github.com/apache/iceberg/pull/8195 to support specifying
>>>>>> equality columns with Flink SQL write options.
>>>>>>
>>>>>> @pvary <https://github.com/pvary> suggested it would be better to
>>>>>> support primary keys in Spark, Trino, etc. Since these engines don't
>>>>>> have primary keys in their table definitions, a workaround is to put
>>>>>> primary key columns in table properties. Maybe there are other
>>>>>> options I've missed.
>>>>>>
>>>>>> Flink SQL sinking to Spark tables for analysis is a typical pipeline
>>>>>> in our data lake. I'd like to hear your thoughts on how best to
>>>>>> support this case.
>>>>>>
>>>>>> Happy New Year!
>>>>>> Manu
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
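
Pulling the Spark SQL advice from the thread together, a minimal sketch of the whole flow (untested; table and column names are placeholders, and `SET IDENTIFIER FIELDS` assumes the Iceberg Spark SQL extensions are enabled, e.g. spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions):

    -- Declare the key column as required (NOT NULL) at creation time:
    -- identifier fields must be required, and an optional column cannot
    -- be made required later (https://github.com/apache/iceberg/issues/3617).
    CREATE TABLE test_table (id BIGINT NOT NULL, data STRING) USING iceberg;

    -- Spark has no PRIMARY KEY syntax, so set the identifier fields in a
    -- separate step:
    ALTER TABLE test_table SET IDENTIFIER FIELDS id;

    -- Only the opposite change is supported: a required, non-identifier
    -- column can be relaxed to optional, e.g.
    --   ALTER TABLE test_table ALTER COLUMN some_col DROP NOT NULL;
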
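And a rough sketch of the Flink SQL upsert path discussed at the start of the thread, which works when the primary key is declared through Flink at creation time (syntax follows the Iceberg Flink docs; the catalog and table names are placeholders, and upsert requires a format-version 2 table):

    -- A primary key declared in Flink becomes the table's identifier
    -- fields, which are used as the equality columns for upsert:
    CREATE TABLE my_catalog.db.sample (
        id   BIGINT,
        data STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH ('format-version' = '2', 'write.upsert.enabled' = 'true');

    -- Upsert can also be switched on per write with a write option:
    INSERT INTO my_catalog.db.sample /*+ OPTIONS('upsert-enabled' = 'true') */
    SELECT id, data FROM source_table;

For a Spark-created table with no primary key, neither path applies today, which is the gap that https://github.com/apache/iceberg/pull/8195 (specifying equality columns via write options) aims to close.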