It looks like there's no way to explicitly add a required column in DDL.
Any suggestions?
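
For reference, the only route I can find is the Java API. A rough, untested sketch of what I mean (catalog, table and column names are just placeholders; classes are org.apache.iceberg.Table, catalog.TableIdentifier and types.Types):

  Table table = catalog.loadTable(TableIdentifier.of("db", "t"));
  table.updateSchema()
      .allowIncompatibleChanges()                        // needed, since these are incompatible schema changes
      .addRequiredColumn("new_col", Types.LongType.get())
      .requireColumn("existing_col")                     // or make an existing optional column required
      .commit();

It would be nice to be able to do the same purely in SQL.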

Much appreciated
Manu

On Tue, Jan 9, 2024 at 3:37 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Thanks Peter and Ryan for the info.
>
> As identifier fields need to be "required", how can I alter an optional
> column to be required in Spark SQL?
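>
> For example, I'd have expected something like the following to work (table
> and column names are placeholders), but it doesn't seem to be allowed for
> Iceberg tables:
>
> `ALTER TABLE db.t ALTER COLUMN id SET NOT NULL`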
>
> Thanks,
> Manu
>
> On Fri, Jan 5, 2024 at 12:50 AM Ryan Blue <b...@tabular.io> wrote:
>
>> You can set the primary key fields in Spark using `ALTER TABLE`:
>>
>> `ALTER TABLE t SET IDENTIFIER FIELDS id`
>>
>> Spark doesn't support any primary key syntax, so you have to do this as a
>> separate step.
>>
>> On Thu, Jan 4, 2024 at 8:46 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Hi Manu,
>>>
>>> The Iceberg Schema defines `identifierFieldIds` method [1], and Flink
>>> uses that as the primary key.
>>> Are you saying there is no way to set it in Spark and Trino?
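>>>
>>> (For reference, on the Java side it can be read roughly like this, where
>>> `table` is any loaded Iceberg table:
>>>
>>> Set<Integer> idFieldIds = table.schema().identifierFieldIds();
>>> Set<String> idFieldNames = table.schema().identifierFieldNames();
>>>
>>> and Flink maps those fields to its primary key / equality fields.)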
>>>
>>> Thanks,
>>> Peter
>>>
>>> [1]
>>> https://github.com/apache/iceberg/blob/9a00f7477dedac4501fb2de9e1e6d7aa83dc20b7/api/src/main/java/org/apache/iceberg/Schema.java#L280
>>>
>>> On Thu, Jan 4, 2024 at 4:45 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Currently, we support upserting into a Flink-created table with Flink SQL,
>>>> where primary keys are required as equality fields. They are not required
>>>> in the Java API.
>>>>
>>>> However, if the table is created by Spark, where there's no primary
>>>> key, we cannot upsert with Flink SQL. Hence, I proposed
>>>> https://github.com/apache/iceberg/pull/8195 to support specifying
>>>> equality columns with Flink SQL write options.
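>>>>
>>>> With such a write option, the SQL side could look roughly like this (the
>>>> option name is only a placeholder for whatever the PR ends up adding;
>>>> 'upsert-enabled' is the existing upsert option):
>>>>
>>>> INSERT INTO db.t /*+ OPTIONS('upsert-enabled'='true',
>>>>     'equality-field-columns'='id') */
>>>> SELECT ...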
>>>>
>>>> @pvary  <https://github.com/pvary> suggested it would be better to
>>>> support primary keys in Spark, Trino, etc. Since these engines don't have
>>>> primary keys in their table definitions, a workaround is to put primary key
>>>> columns in table properties. Maybe there are other options I've missed.
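>>>>
>>>> For example, something along these lines (the property name is made up,
>>>> only to illustrate the idea):
>>>>
>>>> ALTER TABLE db.t SET TBLPROPERTIES ('primary-key-columns' = 'id')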
>>>>
>>>> Flink SQL sinking to Spark-created tables for analysis is a typical pipeline
>>>> in our data lake. I'd like to hear your thoughts on how best to support this
>>>> case.
>>>>
>>>> Happy New Year!
>>>> Manu
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
