Re: [QUESTION] What type promotion actually means

Daniel Weeks Wed, 20 Aug 2025 14:50:15 -0700

I think I'm going to disagree and argue that it's not really a gray area.

Strict schema evolution rules, together with how schemas are tracked, mean
that writer and reader schemas are independent and remain compatible
because of those rules.


This means that you can have writers using different schemas (use cases
include different partitioning or "out-of-date" writers), and the data is
still valid.

Promoting the physical representation during a read/scan operation is what
produces a presentation that is consistent with the read schema.

All of the representations are technically valid.
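
As an illustration of the point above, here is a minimal sketch of read-time
promotion. It is not the Iceberg API: the dict-based "data files", the
`promote` and `scan` helpers, and the `ALLOWED_PROMOTIONS` table are all
hypothetical, and only two of the spec's widening promotions are modeled.
Each file carries its own physical type for field id 1, and the scan presents
every value in the read schema's type.

```python
# Hypothetical sketch: a subset of the widening promotions in the Iceberg spec.
ALLOWED_PROMOTIONS = {("int", "long"), ("float", "double")}


def promote(value, file_type, read_type):
    """Present one value written as file_type in the read schema's read_type."""
    if file_type == read_type:
        return value
    if (file_type, read_type) not in ALLOWED_PROMOTIONS:
        raise TypeError(f"cannot read {file_type} as {read_type}")
    if value is None:
        return None
    # Widening only: an int always fits in a long, a float in a double.
    return float(value) if read_type == "double" else int(value)


def scan(files, field_id, read_type):
    """Read one field from every file, promoting to the read type as needed."""
    rows = []
    for data_file in files:
        file_type, values = data_file[field_id]
        rows.extend(promote(v, file_type, read_type) for v in values)
    return rows


# Stand-ins for Parquet files: field id -> (physical type, values).
out_of_date_writer_file = {1: ("int", [1, 2, None])}
current_writer_file = {1: ("long", [2**40])}

# Both files are valid under a read schema where field 1 is a long.
rows = scan([out_of_date_writer_file, current_writer_file], 1, "long")
print(rows)
```

The reader never needs to know which schema a writer used; it only needs the
file's physical type and the read schema, which is why a file written with the
narrower type reads the same as one written with the promoted type. A
narrowing read (long as int) is rejected in this sketch.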

-Dan

On Mon, Aug 18, 2025 at 7:46 AM Russell Spitzer <[email protected]>
wrote:

> +1 to what Micah said :) sorry about the typo
>
> On Mon, Aug 18, 2025 at 9:45 AM Russell Spitzer <[email protected]>
> wrote:
>
>> +1 to what Micaah , We have never really written rules about what is
>> "allowed" in this particular context but since
>> a reader needs to be able to handle both int/long values for the column,
>> there isn't really any danger in writing
>> new files with the narrower type. If a reader couldn't handle this, then
>> type promotion would be impossible.
>>
>> I would include all columns in the file, the space requirements for an
>> all null column (or all constant column) should
>> be very small. I believe the reason we originally wrote those rules in was
>> to avoid folks doing the Hive Style
>> implicit columns from partition tuple (although we also have handling for
>> this.)
>>
>> On Sun, Aug 17, 2025 at 11:15 PM Micah Kornfield <[email protected]>
>> wrote:
>>
>>>
>>>  Hi Nic,
>>> This is IMO a gray area.
>>>
>>> However, is it allowed to commit *new* parquet files with the old
>>>> types (int) and commit them to the table with a table schema where
>>>> types are promoted (long)?
>>>
>>>
>>> IMO I would expect writers to be writing files that are consistent with
>>> the current metadata, so ideally they would not be written with int if it
>>> is now long.  In general, though, in these cases I think most readers are
>>> robust to reading type promoted files.  We should probably clarify in the
>>> specification.
>>>
>>>
>>> Also, is it allowed to commit parquet files, in general, which contain
>>>> only a subset of columns of table schema? I.e. if I know a column is
>>>> all NULLs, can we just skip writing it?
>>>
>>>
>>> As currently worded the spec on writing data files (
>>> https://iceberg.apache.org/spec/#writing-data-files) should include all
>>> columns. Based on column projection rules
>>> <https://iceberg.apache.org/spec/#column-projection>, however, failing
>>> to do so should also not cause problems.
>>>
>>> Cheers,
>>> Micah
>>>
>>> On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm implementing an Iceberg writer[^1] and have a question about what
>>>> type promotion actually means as part of schema evolution rules.
>>>>
>>>> Iceberg spec [specifies][spec-evo] which type promotions are allowed.
>>>> No confusion there.
>>>>
>>>> The confusion on my end arises when it comes to actually writing i.e.
>>>> parquet data. Let's take for example the int to long promotion. What
>>>> is actually allowed under this promotion rule? Let me try to show what
>>>> I mean.
>>>>
>>>> Obviously if I have a schema-id N with field A of type int and table
>>>> snapshots with this schema then it is possible to update the table
>>>> schema-id to > N where field A now has type long and this new schema
>>>> can read parquet files with the old type.
>>>>
>>>> However, is it allowed to commit *new* parquet files with the old
>>>> types (int) and commit them to the table with a table schema where
>>>> types are promoted (long)?
>>>>
>>>> Also, is it allowed to commit parquet files, in general, which contain
>>>> only a subset of columns of table schema? I.e. if I know a column is
>>>> all NULLs, can we just skip writing it?
>>>>
>>>> Appreciate taking the time to look at this,
>>>> Nic
>>>>
>>>> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
>>>> [^1]: This is for Redpanda to Iceberg native integration
>>>> (https://github.com/redpanda-data/redpanda).
>>>>
>>>
