+1 to what Micah said :) sorry about the typo

On Mon, Aug 18, 2025 at 9:45 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> +1 to what Micah said. We have never really written rules about what is
> "allowed" in this particular context, but since a reader needs to be able
> to handle both int and long values for the column, there isn't really any
> danger in writing new files with the narrower type. If a reader couldn't
> handle this, then type promotion would be impossible.
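>
> As a rough sketch of the reader side (assuming pyarrow; the file and field
> names here are made up), reading an old int file under the promoted table
> schema is just a lossless widening cast:
>
>   import pyarrow as pa
>   import pyarrow.parquet as pq
>
>   # A data file written with int32 for a column the table schema now calls long.
>   pq.write_table(pa.table({"a": pa.array([1, 2, 3], type=pa.int32())}),
>                  "old-int-file.parquet")
>
>   # Up-cast to the current table schema on read.
>   table_schema = pa.schema([pa.field("a", pa.int64())])
>   promoted = pq.read_table("old-int-file.parquet").cast(table_schema)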
>
> I would include all columns in the file; the space requirements for an
> all-null column (or an all-constant column) should be very small. I believe
> the reason we originally wrote those rules was to avoid folks doing the
> Hive-style implicit columns from the partition tuple (although we also have
> handling for this).
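>
> As a rough illustration of how little an all-null column costs (a sketch
> using pyarrow; exact numbers will vary), you can compare per-column sizes
> in the Parquet footer:
>
>   import pyarrow as pa
>   import pyarrow.parquet as pq
>
>   n = 1_000_000
>   t = pa.table({"data": pa.array(range(n), type=pa.int64()),
>                 "all_null": pa.nulls(n, type=pa.int64())})
>   pq.write_table(t, "with_nulls.parquet")
>
>   # The all-null column chunk is little more than metadata.
>   meta = pq.ParquetFile("with_nulls.parquet").metadata
>   for i in range(meta.num_columns):
>       col = meta.row_group(0).column(i)
>       print(col.path_in_schema, col.total_compressed_size)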
>
> On Sun, Aug 17, 2025 at 11:15 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>>
>> Hi Nic,
>> This is IMO a gray area.
>>
>> However, is it allowed to write *new* parquet files with the old
>>> type (int) and commit them to the table under a schema where the
>>> type has been promoted (long)?
>>
>>
>> IMO, I would expect writers to write files that are consistent with the
>> current metadata, so ideally they would not be written with int if the
>> column is now long. In general, though, most readers in these cases are
>> robust to reading type-promoted files. We should probably clarify this in
>> the specification.
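>>
>> For example (a minimal writer-side sketch assuming pyarrow; names are
>> illustrative), a writer can up-cast its in-memory batch to the current
>> table types before producing the file:
>>
>>   import pyarrow as pa
>>   import pyarrow.parquet as pq
>>
>>   # The incoming batch still carries the old, narrower type...
>>   batch = pa.table({"a": pa.array([1, 2, 3], type=pa.int32())})
>>
>>   # ...but the column was promoted to long, so cast to the current
>>   # schema before writing the data file.
>>   current_schema = pa.schema([pa.field("a", pa.int64())])
>>   pq.write_table(batch.cast(current_schema), "data-file.parquet")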
>>
>>
>> Also, is it allowed in general to commit parquet files which contain
>>> only a subset of the table schema's columns? I.e., if I know a column
>>> is all NULLs, can I just skip writing it?
>>
>>
>> As currently worded, the spec on writing data files (
>> https://iceberg.apache.org/spec/#writing-data-files) says data files
>> should include all columns. Based on the column projection rules
>> <https://iceberg.apache.org/spec/#column-projection>, however, failing
>> to do so should also not cause problems.
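>>
>> Concretely, a reader following those projection rules fills in any
>> table-schema column that is absent from the data file with nulls. A
>> simplified sketch (assuming pyarrow, and matching columns by name for
>> brevity; Iceberg actually resolves columns by field id, and the file and
>> field names here are made up):
>>
>>   import pyarrow as pa
>>   import pyarrow.parquet as pq
>>
>>   table_schema = pa.schema([pa.field("id", pa.int64()),
>>                             pa.field("comment", pa.string())])
>>
>>   t = pq.read_table("subset-columns.parquet")
>>   # Append a null column for every table-schema field missing from the file.
>>   for field in table_schema:
>>       if field.name not in t.column_names:
>>           t = t.append_column(field, pa.nulls(len(t), type=field.type))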
>>
>> Cheers,
>> Micah
>>
>> On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei
>> <n...@nvartolomei.com.invalid> wrote:
>>
>>> Hi,
>>>
>>> I'm implementing an Iceberg writer[^1] and have a question about what
>>> type promotion actually means as part of schema evolution rules.
>>>
>>> Iceberg spec [specifies][spec-evo] which type promotions are allowed.
>>> No confusion there.
>>>
>>> The confusion on my end arises when it comes to actually writing the
>>> Parquet data. Let's take the int to long promotion as an example. What
>>> is actually allowed under this promotion rule? Let me try to show what
>>> I mean.
>>>
>>> Obviously, if I have schema-id N with field A of type int and table
>>> snapshots written with this schema, then it is possible to update the
>>> table to a schema-id > N where field A now has type long, and this new
>>> schema can read parquet files written with the old type.
>>>
>>> However, is it allowed to write *new* parquet files with the old
>>> type (int) and commit them to the table under a schema where the
>>> type has been promoted (long)?
>>>
>>> Also, is it allowed in general to commit parquet files which contain
>>> only a subset of the table schema's columns? I.e., if I know a column
>>> is all NULLs, can I just skip writing it?
>>>
>>> Appreciate you taking the time to look at this,
>>> Nic
>>>
>>> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
>>> [^1]: This is for Redpanda's native Iceberg integration
>>> (https://github.com/redpanda-data/redpanda).
>>>
>>
