https://github.com/apache/iceberg/pull/14004 is an attempt to update the
spec based on Ryan's feedback.


Nicolae, some thoughts below:

> Interpreting this for the Iceberg REST Catalog interaction: writers
> should include the assert-current-schema-id requirement when
> committing to a table. Otherwise, inconsistencies may be introduced. I
> might be on a slightly outdated Iceberg build, but from what I can
> see, Spark with Iceberg doesn’t include this requirement during table
> commits (example:
> https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).


As outlined in the new PR, I believe this depends on:
1.  The isolation level
2.  The operation being performed.

Ignoring changing nullable fields to non-nullable (which isn't covered in
the spec), I think there are times when using an out-of-date schema is fine,
as long as one is not touching default values or one is using snapshot
isolation. More details are in the PR; please let me know if it makes sense.
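
To make the benign case concrete, here is a rough, untested sketch using the
Java Table API (catalog, dataFile, and the "db"/"events" names are placeholders
assumed to exist in the writer; this is not taken from the PR):

import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;

// Append a data file that was planned and written against an older (but
// evolution-compatible) schema, e.g. before a new optional column was added.
// Readers project the missing column as null / its read default, so a plain
// append like this stays correct under snapshot isolation as long as no
// default values changed in the meantime.
Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
table.newAppend()
    .appendFile(dataFile)   // dataFile uses the older schema's columns
    .commit();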

> Do we consider this an implementation bug then?


I'm not familiar enough with the implementation to say.

> Also, what should a sensible implementation do when the requirement
> fails? Reload table schema and check if new schema is compatible with
> writer schema (i.e. it wouldn't be subject to the possible
> inconsistencies as described by the spec) and otherwise rewrite the
> data? For example rewrite parquet files by adding new all-null fields
> if needed, etc.


This seems like the sensible thing to do for serializable isolation.
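
Roughly something like the following (untested sketch; isCompatibleForAppend
and writerSchema are hypothetical, not existing Iceberg APIs, and I'm assuming
the failed requirement surfaces as a CommitFailedException):

import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.exceptions.CommitFailedException;

try {
  table.newAppend().appendFile(dataFile).commit();
} catch (CommitFailedException e) {
  // The schema requirement (e.g. assert-current-schema-id) failed:
  // pick up the latest metadata and re-check before retrying.
  table.refresh();
  Schema current = table.schema();
  if (isCompatibleForAppend(writerSchema, current)) {
    // Only additive/compatible changes landed (and no default values
    // changed): the existing files are still valid, so retry the append.
    table.newAppend().appendFile(dataFile).commit();
  } else {
    // Otherwise rewrite the data against the current schema first, e.g.
    // regenerate the Parquet files with new columns as all-null, then commit.
  }
}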

Cheers,
Micah

On Fri, Sep 5, 2025 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> https://github.com/apache/iceberg/pull/14002 for revert, I'll post a new
> PR which tries to accommodate the replies on the thread.
>
> On Fri, Sep 5, 2025 at 8:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>
>> Would anyone object to reverting this change? I don't think that it makes
>> accurate statements. For example, I don't think that "not writing out all
>> columns or not using the latest schema can change the semantics of the data
>> written" is correct. It is perfectly fine to append with an older schema
>> and writers do this today. Saying that may "change the semantics of the
>> data written" makes this much scarier than it actually is.
>>
>> I think this confuses cases and errs too strongly on discouraging
>> reasonable things. It is, of course, bad to use an older schema that drops
>> columns from rows that are being updated. But that's a problem with the
>> engine choosing to drop current data columns more than it is a schema
>> selection issue.
>>
>> I think that the right path here is to state that operations should be
>> bound to the current schema when they are planned. It is okay to continue
>> producing data with an older schema in currently-running jobs. And we
>> should call out the behavior of defaults. But I do think that we should
>> use language that clearly states what will happen rather than a generic
>> "may change the semantics of data written".
>>
>> On Fri, Sep 5, 2025 at 5:05 AM Nicolae Vartolomei
>> <n...@nvartolomei.com.invalid> wrote:
>>
>>> Quoting the newly added text in the spec:
>>>
>>> > Writers must write out all fields with the types specified from a
>>> schema present in table metadata. Writers should use the latest schema for
>>> writing. Not writing out all columns or not using the latest schema can
>>> change the semantics of the data written. The following are possible
>>> inconsistencies that can be introduced:
>>>
>>> Interpreting this for the Iceberg REST Catalog interaction: writers
>>> should include the assert-current-schema-id requirement when
>>> committing to a table. Otherwise, inconsistencies may be introduced. I
>>> might be on a slightly outdated Iceberg build, but from what I can
>>> see, Spark with Iceberg doesn’t include this requirement during table
>>> commits (example:
>>> https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>>>
>>> Do we consider this an implementation bug then?
>>>
>>> Also, what should a sensible implementation do when the requirement
>>> fails? Reload table schema and check if new schema is compatible with
>>> writer schema (i.e. it wouldn't be subject to the possible
>>> inconsistencies as described by the spec) and otherwise rewrite the
>>> data? For example rewrite parquet files by adding new all-null fields
>>> if needed, etc.
>>>
>>> On Thu, Sep 4, 2025 at 10:19 PM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>> >
>>> > Just to follow up, the PR ended up getting merged.  In theory, there
>>> maybe should have been a vote, but unless someone feels strongly or would
>>> like to fine-tune the language further, perhaps we can consider the topic
>>> resolved?
>>> >
>>> > On Wed, Aug 27, 2025 at 2:17 PM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>> >>
>>> >> I opened https://github.com/apache/iceberg/pull/13936 as a draft
>>> proposal to capture the conversation.
>>> >>
>>> >> BTW, I think one area this brings up that I don't think the
>>> specification handles is changing between nullable and not-nullable fields.
>>> Outdated schemas have some implications in these cases as well.
>>> >>
>>> >> Cheers,
>>> >> Micah
>>> >>
>>> >> On Tue, Aug 26, 2025 at 10:13 AM Micah Kornfield <
>>> emkornfi...@gmail.com> wrote:
>>> >>>
>>> >>> I think the original question is ambiguous.  We should probably
>>> subset this into two questions:
>>> >>>
>>> >>> 1.  Is it OK to write out an "int" instead of a "long" if the
>>> writer's schema says the value is a long?
>>> >>>
>>> >>> I think the answer here is we recommended not doing so, even though
>>> it would likely work.
>>> >>>
>>> >>> 2. Is it OK to use an older schema for writing?
>>> >>>
>>> >>> The consensus on the thread seems to be yes.  I'll note that this
>>> can cause confusing results when the "write-default" [1] value for a column
>>> changes.  We should probably have an implementation note to clarify:
>>> >>> a.  Using a stale schema is allowed
>>> >>> b.  It might cause inconsistent results in the face of multiple
>>> writers when default values are used.
>>> >>>
>>> >>> Thoughts?
>>> >>>
>>> >>> Thanks,
>>> >>> Micah
>>> >>>
>>> >>> On Mon, Aug 25, 2025 at 4:59 PM Ryan Blue <rdb...@gmail.com> wrote:
>>> >>>>
>>> >>>> I agree with Dan that type promotion should be well-defined. If
>>> it's a grey area then we should clarify it in the spec.
>>> >>>>
>>> >>>> How it works today is that schema evolution always produces a
>>> schema that can read files written with any older schema. When a type is
>>> promoted, the new schema can read any older data file, but readers may need
>>> to promote values like the [int-to-long reader](
>>> https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java#L546-L560)
>>> does. You aren't guaranteed to be able to read new data using an older
>>> schema, so the latest schema should always be used or you should use the
>>> schema attached to a snapshot.
>>> >>>>
>>> >>>> Because files with older schemas can always be read, it is safe to
>>> write files with an older schema. This happens fairly regularly, as Steven
>>> noted, in cases where a writer has a fixed schema and is long-running.
>>> >>>>
>>> >>>> Ryan
>>> >>>>
>>> >>>> On Thu, Aug 21, 2025 at 5:37 PM Steven Wu <stevenz...@gmail.com>
>>> wrote:
>>> >>>>>
>>> >>>>> > This means that you can have writers using different schema to
>>> write (use cases include different partitioning or "out-of-date" writers),
>>> but the data is still valid.
>>> >>>>>
>>> >>>>> +1 on Dan's point. Both batch and streaming writers can have stale
>>> schemas. Long-running streaming jobs may stay stale for extended periods
>>> before picking up the new schema during restart.
>>> >>>>>
>>> >>>>> On Wed, Aug 20, 2025 at 2:50 PM Daniel Weeks <dwe...@apache.org>
>>> wrote:
>>> >>>>>>
>>> >>>>>> I think I'm going to disagree and argue that it's not really a
>>> gray area.
>>> >>>>>>
>>> >>>>>> Having strict schema evolution rules, and the way schemas are tracked,
>>> means that there is independence between writer and reader schemas, which
>>> remain compatible due to the evolution rules.
>>> >>>>>>
>>> >>>>>> This means that you can have writers using different schema to
>>> write (use cases include different partitioning or "out-of-date" writers),
>>> but the data is still valid.
>>> >>>>>>
>>> >>>>>> How you promote physical representation during a read/scan
>>> operation results in a consistent presentation with the read schema.
>>> >>>>>>
>>> >>>>>> All of the representations are technically valid.
>>> >>>>>>
>>> >>>>>> -Dan
>>> >>>>>>
>>> >>>>>> On Mon, Aug 18, 2025 at 7:46 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>> +1 to what Micah said :) sorry about the typo
>>> >>>>>>>
>>> >>>>>>> On Mon, Aug 18, 2025 at 9:45 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>> +1 to what Micaah , We have never really written rules about
>>> what is "allowed" in this particular context but since
>>> >>>>>>>> a reader needs to be able to handle both int/long values for
>>> the column, there isn't really any danger in writing
>>> >>>>>>>> new files with the narrower type. If a reader couldn't handle
>>> this, then type promotion would be impossible.
>>> >>>>>>>>
>>> >>>>>>>> I would include all columns in the file, the space requirements
>>> for an all null column (or all constant column) should
>>> >>>>>>>> be very small. I believe the reason we originally wrote those
>>> rules in was to avoid folks doing the Hive Style
>>> >>>>>>>> implicit columns from partition tuple (although we also have
>>> handling for this.)
>>> >>>>>>>>
>>> >>>>>>>> On Sun, Aug 17, 2025 at 11:15 PM Micah Kornfield <
>>> emkornfi...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>  Hi Nic,
>>> >>>>>>>>> This is IMO a gray area.
>>> >>>>>>>>>
>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with the
>>> old
>>> >>>>>>>>>> types (int) and commit them to the table with a table schema
>>> where
>>> >>>>>>>>>> types are promoted (long)?
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> IMO  I would expect writers to be writing files that are
>>> consistent with the current metadata, so ideally they would not be written
>>> with int if it is now long.  In general, though, in these cases I think most
>>> readers are robust to reading type-promoted files.  We should probably
>>> clarify in the specification.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general,
>>> which contain
>>> >>>>>>>>>> only a subset of columns of table schema? I.e. if I know a
>>> column is
>>> >>>>>>>>>> all NULLs, can we just skip writing it?
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> As currently worded, the spec on writing data files (
>>> https://iceberg.apache.org/spec/#writing-data-files) says to include all
>>> columns. Based on column projection rules, however, failing to do so should
>>> also not cause problems.
>>> >>>>>>>>>
>>> >>>>>>>>> Cheers,
>>> >>>>>>>>> Micah
>>> >>>>>>>>>
>>> >>>>>>>>> On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei
>>> <n...@nvartolomei.com.invalid> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi,
>>> >>>>>>>>>>
>>> >>>>>>>>>> I'm implementing an Iceberg writer[^1] and have a question
>>> about what
>>> >>>>>>>>>> type promotion actually means as part of schema evolution
>>> rules.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Iceberg spec [specifies][spec-evo] which type promotions are
>>> allowed.
>>> >>>>>>>>>> No confusion there.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The confusion on my end arises when it comes to actually
>>> writing i.e.
>>> >>>>>>>>>> parquet data. Let's take for example the int to long
>>> promotion. What
>>> >>>>>>>>>> is actually allowed under this promotion rule? Let me try to
>>> show what
>>> >>>>>>>>>> I mean.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Obviously if I have a schema-id N with field A of type int
>>> and table
>>> >>>>>>>>>> snapshots with this schema then it is possible to update the
>>> table
>>> >>>>>>>>>> schema-id to > N where field A now has type long and this new
>>> schema
>>> >>>>>>>>>> can read parquet files with the old type.
>>> >>>>>>>>>>
>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with the
>>> old
>>> >>>>>>>>>> types (int) and commit them to the table with a table schema
>>> where
>>> >>>>>>>>>> types are promoted (long)?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general,
>>> which contain
>>> >>>>>>>>>> only a subset of columns of table schema? I.e. if I know a
>>> column is
>>> >>>>>>>>>> all NULLs, can we just skip writing it?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Appreciate taking the time to look at this,
>>> >>>>>>>>>> Nic
>>> >>>>>>>>>>
>>> >>>>>>>>>> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
>>> >>>>>>>>>> [^1]: This is for Redpanda to Iceberg native integration
>>> >>>>>>>>>> (https://github.com/redpanda-data/redpanda).
>>>
>>
