I would agree the best path forward is to note the current behavior for
v1/v2 since that's well established and address the behavior in v3.

For compatibility with existing libraries, we should maintain that `-1` is
equivalent to no snapshot and it should be written for v1/v2.

With V3 we should support `-1` for backwards compatibility (though there's
low likelihood that someone will promote an empty table), but always write
`null`.

I believe this is in part due to TableMetadata tracking the
current-snapshot-id as a `long` value (as opposed to a Long) reference,
which resulted in this oversight.

-Dan

On Tue, Feb 18, 2025 at 11:57 AM rdb...@gmail.com <rdb...@gmail.com> wrote:

> +1 to reverting PT 11560 in main and 1.8.1. That avoids unnecessary
> incompatibility with older readers.
>
> I also agree that we should update the spec to say what Russell suggests:
> > that -1 has meant "no current snapshot" in the past and is equivalent to
> missing/null.
>
> That's a correct description of the behavior.
>
> I don't think that we should create new requirements for this that didn't
> previously exist -- that is, I don't see much value in going back to
> mandate this behavior when writing v1 or v2 tables. If an implementation
> were writing null for the current snapshot ID up until now, I don't think
> that we can say that behavior was or is incorrect. It was an
> incompatibility with the Java implementation and we should note the
> behavior in the "Implementation Notes" section.
>
> > How about reverting #11560 for 1.8.1, and then reinstating this for
> 2.0.0?
>
> I think we need to fix this at a format version boundary, not a library
> version boundary. I'd be up for reinstating the write change for v3.
>
> On Tue, Feb 18, 2025 at 7:56 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> The only thing I think I agree with is defining that -1 has meant "no
>> current snapshot" in the past and
>> is equivalent to a missing/nuil (if we have to specify that) .
>>
>> I don't think there is any reason to change the behavior of writing null
>> / missing unless
>> that's really a point of confusion for folks. Is there a JSON library
>> folks are using
>>  which distinguishes between missing/null?
>>
>> Would having field:null be the same as missing have avoided the issue
>> because it seems like libraries
>> that only handled non-null would also not handle "missing"
>> current-snapshot-id as well?
>>
>> I'd really like to hear from other implementers here since changing all
>> this again would be a lot of
>> work and I thought we had understanding of Optional == Nullable as a
>> valid thing in the spec.
>>
>> On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de> wrote:
>>
>>> Correcting myself: schema/spec/sort seem to be always present - please
>>> ignore that part in my previous email. The valid values for those fields
>>> however should be defined.
>>> On 18.02.25 14:29, Robert Stupp wrote:
>>>
>>> Reality is that Iceberg did write '-1' into current-snapshot-id (and
>>> other "non-exist" marker values for schema/spec/sort) instead of omitting
>>> the field.
>>>
>>> Side note: the table-spec says that these fields are optional, but
>>> nothing about whether it is nullable.
>>>
>>> The spec should at least be amended to explicitly define the valid
>>> values (-1 for current-snapshot-id for no snapshot is just a fact now that
>>> it's there). IMO that field being 'null' isn't defined in the spec, but the
>>> absence of the field is.
>>>
>>> Proposal for the implementation:
>>>
>>> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on
>>> main
>>>
>>> Proposal for the spec (for current-snapshot-id):
>>>
>>> * Define the valid value ranges for the field
>>> * Define that the absence of the field means "no current snapshot"
>>> * Define that the value -1 of the field means "no current snapshot"
>>> * Define that the value 'null' of the field means "no current snapshot"
>>> * Define that new implementations must not write the field if there's no
>>> current snapshot
>>> Proposal for the spec (for schema/spec/sort):
>>>
>>> * Define the valid value ranges for the fields
>>> * Define the "schema/spec/sort not present" values (the fields are
>>> optional for v1 but required for v2+v3).
>>> * OR Define that "schema/spec/sort must be absent" if there is no
>>> current schema/spec/sort.
>>>
>>> WDYT?
>>>
>>> On 17.02.25 21:07, Russell Spitzer wrote:
>>>
>>> It sounds like the argument here is that we should change the Spec for
>>> V1, V2, and V3 to mark current-snapshot-id
>>> as required. Then we should change all other implementations to follow
>>> this new standard. I'm not sure that
>>> is a good solution going forwards but I'm not sure of how we can support
>>> catalogs/engines that cannot handle a null
>>> correctly in this situation otherwise. Perhaps we should source out to
>>> see if any other implementers worked off the
>>> assumption of a non-optional "current-snapshot-id" and if we get a
>>> critical mass we can try to make that change?
>>> Because of how wide that change would be, I think we would need pretty
>>> broad consensus to do so.
>>>
>>> We could possibly also have a flag to allow the old behavior but that
>>> also feels wrong to me, we have often gone with a motto of
>>> read "wrong" write "correct" for things like this in the past and
>>> continuing to write "wrong" is a disservice to any
>>> new implementers . When we do have a contradiction between our
>>> implementation and the spec I think we have
>>> to trust that implementers followed the spec and fix the core library.
>>>
>>> Are there any other solutions here?
>>>
>>> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org>
>>> wrote:
>>>
>>>> Hey Robert,
>>>>
>>>> The thing is, that -1 cannot "go away".
>>>>
>>>>
>>>> Yes, I agree, but that's also the case for null, as the field is optional
>>>> in the spec
>>>> <https://iceberg.apache.org/spec/?column-projection#table-metadata-fields>.
>>>> Therefore we support both in PyIceberg
>>>> <https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/table/metadata.py#L71-L77>,
>>>> Iceberg-Rust
>>>> <https://github.com/apache/iceberg-rust/blob/752d69041e0461989c48dd1ca79bcff577776f5d/crates/iceberg/src/spec/table_metadata.rs#L500>,
>>>> Iceberg-Java
>>>> <https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L458-L462>,
>>>> and Iceberg-Go
>>>> <https://github.com/apache/iceberg-go/blob/ada5480954d9b41d2f8eb4c765523614fad65e1a/table/metadata.go#L837-L841>.
>>>> On the write side, they all produce null instead of -1. Therefore, I
>>>> was surprised that it comes up now, and not earlier.
>>>>
>>>> I'd prefer to keep the previous behavior - otherwise implementations
>>>>> may fall back to 0, which is definitely wrong.
>>>>
>>>>
>>>> I'm not seeing why it would fall back to 0, and I agree, that's wrong.
>>>>
>>>> Would be better IMHO not to break existing implementations / render
>>>>> existing setups incompatible with Iceberg 1.8.
>>>>
>>>>
>>>> In my opinion, if this had been caught in an RC, it would be open for
>>>> discussion, but that ship has sailed. Let's hear what others think.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>:
>>>>
>>>>> Hi Fokko,
>>>>>
>>>>> sure, in general "absent" or "null" would be cleaner. But now we have
>>>>> two representations for the same case - I suspect most went with the
>>>>> "reference behavior".
>>>>>
>>>>> The thing is, that -1 cannot "go away".
>>>>>
>>>>> I'd prefer to keep the previous behavior - otherwise implementations
>>>>> may fall back to 0, which is definitely wrong. Would be better IMHO not to
>>>>> break existing implementations / render existing setups incompatible with
>>>>> Iceberg 1.8.
>>>>>
>>>>>
>>>>> On 17.02.25 15:49, Fokko Driesprong wrote:
>>>>>
>>>>> Hey Robert,
>>>>>
>>>>> Thanks for raising this.
>>>>>
>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not
>>>>>> defined in the spec.
>>>>>
>>>>>
>>>>> For me, this is invalid, since there is no snapshot with -1 in the
>>>>> snapshots property. In the tests with the PR, you can see that there
>>>>> are no snapshots
>>>>> <https://github.com/apache/iceberg/pull/11560/files#diff-41bdfb6698d2aa7b47ff7d5fabc558a5a64f8b7496fe1bcd8f8ecb69b2afc128R112>.
>>>>> A year ago we had a similar discussion on PyIceberg
>>>>> <https://py.iceberg.apache.org/configuration/#backward-compatibility>
>>>>> around this and this ended up in adding a flag to fall back to the old
>>>>> behavior. I do agree that we should have communicated this more clearly
>>>>> with the release.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>:
>>>>>
>>>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a
>>>>>> behavior change.
>>>>>>
>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not
>>>>>> defined in the spec.
>>>>>>
>>>>>> Previous Iceberg-Java versions always produced -1 if there's no
>>>>>> current snapshot - 1.8 produces `null` in that case. So there are now two
>>>>>> _different_ values for "no current snapshot".
>>>>>>
>>>>>> Implementations that rely on the behavior of the "reference
>>>>>> implementation" (Iceberg-Java) do now fail in case there's no current
>>>>>> snapshot.
>>>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote:
>>>>>>
>>>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0!
>>>>>>
>>>>>> Apache Iceberg is an open table format for huge analytic datasets.
>>>>>> Iceberg
>>>>>> delivers high query performance for tables with tens of petabytes of
>>>>>> data,
>>>>>> along with atomic commits, concurrent writes, and SQL-compatible table
>>>>>> evolution.
>>>>>>
>>>>>> This release can be downloaded from:
>>>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz
>>>>>>
>>>>>>
>>>>>> Release notes: https://iceberg.apache.org/releases/#180-release
>>>>>> <https://iceberg.apache.org/releases/180-release>
>>>>>>
>>>>>> Java artifacts are available from Maven Central.
>>>>>>
>>>>>> Thanks to everyone for contributing!
>>>>>>
>>>>>> --
>>>>>> Robert Stupp
>>>>>> @snazy
>>>>>>
>>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
>>>>>
>>>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>>

Reply via email to