The only thing I think I agree with is defining that -1 has meant "no
current snapshot" in the past and
is equivalent to a missing/nuil (if we have to specify that) .

I don't think there is any reason to change the behavior of writing null /
missing unless
that's really a point of confusion for folks. Is there a JSON library folks
are using
 which distinguishes between missing/null?

Would having field:null be the same as missing have avoided the issue
because it seems like libraries
that only handled non-null would also not handle "missing"
current-snapshot-id as well?

I'd really like to hear from other implementers here since changing all
this again would be a lot of
work and I thought we had understanding of Optional == Nullable as a valid
thing in the spec.

On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de> wrote:

> Correcting myself: schema/spec/sort seem to be always present - please
> ignore that part in my previous email. The valid values for those fields
> however should be defined.
> On 18.02.25 14:29, Robert Stupp wrote:
>
> Reality is that Iceberg did write '-1' into current-snapshot-id (and other
> "non-exist" marker values for schema/spec/sort) instead of omitting the
> field.
>
> Side note: the table-spec says that these fields are optional, but nothing
> about whether it is nullable.
>
> The spec should at least be amended to explicitly define the valid values
> (-1 for current-snapshot-id for no snapshot is just a fact now that it's
> there). IMO that field being 'null' isn't defined in the spec, but the
> absence of the field is.
>
> Proposal for the implementation:
>
> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on main
>
> Proposal for the spec (for current-snapshot-id):
>
> * Define the valid value ranges for the field
> * Define that the absence of the field means "no current snapshot"
> * Define that the value -1 of the field means "no current snapshot"
> * Define that the value 'null' of the field means "no current snapshot"
> * Define that new implementations must not write the field if there's no
> current snapshot
> Proposal for the spec (for schema/spec/sort):
>
> * Define the valid value ranges for the fields
> * Define the "schema/spec/sort not present" values (the fields are
> optional for v1 but required for v2+v3).
> * OR Define that "schema/spec/sort must be absent" if there is no current
> schema/spec/sort.
>
> WDYT?
>
> On 17.02.25 21:07, Russell Spitzer wrote:
>
> It sounds like the argument here is that we should change the Spec for V1,
> V2, and V3 to mark current-snapshot-id
> as required. Then we should change all other implementations to follow
> this new standard. I'm not sure that
> is a good solution going forwards but I'm not sure of how we can support
> catalogs/engines that cannot handle a null
> correctly in this situation otherwise. Perhaps we should source out to see
> if any other implementers worked off the
> assumption of a non-optional "current-snapshot-id" and if we get a
> critical mass we can try to make that change?
> Because of how wide that change would be, I think we would need pretty
> broad consensus to do so.
>
> We could possibly also have a flag to allow the old behavior but that also
> feels wrong to me, we have often gone with a motto of
> read "wrong" write "correct" for things like this in the past and
> continuing to write "wrong" is a disservice to any
> new implementers . When we do have a contradiction between our
> implementation and the spec I think we have
> to trust that implementers followed the spec and fix the core library.
>
> Are there any other solutions here?
>
> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org>
> wrote:
>
>> Hey Robert,
>>
>> The thing is, that -1 cannot "go away".
>>
>>
>> Yes, I agree, but that's also the case for null, as the field is optional
>> in the spec
>> <https://iceberg.apache.org/spec/?column-projection#table-metadata-fields>.
>> Therefore we support both in PyIceberg
>> <https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/table/metadata.py#L71-L77>,
>> Iceberg-Rust
>> <https://github.com/apache/iceberg-rust/blob/752d69041e0461989c48dd1ca79bcff577776f5d/crates/iceberg/src/spec/table_metadata.rs#L500>,
>> Iceberg-Java
>> <https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L458-L462>,
>> and Iceberg-Go
>> <https://github.com/apache/iceberg-go/blob/ada5480954d9b41d2f8eb4c765523614fad65e1a/table/metadata.go#L837-L841>.
>> On the write side, they all produce null instead of -1. Therefore, I was 
>> surprised that
>> it comes up now, and not earlier.
>>
>> I'd prefer to keep the previous behavior - otherwise implementations may
>>> fall back to 0, which is definitely wrong.
>>
>>
>> I'm not seeing why it would fall back to 0, and I agree, that's wrong.
>>
>> Would be better IMHO not to break existing implementations / render
>>> existing setups incompatible with Iceberg 1.8.
>>
>>
>> In my opinion, if this had been caught in an RC, it would be open for
>> discussion, but that ship has sailed. Let's hear what others think.
>>
>> Kind regards,
>> Fokko
>>
>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de>:
>>
>>> Hi Fokko,
>>>
>>> sure, in general "absent" or "null" would be cleaner. But now we have
>>> two representations for the same case - I suspect most went with the
>>> "reference behavior".
>>>
>>> The thing is, that -1 cannot "go away".
>>>
>>> I'd prefer to keep the previous behavior - otherwise implementations may
>>> fall back to 0, which is definitely wrong. Would be better IMHO not to
>>> break existing implementations / render existing setups incompatible with
>>> Iceberg 1.8.
>>>
>>>
>>> On 17.02.25 15:49, Fokko Driesprong wrote:
>>>
>>> Hey Robert,
>>>
>>> Thanks for raising this.
>>>
>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not
>>>> defined in the spec.
>>>
>>>
>>> For me, this is invalid, since there is no snapshot with -1 in the
>>> snapshots property. In the tests with the PR, you can see that there
>>> are no snapshots
>>> <https://github.com/apache/iceberg/pull/11560/files#diff-41bdfb6698d2aa7b47ff7d5fabc558a5a64f8b7496fe1bcd8f8ecb69b2afc128R112>.
>>> A year ago we had a similar discussion on PyIceberg
>>> <https://py.iceberg.apache.org/configuration/#backward-compatibility>
>>> around this and this ended up in adding a flag to fall back to the old
>>> behavior. I do agree that we should have communicated this more clearly
>>> with the release.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>>
>>>
>>>
>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de>:
>>>
>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a
>>>> behavior change.
>>>>
>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are not
>>>> defined in the spec.
>>>>
>>>> Previous Iceberg-Java versions always produced -1 if there's no current
>>>> snapshot - 1.8 produces `null` in that case. So there are now two
>>>> _different_ values for "no current snapshot".
>>>>
>>>> Implementations that rely on the behavior of the "reference
>>>> implementation" (Iceberg-Java) do now fail in case there's no current
>>>> snapshot.
>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote:
>>>>
>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0!
>>>>
>>>> Apache Iceberg is an open table format for huge analytic datasets.
>>>> Iceberg
>>>> delivers high query performance for tables with tens of petabytes of
>>>> data,
>>>> along with atomic commits, concurrent writes, and SQL-compatible table
>>>> evolution.
>>>>
>>>> This release can be downloaded from:
>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz
>>>>
>>>>
>>>> Release notes: https://iceberg.apache.org/releases/#180-release
>>>> <https://iceberg.apache.org/releases/180-release>
>>>>
>>>> Java artifacts are available from Maven Central.
>>>>
>>>> Thanks to everyone for contributing!
>>>>
>>>> --
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>> --
> Robert Stupp
> @snazy
>
> --
> Robert Stupp
> @snazy
>
>

Reply via email to