I had a couple of small fixes that would be great to get into 1.8.1:
https://github.com/apache/iceberg/pull/12305
https://github.com/apache/iceberg/pull/12224

I added those to the GitHub 1.8.1 milestone in case this is possible.

Thanks,
Bryan

> On Feb 18, 2025, at 8:53 AM, Robert Stupp <sn...@snazy.de> wrote:
> 
> 
> On 18.02.25 17:10, Fokko Driesprong wrote:
>>> Reality is that Iceberg did write '-1' into current-snapshot-id (and other 
>>> "non-exist" marker values for schema/spec/sort) instead of omitting the 
>>> field.
>> 
>> Yes, but this is wrong. The spec dictates 
>> <https://iceberg.apache.org/spec/#table-metadata-fields> under 
>> current-snapshot-id: 
>> 
>>> long ID of the current table snapshot; must be the same as the current ID 
>>> of the main branch in refs.
> True, it's wrong. (Or: Iceberg violated its own spec ;) )
> 
>> In the Java 1.7.1 code, we don't see this entry in refs 
>> <https://github.com/apache/iceberg/blob/4a432839233f2343a9eae8255532f911f06358ef/core/src/test/java/org/apache/iceberg/rest/responses/TestLoadTableResponseParser.java#L110-L111>.
>>  My point is, we need to exclude -1 from that as well. My concern is that 
>> this is a slippery slope, and start encoding all kinds of historical bugs 
>> into the spec.
> Quite slippery, yes. But the -1 values are out there in the wild. The 
> (special) meaning of that value needs to be in the spec IMHO.
> 
>>> I don't think there is any reason to change the behavior of writing null / 
>>> missing unless that's really a point of confusion for folks.
>> 
>> This is laid out in the reader/writer requirements 
>> <https://iceberg.apache.org/spec/#writer-requirements>, and should be 
>> updated if it is confusing. 
>> 
>>> Is there a JSON library folks are using which distinguishes between 
>>> missing/null?
>> 
>> Yes, pydantic allows you to control this. If you define a field, and it is 
>> missing, it will throw an error when you don't have a default value defined. 
>> The Java TableMetadataV2ValidMinimalTest 
>> <https://github.com/apache/iceberg/blob/main/core/src/test/resources/TableMetadataV2ValidMinimal.json>
>>  omits current-snapshot-id.
> Jackson as well BTW.
> 
> 
> 
>> How about reverting #11560 <https://github.com/apache/iceberg/pull/11560> 
>> for 1.8.1, and then reinstating this for 2.0.0?
> SGTM!
> 
> 
> 
>> 
>> Kind regards,
>> Fokko
>> 
>> 
>> 
>> 
>> 
>> Op di 18 feb 2025 om 16:55 schreef Russell Spitzer 
>> <russell.spit...@gmail.com <mailto:russell.spit...@gmail.com>>:
>>> The only thing I think I agree with is defining that -1 has meant "no 
>>> current snapshot" in the past and
>>> is equivalent to a missing/nuil (if we have to specify that) .  
>>> 
>>> I don't think there is any reason to change the behavior of writing null / 
>>> missing unless 
>>> that's really a point of confusion for folks. Is there a JSON library folks 
>>> are using
>>>  which distinguishes between missing/null?
>>> 
>>> Would having field:null be the same as missing have avoided the issue 
>>> because it seems like libraries
>>> that only handled non-null would also not handle "missing" 
>>> current-snapshot-id as well?
>>> 
>>> I'd really like to hear from other implementers here since changing all 
>>> this again would be a lot of
>>> work and I thought we had understanding of Optional == Nullable as a valid 
>>> thing in the spec. 
>>> 
>>> On Tue, Feb 18, 2025 at 7:35 AM Robert Stupp <sn...@snazy.de 
>>> <mailto:sn...@snazy.de>> wrote:
>>>> Correcting myself: schema/spec/sort seem to be always present - please 
>>>> ignore that part in my previous email. The valid values for those fields 
>>>> however should be defined.
>>>> 
>>>> On 18.02.25 14:29, Robert Stupp wrote:
>>>>> Reality is that Iceberg did write '-1' into current-snapshot-id (and 
>>>>> other "non-exist" marker values for schema/spec/sort) instead of omitting 
>>>>> the field.
>>>>> 
>>>>> Side note: the table-spec says that these fields are optional, but 
>>>>> nothing about whether it is nullable.
>>>>> 
>>>>> The spec should at least be amended to explicitly define the valid values 
>>>>> (-1 for current-snapshot-id for no snapshot is just a fact now that it's 
>>>>> there). IMO that field being 'null' isn't defined in the spec, but the 
>>>>> absence of the field is.
>>>>> 
>>>>> Proposal for the implementation:
>>>>> 
>>>>> * revert https://github.com/apache/iceberg/pull/11560 in 1.8.1 and on main
>>>>> 
>>>>> Proposal for the spec (for current-snapshot-id):
>>>>> 
>>>>> * Define the valid value ranges for the field
>>>>> * Define that the absence of the field means "no current snapshot"
>>>>> * Define that the value -1 of the field means "no current snapshot"
>>>>> * Define that the value 'null' of the field means "no current snapshot"
>>>>> * Define that new implementations must not write the field if there's no 
>>>>> current snapshot
>>>>> 
>>>>> Proposal for the spec (for schema/spec/sort):
>>>>> * Define the valid value ranges for the fields
>>>>> * Define the "schema/spec/sort not present" values (the fields are 
>>>>> optional for v1 but required for v2+v3).
>>>>> * OR Define that "schema/spec/sort must be absent" if there is no current 
>>>>> schema/spec/sort.
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> 
>>>>> On 17.02.25 21:07, Russell Spitzer wrote:
>>>>>> It sounds like the argument here is that we should change the Spec for 
>>>>>> V1, V2, and V3 to mark current-snapshot-id
>>>>>> as required. Then we should change all other implementations to follow 
>>>>>> this new standard. I'm not sure that
>>>>>> is a good solution going forwards but I'm not sure of how we can support 
>>>>>> catalogs/engines that cannot handle a null
>>>>>> correctly in this situation otherwise. Perhaps we should source out to 
>>>>>> see if any other implementers worked off the
>>>>>> assumption of a non-optional "current-snapshot-id" and if we get a 
>>>>>> critical mass we can try to make that change? 
>>>>>> Because of how wide that change would be, I think we would need pretty 
>>>>>> broad consensus to do so.
>>>>>> 
>>>>>> We could possibly also have a flag to allow the old behavior but that 
>>>>>> also feels wrong to me, we have often gone with a motto of
>>>>>> read "wrong" write "correct" for things like this in the past and 
>>>>>> continuing to write "wrong" is a disservice to any 
>>>>>> new implementers . When we do have a contradiction between our 
>>>>>> implementation and the spec I think we have 
>>>>>> to trust that implementers followed the spec and fix the core library.
>>>>>> 
>>>>>> Are there any other solutions here?
>>>>>> 
>>>>>> On Mon, Feb 17, 2025 at 11:45 AM Fokko Driesprong <fo...@apache.org 
>>>>>> <mailto:fo...@apache.org>> wrote:
>>>>>>> Hey Robert,
>>>>>>> 
>>>>>>>> The thing is, that -1 cannot "go away".
>>>>>>> 
>>>>>>> Yes, I agree, but that's also the case for null, as the field is 
>>>>>>> optional in the spec 
>>>>>>> <https://iceberg.apache.org/spec/?column-projection#table-metadata-fields>.
>>>>>>>  Therefore we support both in PyIceberg 
>>>>>>> <https://github.com/apache/iceberg-python/blob/300b8405a0fe7d0111321e5644d704026af9266b/pyiceberg/table/metadata.py#L71-L77>,
>>>>>>>  Iceberg-Rust 
>>>>>>> <https://github.com/apache/iceberg-rust/blob/752d69041e0461989c48dd1ca79bcff577776f5d/crates/iceberg/src/spec/table_metadata.rs#L500>,
>>>>>>>  Iceberg-Java 
>>>>>>> <https://github.com/apache/iceberg/blob/bcbbd0344623ffea5b092e2de5debb0bc12892a1/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L458-L462>,
>>>>>>>  and Iceberg-Go 
>>>>>>> <https://github.com/apache/iceberg-go/blob/ada5480954d9b41d2f8eb4c765523614fad65e1a/table/metadata.go#L837-L841>.
>>>>>>>  On the write side, they all produce nullinstead of -1. Therefore, I 
>>>>>>> was surprised that it comes up now, and not earlier.
>>>>>>> 
>>>>>>>> I'd prefer to keep the previous behavior - otherwise implementations 
>>>>>>>> may fall back to 0, which is definitely wrong.
>>>>>>> 
>>>>>>> I'm not seeing why it would fall back to 0, and I agree, that's wrong.
>>>>>>> 
>>>>>>>> Would be better IMHO not to break existing implementations / render 
>>>>>>>> existing setups incompatible with Iceberg 1.8.
>>>>>>> 
>>>>>>> In my opinion, if this had been caught in an RC, it would be open for 
>>>>>>> discussion, but that ship has sailed. Let's hear what others think.
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>> 
>>>>>>> Op ma 17 feb 2025 om 18:16 schreef Robert Stupp <sn...@snazy.de 
>>>>>>> <mailto:sn...@snazy.de>>:
>>>>>>>> Hi Fokko,
>>>>>>>> 
>>>>>>>> sure, in general "absent" or "null" would be cleaner. But now we have 
>>>>>>>> two representations for the same case - I suspect most went with the 
>>>>>>>> "reference behavior".
>>>>>>>> 
>>>>>>>> The thing is, that -1 cannot "go away".
>>>>>>>> 
>>>>>>>> I'd prefer to keep the previous behavior - otherwise implementations 
>>>>>>>> may fall back to 0, which is definitely wrong. Would be better IMHO 
>>>>>>>> not to break existing implementations / render existing setups 
>>>>>>>> incompatible with Iceberg 1.8.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 17.02.25 15:49, Fokko Driesprong wrote:
>>>>>>>>> Hey Robert,
>>>>>>>>> 
>>>>>>>>> Thanks for raising this.
>>>>>>>>> 
>>>>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are 
>>>>>>>>>> not defined in the spec.
>>>>>>>>> 
>>>>>>>>> For me, this is invalid, since there is no snapshot with -1 in the 
>>>>>>>>> snapshots property. In the tests with the PR, you can see that there 
>>>>>>>>> are no snapshots 
>>>>>>>>> <https://github.com/apache/iceberg/pull/11560/files#diff-41bdfb6698d2aa7b47ff7d5fabc558a5a64f8b7496fe1bcd8f8ecb69b2afc128R112>.
>>>>>>>>>  A year ago we had a similar discussion on PyIceberg 
>>>>>>>>> <https://py.iceberg.apache.org/configuration/#backward-compatibility> 
>>>>>>>>> around this and this ended up in adding a flag to fall back to the 
>>>>>>>>> old behavior. I do agree that we should have communicated this more 
>>>>>>>>> clearly with the release.
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> Fokko
>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Op ma 17 feb 2025 om 12:25 schreef Robert Stupp <sn...@snazy.de 
>>>>>>>>> <mailto:sn...@snazy.de>>:
>>>>>>>>>> Feels like https://github.com/apache/iceberg/pull/11560 introduced a 
>>>>>>>>>> behavior change.
>>>>>>>>>> 
>>>>>>>>>> snapshot-ID -1 isn't per-se invalid, because the valid values are 
>>>>>>>>>> not defined in the spec.
>>>>>>>>>> 
>>>>>>>>>> Previous Iceberg-Java versions always produced -1 if there's no 
>>>>>>>>>> current snapshot - 1.8 produces `null` in that case. So there are 
>>>>>>>>>> now two _different_ values for "no current snapshot".
>>>>>>>>>> 
>>>>>>>>>> Implementations that rely on the behavior of the "reference 
>>>>>>>>>> implementation" (Iceberg-Java) do now fail in case there's no 
>>>>>>>>>> current snapshot.
>>>>>>>>>> 
>>>>>>>>>> On 13.02.25 10:09, Amogh Jahagirdar wrote:
>>>>>>>>>>> I'm pleased to announce the release of Apache Iceberg 1.8.0!
>>>>>>>>>>> 
>>>>>>>>>>> Apache Iceberg is an open table format for huge analytic datasets. 
>>>>>>>>>>> Iceberg
>>>>>>>>>>> delivers high query performance for tables with tens of petabytes 
>>>>>>>>>>> of data,
>>>>>>>>>>> along with atomic commits, concurrent writes, and SQL-compatible 
>>>>>>>>>>> table
>>>>>>>>>>> evolution.
>>>>>>>>>>> 
>>>>>>>>>>> This release can be downloaded from: 
>>>>>>>>>>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.8.0/apache-iceberg-1.8.0.tar.gz
>>>>>>>>>>>  
>>>>>>>>>>> 
>>>>>>>>>>> Release notes: https://iceberg.apache.org/releases/#180-release 
>>>>>>>>>>> <https://iceberg.apache.org/releases/180-release>
>>>>>>>>>>> 
>>>>>>>>>>> Java artifacts are available from Maven Central.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks to everyone for contributing!
>>>>>>>>>> -- 
>>>>>>>>>> Robert Stupp
>>>>>>>>>> @snazy
>>>>>>>> -- 
>>>>>>>> Robert Stupp
>>>>>>>> @snazy
>>>>> -- 
>>>>> Robert Stupp
>>>>> @snazy
>>>> -- 
>>>> Robert Stupp
>>>> @snazy
> -- 
> Robert Stupp
> @snazy

Reply via email to