I'm in favor of this as well. While working on PyIceberg I had to deduce this from the Java code, so having a more condensed version in the appendix of the spec would be great.

Kind regards,
Fokko

On Mon, Dec 16, 2024 at 14:21, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi,
>
> yes I agree, I don't think we have to couple this to a spec version.
>
> Regards
> JB
>
> On Wed, Dec 11, 2024 at 11:17 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> >
> > I want to float this back up; I think this is a really good idea for cross-engine support. I don't think we have to tie this to any specific spec version, since these are just recommendations, so I think we can do this at any time.
> >
> > On Wed, Nov 27, 2024 at 1:31 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
> >>
> >> This makes sense to me generally. I've tried a few times to search the spec for a list of possible snapshot summary properties, and was a bit surprised not to find them there. So I think this would be a nice addition.
> >>
> >> I'm curious if there's any historical reason it's not been included in the spec.
> >>
> >> Thanks
> >> Szehon
> >>
> >> On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org> wrote:
> >>>
> >>> Thanks for driving this Honah!
> >>>
> >>> It's important to have a consistent naming scheme so that we don't need to worry about edge cases when using multiple engines, and possibly have to deal with migrations.
> >>>
> >>> Also, since users can store arbitrary key/value pairs in the summary property, it's good to document the currently used properties to avoid collisions.
> >>>
> >>> I like the proposal to document all properties in a "snapshot summary" table; this will ensure a centralized place to view all possible key/value pairs, similar to how FileIO configuration is handled in iceberg-python. Other implementations can use this table as a reference.
> >>>
> >>> > This approach offers flexibility, as new fields can be added through documentation updates without requiring specification changes.
> >>> This will save a lot of effort since specification changes require greater scrutiny.
> >>>
> >>> > summary details would not be located near the Snapshot section, which explains the summary field.
> >>> We can link the table to the Snapshot section.
> >>>
> >>> Would love to hear others' thoughts on this.
> >>>
> >>> Best,
> >>> Kevin Liu
> >>>
> >>> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I’d like to propose an addition to the table specification to document optional fields in the snapshot summary.
> >>>>
> >>>> Currently, the snapshot summary includes a required operation field and various optional fields. While these optional fields, such as metrics and partition-level summaries, are supported by the Java and Python implementations, they are not officially documented. This creates a risk of inconsistency as other implementations and engines adopt and interact with these fields.
> >>>>
> >>>> I propose adding a new section to the table specification to document these optional fields, ensuring consistent naming conventions and reducing ambiguity across implementations. While this is the primary proposal, it may also be worth discussing whether documenting these fields separately in Docs/Table would provide additional flexibility for future updates.
> >>>>
> >>>> I’d love to hear your thoughts, suggestions, or concerns about this proposal.
> >>>>
> >>>> Looking forward to the discussion!
> >>>>
> >>>> Links
> >>>>
> >>>> GitHub tracking issue: https://github.com/apache/iceberg/issues/11659
> >>>> Proposal: https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
> >>>> PR: https://github.com/apache/iceberg/pull/11660
> >>>>
> >>>> Best regards,
> >>>> Honah