I'm in favor of this as well. While working on PyIceberg I had to deduce
this from the Java code, so having a more condensed version in the appendix
of the spec would be great.

Kind regards,
Fokko

On Mon, Dec 16, 2024 at 14:21, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi,
>
> yes I agree, I don't think we have to couple this to a spec version.
>
> Regards
> JB
>
> On Wed, Dec 11, 2024 at 11:17 PM Russell Spitzer
> <russell.spit...@gmail.com> wrote:
> >
> > I want to float this back up. I think this is a really good idea for
> cross-engine support. I don't think we have to tie this to any specific
> spec version since these are just recommendations, so I think we can do this
> at any time.
> >
> > On Wed, Nov 27, 2024 at 1:31 PM Szehon Ho <szehon.apa...@gmail.com>
> wrote:
> >>
> >> This makes sense to me generally. I've tried a few times to search in
> the spec to find a list of possible snapshot summary properties, and was a
> bit surprised to not find them there. So I think this would be a nice
> addition.
> >>
> >> I'm curious if there's any historical reason it's not been included in
> the spec.
> >>
> >> Thanks
> >> Szehon
> >>
> >> On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org>
> wrote:
> >>>
> >>> Thanks for driving this Honah!
> >>>
> >>> It's important to have a consistent naming scheme so that we don't
> need to worry about edge cases when using multiple engines, and possibly
> have to deal with migrations.
> >>>
> >>> Also, since users can store arbitrary key/value pairs in the summary
> property, it's good to document the currently used properties to avoid
> collision.
> >>>
> >>> I like the proposal to document all properties in a "snapshot summary"
> table. This will provide a centralized place to view all possible key/value
> pairs, similar to how FileIO configuration is handled in iceberg-python.
> Other implementations can use this table as a reference.
> >>>
> >>>  > This approach offers flexibility, as new fields can be added
> through documentation updates without requiring specification changes.
> >>> This will save a lot of effort since specification changes require
> greater scrutiny.
> >>>
> >>> > summary details would not be located near the Snapshot section,
> which explains the summary field.
> >>> We can link the table to the Snapshot section.
> >>>
> >>>
> >>> Would love to hear others' thoughts on this.
> >>>
> >>> Best,
> >>> Kevin Liu
> >>>
> >>> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I’d like to propose an addition to the table specification to
> document optional fields in the snapshot summary.
> >>>>
> >>>> Currently, the snapshot summary includes a required operation field
> and various optional fields. While these optional fields—such as metrics
> and partition-level summaries—are supported by Java and Python
> implementations, they are not officially documented. This creates risks of
> inconsistency as other implementations and engines adopt and interact with
> these fields.
> >>>>
> >>>> I propose adding a new section to the table specification to document
> these optional fields, ensuring consistent naming conventions and reducing
> ambiguity across implementations. While this is the primary proposal, it
> may also be worth discussing whether documenting these fields separately in
> Docs/Table would provide additional flexibility for future updates.
> >>>>
> >>>> I’d love to hear your thoughts, suggestions, or concerns about this
> proposal.
> >>>>
> >>>> Looking forward to the discussion!
> >>>>
> >>>> Links
> >>>>
> >>>> GitHub tracking issue: https://github.com/apache/iceberg/issues/11659
> >>>> Proposal:
> https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
> >>>> PR: https://github.com/apache/iceberg/pull/11660
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Honah
>
