I'm generally in support of this as well, but I think we should put this in an appendix as opposed to the main body of the spec.
-Dan

On Wed, Dec 11, 2024 at 2:18 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I want to float this back up, I think this is a really good idea for cross
> engine support. I don't think we have to tie this to any specific Spec
> version since they are just recommendations so I think we can do this at
> any time
>
> On Wed, Nov 27, 2024 at 1:31 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> This makes sense to me generally, I've tried a few times to search in the
>> spec to find a list of possible snapshot summary properties, and was a bit
>> surprised to not find them there. So I think this would be a nice addition.
>>
>> I'm curious if there's any historical reason it's not been included in
>> the spec.
>>
>> Thanks
>> Szehon
>>
>> On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> Thanks for driving this Honah!
>>>
>>> It's important to have a consistent naming scheme so that we don't need
>>> to worry about edge cases when using multiple engines, and possibly have to
>>> deal with migrations.
>>>
>>> Also, since users can store arbitrary key/value pairs in the summary
>>> property, it's good to document the currently used properties to avoid
>>> collision.
>>>
>>> I like the proposal to document all properties in a "snapshot summary"
>>> table, this will ensure a centralized place to view all possible key/value
>>> pairs, similar to how FileIO configuration is handled in iceberg-python
>>> <https://py.iceberg.apache.org/configuration/#s3>. Other
>>> implementations can use this table as a reference.
>>>
>>> > This approach offers flexibility, as new fields can be added through
>>> > documentation updates without requiring specification changes.
>>> This will save a lot of effort since specification changes require
>>> greater scrutiny.
>>>
>>> > summary details would not be located near the Snapshot section, which
>>> > explains the summary field.
>>> We can link the table to the Snapshot section.
>>>
>>> Would love to hear others' thoughts on this.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I’d like to propose an addition to the table specification to document
>>>> optional fields in the snapshot summary.
>>>>
>>>> Currently, the snapshot summary includes a required operation field and
>>>> various optional fields. While these optional fields—such as metrics and
>>>> partition-level summaries—are supported by Java
>>>> <https://github.com/apache/iceberg/blob/549674b3fc0cdb18d6cad3e2d6320236fba8c562/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L32-L64>
>>>> and Python
>>>> <https://github.com/HonahX/iceberg-python/blob/45d611fe351f6f3847bf329aa053d890d810e2b6/pyiceberg/table/snapshots.py#L36-L60>
>>>> implementations, they are not officially documented. This creates risks of
>>>> inconsistency as other implementations and engines adopt and interact with
>>>> these fields.
>>>>
>>>> I propose adding a new section to the table specification to document
>>>> these optional fields, ensuring consistent naming conventions and reducing
>>>> ambiguity across implementations. While this is the primary proposal, it
>>>> may also be worth discussing whether documenting these fields separately in
>>>> Docs/Table would provide additional flexibility for future updates.
>>>>
>>>> I’d love to hear your thoughts, suggestions, or concerns about this
>>>> proposal.
>>>>
>>>> Looking forward to the discussion!
>>>>
>>>> Links
>>>>
>>>>    - GitHub tracking issue: https://github.com/apache/iceberg/issues/11659
>>>>    - Proposal: https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
>>>>    - PR: https://github.com/apache/iceberg/pull/11660
>>>>
>>>>
>>>> Best regards,
>>>> Honah