I'm generally in support of this as well, but I think we should put this in
an appendix as opposed to the main body of the spec.

-Dan

On Wed, Dec 11, 2024 at 2:18 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I want to float this back up. I think this is a really good idea for
> cross-engine support. I don't think we have to tie this to any specific
> spec version, since these are just recommendations, so we can do this at
> any time.
>
> On Wed, Nov 27, 2024 at 1:31 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> This makes sense to me generally. I've tried a few times to search the
>> spec for a list of possible snapshot summary properties and was a bit
>> surprised not to find them there, so I think this would be a nice addition.
>>
>> I'm curious if there's any historical reason it's not been included in
>> the spec.
>>
>> Thanks
>> Szehon
>>
>> On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> Thanks for driving this Honah!
>>>
>>> It's important to have a consistent naming scheme so that we don't need
>>> to worry about edge cases when using multiple engines, or risk having to
>>> deal with migrations later.
>>>
>>> Also, since users can store arbitrary key/value pairs in the summary
>>> property, it's good to document the currently used properties to avoid
>>> collision.
>>>
>>> I like the proposal to document all properties in a "snapshot summary"
>>> table. This would give us a centralized place to view all possible
>>> key/value pairs, similar to how FileIO configuration is handled in
>>> iceberg-python <https://py.iceberg.apache.org/configuration/#s3>. Other
>>> implementations can use this table as a reference.
>>>
>>> > This approach offers flexibility, as new fields can be added through
>>> > documentation updates without requiring specification changes.
>>> This will save a lot of effort since specification changes require
>>> greater scrutiny.
>>>
>>> > summary details would not be located near the Snapshot section, which
>>> > explains the summary field.
>>> We can link the table to the Snapshot section.
>>>
>>>
>>> Would love to hear others' thoughts on this.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I’d like to propose an addition to the table specification to document
>>>> optional fields in the snapshot summary.
>>>>
>>>> Currently, the snapshot summary includes a required operation field and
>>>> various optional fields. While these optional fields—such as metrics and
>>>> partition-level summaries—are supported by Java
>>>> <https://github.com/apache/iceberg/blob/549674b3fc0cdb18d6cad3e2d6320236fba8c562/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L32-L64>
>>>> and Python
>>>> <https://github.com/HonahX/iceberg-python/blob/45d611fe351f6f3847bf329aa053d890d810e2b6/pyiceberg/table/snapshots.py#L36-L60>
>>>> implementations, they are not officially documented. This creates risks of
>>>> inconsistency as other implementations and engines adopt and interact with
>>>> these fields.
>>>>
>>>> I propose adding a new section to the table specification to document
>>>> these optional fields, ensuring consistent naming conventions and reducing
>>>> ambiguity across implementations. While this is the primary proposal, it
>>>> may also be worth discussing whether documenting these fields separately in
>>>> Docs/Table would provide additional flexibility for future updates.
>>>>
>>>> I’d love to hear your thoughts, suggestions, or concerns about this
>>>> proposal.
>>>>
>>>> Looking forward to the discussion!
>>>>
>>>> Links
>>>>
>>>>    - GitHub tracking issue:
>>>>    https://github.com/apache/iceberg/issues/11659
>>>>    - Proposal:
>>>>    https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
>>>>    - PR: https://github.com/apache/iceberg/pull/11660
>>>>
>>>>
>>>> Best regards,
>>>> Honah
>>>>
>>>
