> For example, the `Snapshot` `summary` field is optional in V1 but
required in V2. Therefore, the REST spec definition should mark the
`summary` field as optional to support both versions.

Yeah, this is technically true. But as I said in my first email, unless you
have tables that are 5 years old, it's unlikely that this is going to be a
problem. A failure here is more likely with newer implementations that have
a bug. So I'd argue there's value in leaving it as required.

On Mon, Oct 21, 2024 at 9:41 AM Kevin Liu <kevin.jq....@gmail.com> wrote:

> > No. They were introduced at the same time.
> Great! Since the `summary` field and the `operation` key were introduced
> together, we should enforce the rule that the `summary` field must always
> have an accompanying `operation` key. This has been addressed in PR 11354
> [1].
>
> > I am strongly against this. The REST spec should be independent of the
> table versions.
> That makes sense. For the REST spec to support both V1 and V2 tables, it
> should "accept" the least common denominator between the two versions. For
> example, the `Snapshot` `summary` field is optional in V1 but required in
> V2. Therefore, the REST spec definition should mark the `summary` field as
> optional to support both versions. However, the current REST spec leans
> towards the V2 table spec; fields that are optional in V1 and required in
> V2 are marked as required in the spec, such as `TableMetadata.table-uuid`
> [2][3] and `Snapshot.summary` [4][5].
>
> Would love to get other people's thoughts on this.
>
> Best,
> Kevin Liu
>
> [1] https://github.com/apache/iceberg/pull/11354
> [2]
> https://github.com/apache/iceberg/blob/8e743a5b5209569f84b6bace36e1106c67e1eab3/open-api/rest-catalog-open-api.yaml#L2414
> [3] https://iceberg.apache.org/spec/#table-metadata-fields
> [4]
> https://github.com/apache/iceberg/blob/8e743a5b5209569f84b6bace36e1106c67e1eab3/open-api/rest-catalog-open-api.yaml#L2325
> [5] https://iceberg.apache.org/spec/#snapshots
>
> On Sun, Oct 20, 2024 at 11:24 AM rdb...@gmail.com <rdb...@gmail.com>
> wrote:
>
>> Was it ever valid to have a summary field without the operation key?
>>
>> No. They were introduced at the same time.
>>
>> Would it be helpful to create alternative versions of the REST spec
>> specifically for referencing V1 and V2 tables?
>>
>> I am strongly against this. The REST spec should be independent of the
>> table versions. Any table format version can be passed and the table format
>> should be the canonical reference for what is allowed. We want to avoid
>> cases where there are discrepancies. The table spec is canonical for table
>> metadata, and the REST spec allows passing it.
>>
>> On Sun, Oct 20, 2024 at 11:18 AM Kevin Liu <kevin.jq....@gmail.com>
>> wrote:
>>
>>> Hey folks,
>>>
>>> Thanks, everyone for the discussion, and thanks Ryan for providing the
>>> historical context.
>>> Enforce the `operation` key in Snapshot’s `summary` field
>>>
>>> When serializing the `Snapshot` object from JSON, the Java
>>> implementation does not enforce that the `summary` field must contain an
>>> `operation` key. In the V1 spec, the `summary` field is optional, while in
>>> the V2 spec, it is required. However, in both versions, if a `summary`
>>> field is present, it must include an `operation` key. Any `summary` field
>>> lacking an `operation` key should be considered invalid.
>>>
>>> I’ve addressed this issue in PR 11354 [1] by adding this constraint when
>>> parsing JSON.
>>>
>>> > We initially did not have the snapshot summary or operation. When I
>>> added the summary, the operation was intended to be required in cases where
>>> the summary is present. It should always be there if the summary is and the
>>> summary should always be there unless you wrote the metadata.json file
>>> way back in 2017 or 2018.
>>>
>>> @Ryan, does this constraint also apply to `metadata.json` files from
>>> 2017/2018? Was it ever valid to have a `summary` field without the
>>> `operation` key?
>>>
>>> > Well, the spec says nothing about a top-level `operation` field in
>>> JSON [1]. Yet the Java implementation produces it [2] and removes the
>>> operation from the summary map. This seems inconsistent?
>>>
>>> @Anton, the Java `Snapshot` object includes both the `summary` and
>>> `operation` fields. When serializing to JSON, the `operation` field is
>>> included in the `summary` map [2], rather than as a top-level field. During
>>> deserialization from JSON, the `operation` field is extracted from the
>>> `summary` map [3].
>>>
>>> I believe this is consistent with the table spec, which defines the JSON
>>> output, not how the `Snapshot` object is implemented in Java.
>>> On REST spec and Table spec
>>>
>>> Thanks, Yufei, for highlighting the difference between the REST spec and
>>> the table spec. I mistakenly used the REST spec
>>> (`rest-catalog-open-api.yaml` [4]) as the source of truth for V2 tables.
>>>
>>> Looking at the REST spec file, it can be challenging to determine how a
>>> REST server should handle V1 versus V2 tables. Even for V2 tables, the
>>> current version of the file combines features from V2, along with
>>> additional changes made in preparation for the upcoming V3 spec.
>>>
>>> Would it be helpful to create alternative versions of the REST spec
>>> specifically for referencing V1 and V2 tables? The goal would be to have a
>>> "frozen" version of the REST spec dedicated to V1 tables and another for V2
>>> tables while allowing the current REST spec file to evolve as needed.
>>>
>>> Taking a step back, I think we need more documentation on the REST spec,
>>> including support for different table versions and guidance on upgrading
>>> from one version to another. I’d love to hear everyone’s thoughts on this.
>>>
>>>
>>> Best,
>>>
>>> Kevin Liu
>>>
>>>
>>> [1] https://github.com/apache/iceberg/pull/11354
>>>
>>> [2]
>>> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63-L66
>>>
>>> [3]
>>> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124-L137
>>>
>>> [4]
>>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml
>>>
>>>
>>> On Sat, Oct 19, 2024 at 7:48 PM Sung Yun <sun...@apache.org> wrote:
>>>
>>>> Hi Ryan, thank you for your response!
>>>>
>>>> That detailed context is very helpful in allowing me to understanding
>>>> why the REST catalog spec has evolved the way it has, and how the Table
>>>> Spec and the REST Catalog Spec should each be referenced in the
>>>> sub-communities (like in PyIceberg). I'll keep those motivations in mind as
>>>> we discuss those Specs in the future.
>>>>
>>>> Also, here's a small PR to specify more explicitly that the operation
>>>> field should be a required field in the summary field:
>>>> https://github.com/apache/iceberg/pull/11355
>>>>
>>>> Sung
>>>>
>>>> On 2024/10/19 22:14:59 "rdb...@gmail.com" wrote:
>>>> > I can provide some historical context here about how the table spec
>>>> evolved
>>>> > and how the REST spec works with respect to table versions.
>>>> >
>>>> > We initially did not have the snapshot summary or operation. When I
>>>> added
>>>> > the summary, the operation was intended to be required in cases where
>>>> the
>>>> > summary is present. It should always be there if the summary is and
>>>> the
>>>> > summary should always be there unless you wrote the metadata.json
>>>> file way
>>>> > back in 2017 or 2018. It looks like the spec could be more clear that
>>>> the
>>>> > operation is required when summary is present. Anyone want to open a
>>>> PR?
>>>> >
>>>> > Anton, I don't think there is a top-level operation field. The Java
>>>> > Snapshot class tracks the operation as top-level, but it is always
>>>> stored
>>>> > in the summary. I think this is consistent with the spec.
>>>> >
>>>> > For the REST spec, I think that it should be strictly optional to
>>>> support
>>>> > v1 tables written with no summary, but it should always be present.
>>>> I'd
>>>> > probably leave it required since it already is and is catching a
>>>> valuable
>>>> > error case here.
>>>> >
>>>> > When building the REST spec, I took the same approach as the Java
>>>> parser
>>>> > (which is also to parse table metadata coming from REST servers).
>>>> That is,
>>>> > it is compatible with v1 metadata; fields that were not required in
>>>> v1 are
>>>> > optional. This fits with the principle of "be liberal in what you
>>>> accept
>>>> > and strict in what you produce". The REST spec allows passing
>>>> metadata for
>>>> > any table version so that the specs are not tightly coupled. The table
>>>> > version is passed when loading and clients should reject table
>>>> versions
>>>> > that are newer than what they can support. The REST protocol just
>>>> needs to
>>>> > facilitate passing the table metadata.
>>>> >
>>>> > Most v2 structures, like the `schemas` list, are introduced as
>>>> optional in
>>>> > v1 and made required in v2. That way, it's possible to add metadata to
>>>> > existing format versions and make the upgrade path easier.
>>>> Maintaining the
>>>> > newer structures even though there are different writer versions
>>>> deployed
>>>> > is one of the reasons why the REST spec changes to a change-based
>>>> model.
>>>> > New metadata only needs to be supported by the service maintaining the
>>>> > metadata.json files and any writers that want to update it.
>>>> >
>>>> > I see some points about being able to remove old table versions. I
>>>> don't
>>>> > think that the REST protocol itself is the place to do this. The
>>>> protocol
>>>> > is format-agnostic. Implementations are free to reject requests to
>>>> create
>>>> > tables with older versions, or to update the table to a new version.
>>>> >
>>>> > Ryan
>>>> >
>>>> > On Fri, Oct 18, 2024 at 6:42 AM Sung Yun <sun...@apache.org> wrote:
>>>> >
>>>> > > Folks, thank you for your responses! These area really helpful
>>>> insights.
>>>> > >
>>>> > > > I agree that the REST spec should aim to support v1, v2, and
>>>> potentially
>>>> > > the upcoming v3. In practice, it seems like the choice of table
>>>> spec might
>>>> > > ultimately be dictated by the REST catalog implementation.
>>>> > >
>>>> > > > A best practice would be for the server to strive to support all
>>>> Iceberg
>>>> > > versions, but the REST spec itself should remain flexible enough to
>>>> > > accommodate less strict table specs.
>>>> > >
>>>> > > Yufei, yes that makes sense, and I agree that the server should
>>>> strive to
>>>> > > support all format versions, because otherwise the an older client
>>>> > > application, may just not be compatible with a REST Catalog running
>>>> on a
>>>> > > higher version of table spec.  I think we have two choices here in
>>>> ensuring
>>>> > > that the REST Catalog server is able to support multiple versions
>>>> of the
>>>> > > Table Spec:
>>>> > >
>>>> > > 1. We could create single components that are common denominators
>>>> of all
>>>> > > existing table specs to accommodate the less table specs. The REST
>>>> Catalog
>>>> > > Spec currently falls short in this approach, and I've put up this
>>>> PR to
>>>> > > show what this change would look like just for the Snapshot
>>>> component:
>>>> > > https://github.com/apache/iceberg/pull/11353 - My take on this is
>>>> that,
>>>> > > this approach will make the REST catalog spec more confusing and
>>>> difficult
>>>> > > to manage as we add more Table Spec versions moving forward. The
>>>> discussion
>>>> > > on this mail list thread is I think a great demonstration of that
>>>> confusion
>>>> > > :)
>>>> > > 2. We could instead create separate Table Spec version specific
>>>> components
>>>> > > on the REST Catalog Open API Spec. For example, a Snapshot
>>>> component could
>>>> > > be anyOf SnapshotV1 and SnapshotV2, which match the Table Spec V1
>>>> and V2
>>>> > > definitions. I think creating explicit components that match the
>>>> spec
>>>> > > definitions will work in our favor when we continue to introduce
>>>> more Spec
>>>> > > changes and manage their lifecycles. And perhaps, maybe we could
>>>> also
>>>> > > indicate what format-versions the REST Catalog Server supports
>>>> through an
>>>> > > endpoint, and communicate it to a client application.
>>>> > >
>>>> > > I'd love to hear the community's opinion on suggestion (2)! I'm very
>>>> > > curious to hear if we've considered it before.
>>>> > >
>>>> > > Sung
>>>> > >
>>>> > > On 2024/10/18 05:13:15 Péter Váry wrote:
>>>> > > > Hi Team,
>>>> > > > Apart from fixing this current issue by relaxing the current spec
>>>> > > > constraints, to support both v1 and v2 specifications, we should
>>>> think
>>>> > > > about how to handle table spec evolution for the long term.
>>>> > > >
>>>> > > > What are the base factors we can start from (please add your own
>>>> ideas
>>>> > > if I
>>>> > > > have missed something):
>>>> > > > - We evolve the specifications in a way that is backwards
>>>> compatible (v1
>>>> > > > table could be read by v2 reader) but not forwards compatible (v2
>>>> table
>>>> > > > could not be read by an old reader)
>>>> > > > - The rest spec ideally should conform to the currently used
>>>> table spec
>>>> > > > schema/constraints
>>>> > > > - REST catalogs sooner-or-later would like to drop support for
>>>> older
>>>> > > table
>>>> > > > spec. We need to avoid the situation of Hive Metastore, where the
>>>> > > decisions
>>>> > > > made 10 years ago prevented enhancing the APIs as the old
>>>> specifications
>>>> > > > were supported forever.
>>>> > > >
>>>> > > > Probably (when the spec difference becomes big enough) a composit
>>>> request
>>>> > > > (version + different content spec) or a different endpoint will be
>>>> > > required.
>>>> > > >
>>>> > > > Thanks, Peter
>>>> > > >
>>>> > > > On Thu, Oct 17, 2024, 23:11 Yufei Gu <flyrain...@gmail.com>
>>>> wrote:
>>>> > > >
>>>> > > > > Hi Sung,
>>>> > > > >
>>>> > > > > It seems we are running to issues related to a mismatch between
>>>> the
>>>> > > REST
>>>> > > > > spec and table specifications. Currently, there's no clear
>>>> definition
>>>> > > of
>>>> > > > > how the REST spec is meant to support different table specs. The
>>>> > > closest
>>>> > > > > reference I found is this statement
>>>> > > > > <
>>>> > >
>>>> https://github.com/apache/iceberg/blob/8e743a5b5209569f84b6bace36e1106c67e1eab3/open-api/rest-catalog-open-api.yaml#L30-L30
>>>> > > >
>>>> > > > > in the REST spec.
>>>> > > > >
>>>> > > > > Implementations should ideally support both Iceberg table specs
>>>> v1 and
>>>> > > v2,
>>>> > > > >> with priority given to v2.
>>>> > > > >
>>>> > > > >
>>>> > > > > I agree that the REST spec should aim to support v1, v2, and
>>>> > > potentially
>>>> > > > > the upcoming v3. In practice, it seems like the choice of table
>>>> spec
>>>> > > might
>>>> > > > > ultimately be dictated by the REST catalog implementation.
>>>> > > > >
>>>> > > > >
>>>> > > > > A best practice would be for the server to strive to support all
>>>> > > Iceberg
>>>> > > > > versions, but the REST spec itself should remain flexible
>>>> enough to
>>>> > > > > accommodate less strict table specs. For the case you
>>>> mentioned, it
>>>> > > should
>>>> > > > > be fine to make sequence number optional since the spec has to
>>>> support
>>>> > > v1
>>>> > > > > table spec. It does feel confusing though.
>>>> > > > >
>>>> > > > >
>>>> > > > > WDYT?
>>>> > > > >
>>>> > > > >
>>>> > > > > Yufei
>>>> > > > >
>>>> > > > >
>>>> > > > > On Thu, Oct 17, 2024 at 1:56 PM Anton Okolnychyi <
>>>> > > aokolnyc...@gmail.com>
>>>> > > > > wrote:
>>>> > > > >
>>>> > > > >> Well, the spec says nothing about a top-level `operation`
>>>> field in
>>>> > > JSON
>>>> > > > >> [1]. Yet the Java implementation produces it [2] and removes
>>>> the
>>>> > > operation
>>>> > > > >> from the summary map. This seems inconsistent?
>>>> > > > >>
>>>> > > > >> - Anton
>>>> > > > >>
>>>> > > > >> [1] - https://iceberg.apache.org/spec/#snapshots
>>>> > > > >> [2] -
>>>> > > > >>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63
>>>> > > > >>
>>>> > > > >>
>>>> > > > >> чт, 17 жовт. 2024 р. о 10:06 Sung Yun <sun...@apache.org>
>>>> пише:
>>>> > > > >>
>>>> > > > >>> > As a side note, the `rest-catalog-open-api.yaml` file [2]
>>>> in the
>>>> > > > >>> Iceberg repo contains the latest version of the spec.
>>>> > > > >>>
>>>> > > > >>> I think more clarity on this would be helpful. Is it really
>>>> the case
>>>> > > > >>> that the Open API spec contains the latest version of the
>>>> spec? For
>>>> > > > >>> example, I'm noticing a discrepancy between sequence-number
>>>> in the
>>>> > > Table
>>>> > > > >>> Spec and in the Open API Spec...
>>>> > > > >>>
>>>> > > > >>> In the table spec, it's required for V2, but it's optional in
>>>> the
>>>> > > REST
>>>> > > > >>> API Spec:
>>>> > > > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2319-L2335
>>>> > > > >>>
>>>> > > > >>> On 2024/10/17 16:58:17 Kevin Liu wrote:
>>>> > > > >>> > > Based on the example metadata, that looks like it is not
>>>> to
>>>> > > spec, so
>>>> > > > >>> it's
>>>> > > > >>> > reasonable that python would reject it.  If the java
>>>> > > implementation is
>>>> > > > >>> > allowing for that, it's likely that we're being too relaxed
>>>> > > (possibly a
>>>> > > > >>> > holdover from v1 parsing).
>>>> > > > >>> > I believe the Java implementation is relaxing the
>>>> constraint. I'll
>>>> > > > >>> create a
>>>> > > > >>> > PR with test cases and the necessary changes.
>>>> > > > >>> >
>>>> > > > >>> > > Do you know what produced the metadata?
>>>> > > > >>> > It was created by Snowflake [1]. After verifying this, I'll
>>>> look
>>>> > > into
>>>> > > > >>> > raising the issue with them.
>>>> > > > >>> >
>>>> > > > >>> > As a side note, the `rest-catalog-open-api.yaml` file [2]
>>>> in the
>>>> > > > >>> Iceberg
>>>> > > > >>> > repo contains the latest version of the spec. As we're
>>>> continuing
>>>> > > to
>>>> > > > >>> evolve
>>>> > > > >>> > to spec for V3, would it be helpful to create a frozen
>>>> version
>>>> > > > >>> representing
>>>> > > > >>> > both the V1 and V2 specs for reference, possibly as a
>>>> separate
>>>> > > file?
>>>> > > > >>> >
>>>> > > > >>> > Best,
>>>> > > > >>> > Kevin Liu
>>>> > > > >>> >
>>>> > > > >>> > [1]
>>>> > > > >>> >
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg-python/issues/1106#issuecomment-2312108455
>>>> > > > >>> > [2]
>>>> > > > >>> >
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml
>>>> > > > >>> >
>>>> > > > >>> > On Thu, Oct 17, 2024 at 9:20 AM Daniel Weeks <
>>>> dwe...@apache.org>
>>>> > > > >>> wrote:
>>>> > > > >>> >
>>>> > > > >>> > > Sung,
>>>> > > > >>> > >
>>>> > > > >>> > > I was thinking of v1, so you're right that manifest-list
>>>> and
>>>> > > summary
>>>> > > > >>> are
>>>> > > > >>> > > required as of v2.  The REST Spec seems to follow the v2
>>>> > > definition,
>>>> > > > >>> so I
>>>> > > > >>> > > think we're somewhat implicitly requiring those fields
>>>> via REST.
>>>> > > > >>> > >
>>>> > > > >>> > > Kevin,
>>>> > > > >>> > >
>>>> > > > >>> > > Based on the example metadata, that looks like it is not
>>>> to
>>>> > > spec, so
>>>> > > > >>> it's
>>>> > > > >>> > > reasonable that python would reject it.  If the java
>>>> > > implementation
>>>> > > > >>> is
>>>> > > > >>> > > allowing for that, it's likely that we're being too
>>>> relaxed
>>>> > > > >>> (possibly a
>>>> > > > >>> > > holdover from v1 parsing).
>>>> > > > >>> > >
>>>> > > > >>> > > Do you know what produced the metadata?
>>>> > > > >>> > >
>>>> > > > >>> > > -Dan
>>>> > > > >>> > >
>>>> > > > >>> > > On Thu, Oct 17, 2024 at 9:02 AM Kevin Liu <
>>>> > > kevin.jq....@gmail.com>
>>>> > > > >>> wrote:
>>>> > > > >>> > >
>>>> > > > >>> > >> Thanks for the additional context.
>>>> > > > >>> > >>
>>>> > > > >>> > >> My understanding is that if a Snapshot has a `summary`
>>>> field, it
>>>> > > > >>> must
>>>> > > > >>> > >> also have a corresponding `operation` key in the summary
>>>> map. Is
>>>> > > > >>> that
>>>> > > > >>> > >> correct? Based on the `SnapshotParser`, this is not
>>>> enforced
>>>> > > [1].
>>>> > > > >>> > >>
>>>> > > > >>> > >> The underlying issue in #1106 [2] is the missing
>>>> `operation`
>>>> > > field
>>>> > > > >>> when
>>>> > > > >>> > >> the `summary` field is present.
>>>> > > > >>> > >> For example,
>>>> > > > >>> > >> ```
>>>> > > > >>> > >>     "summary" : {
>>>> > > > >>> > >>       "manifests-created" : "8",
>>>> > > > >>> > >>       "total-records" : "26508666891",
>>>> > > > >>> > >>       "added-files-size" : "3927895626752",
>>>> > > > >>> > >>       "manifests-kept" : "0",
>>>> > > > >>> > >>       "total-files-size" : "3927895626752",
>>>> > > > >>> > >>       "added-records" : "26508666891",
>>>> > > > >>> > >>       "added-data-files" : "231513",
>>>> > > > >>> > >>       "manifests-replaced" : "0",
>>>> > > > >>> > >>       "total-data-files" : "231513"
>>>> > > > >>> > >>     }
>>>> > > > >>> > >> ```
>>>> > > > >>> > >>
>>>> > > > >>> > >> It could be the case that this particular
>>>> `metadata.json` was
>>>> > > > >>> generated
>>>> > > > >>> > >> not according to the spec.
>>>> > > > >>> > >>
>>>> > > > >>> > >> Best,
>>>> > > > >>> > >> Kevin Liu
>>>> > > > >>> > >>
>>>> > > > >>> > >>
>>>> > > > >>> > >> [1]
>>>> > > > >>> > >>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124-L142
>>>> > > > >>> > >> [2] https://github.com/apache/iceberg-python/issues/1106
>>>> > > > >>> > >>
>>>> > > > >>> > >>
>>>> > > > >>> > >> On Thu, Oct 17, 2024 at 8:47 AM Sung Yun <
>>>> sun...@apache.org>
>>>> > > wrote:
>>>> > > > >>> > >>
>>>> > > > >>> > >>> Thank you for the clarification Daniel, and thank you
>>>> Kevin for
>>>> > > > >>> raising
>>>> > > > >>> > >>> this issue!
>>>> > > > >>> > >>>
>>>> > > > >>> > >>> Does that mean that we are creating component schemas
>>>> that are
>>>> > > the
>>>> > > > >>> > >>> superset of the V1 and V2 schemas? And if so, should we
>>>> remove
>>>> > > > >>> summary and
>>>> > > > >>> > >>> manifest-list from the required properties, and add
>>>> manifests
>>>> > > > >>> optional
>>>> > > > >>> > >>> property to the Snapshot schema to support both V1 and
>>>> V2
>>>> > > Summary
>>>> > > > >>> specs?
>>>> > > > >>> > >>> https://iceberg.apache.org/spec/#snapshots
>>>> > > > >>> > >>>
>>>> > > > >>> > >>> Or would creating separate component schemas for V1/V2
>>>> be a
>>>> > > > >>> cleaner way
>>>> > > > >>> > >>> to align the REST spec with the table spec?
>>>> > > > >>> > >>>
>>>> > > > >>> > >>> Sung
>>>> > > > >>> > >>>
>>>> > > > >>> > >>> On 2024/10/17 15:19:23 Daniel Weeks wrote:
>>>> > > > >>> > >>> > I'm not convinced this is incorrect behavior (table
>>>> spec or
>>>> > > > >>> > >>> > implementation), but it does lend to some confusion.
>>>> The
>>>> > > > >>> 'summary'
>>>> > > > >>> > >>> field
>>>> > > > >>> > >>> > is optional, which means that if a summary is not
>>>> provided,
>>>> > > you
>>>> > > > >>> do not
>>>> > > > >>> > >>> have
>>>> > > > >>> > >>> > an associated 'operation' field.  The 'operation'
>>>> field is
>>>> > > only
>>>> > > > >>> > >>> required in
>>>> > > > >>> > >>> > the context of the summary, so it's actually possible
>>>> for the
>>>> > > > >>> > >>> > implementation (i.e. the tests you reference) to not
>>>> have an
>>>> > > > >>> operation.
>>>> > > > >>> > >>> >
>>>> > > > >>> > >>> > I think what is wrong here is that the REST spec
>>>> marked the
>>>> > > > >>> summary as
>>>> > > > >>> > >>> > required
>>>> > > > >>> > >>> > <
>>>> > > > >>> > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2040
>>>> > > > >>> > >>> >,
>>>> > > > >>> > >>> > which is inconsistent with the table spec.
>>>> > > > >>> > >>> >
>>>> > > > >>> > >>> > On Wed, Oct 16, 2024 at 3:52 PM Anton Okolnychyi <
>>>> > > > >>> > >>> aokolnyc...@gmail.com>
>>>> > > > >>> > >>> > wrote:
>>>> > > > >>> > >>> >
>>>> > > > >>> > >>> > > Based on [1], we never persisted the operation in
>>>> the
>>>> > > summary
>>>> > > > >>> map.
>>>> > > > >>> > >>> > > Instead, we persisted it as a top-level field in
>>>> Java,
>>>> > > which is
>>>> > > > >>> > >>> actually
>>>> > > > >>> > >>> > > NOT what the spec says. Does anyone remember cases
>>>> when the
>>>> > > > >>> > >>> operation was
>>>> > > > >>> > >>> > > unknown? I personally don't.
>>>> > > > >>> > >>> > >
>>>> > > > >>> > >>> > > [1] -
>>>> > > > >>> > >>> > >
>>>> > > > >>> > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/17f1c4d2205b59c2bd877d4d31bbbef9e90979c5/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L63
>>>> > > > >>> > >>> > >
>>>> > > > >>> > >>> > >
>>>> > > > >>> > >>> > > ср, 16 жовт. 2024 р. о 12:42 Kevin Liu <
>>>> > > kevin.jq....@gmail.com
>>>> > > > >>> >
>>>> > > > >>> > >>> пише:
>>>> > > > >>> > >>> > >
>>>> > > > >>> > >>> > >> Hey folks,
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >> I’ve noticed a discrepancy between the Iceberg
>>>> > > specification
>>>> > > > >>> and
>>>> > > > >>> > >>> the Java
>>>> > > > >>> > >>> > >> implementation regarding the `operation` key in the
>>>> > > `Snapshot`
>>>> > > > >>> > >>> `summary`
>>>> > > > >>> > >>> > >> field.
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >> The `Snapshot` object's `summary` dictionary
>>>> includes a
>>>> > > > >>> *required*
>>>> > > > >>> > >>> key
>>>> > > > >>> > >>> > >> named `operation`, as outlined in the spec
>>>> describing
>>>> > > Table
>>>> > > > >>> > >>> Metadata and
>>>> > > > >>> > >>> > >> Snapshots [1] and the generated OpenAPI YAML [2].
>>>> > > However, in
>>>> > > > >>> the
>>>> > > > >>> > >>> Java
>>>> > > > >>> > >>> > >> implementation [3], `operation` is treated as
>>>> optional. In
>>>> > > > >>> > >>> contrast, it
>>>> > > > >>> > >>> > >> remains a required field in the Python
>>>> implementation [4].
>>>> > > > >>> > >>> > >> I also found that Java tests for `SnapshotParser`
>>>> assert
>>>> > > that
>>>> > > > >>> the
>>>> > > > >>> > >>> > >> `operation` field is null. [5]
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >> Due to this discrepancy, a user reported [6] that
>>>> the
>>>> > > > >>> > >>> `metadata.json`
>>>> > > > >>> > >>> > >> file generated for an Iceberg table could not be
>>>> read by
>>>> > > > >>> PyIceberg,
>>>> > > > >>> > >>> though
>>>> > > > >>> > >>> > >> it is readable using the Iceberg Java library.
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >> How should we proceed from here? Should the Java
>>>> library
>>>> > > > >>> enforce
>>>> > > > >>> > >>> this
>>>> > > > >>> > >>> > >> requirement? Additionally, how should we handle
>>>> existing
>>>> > > > >>> > >>> `metadata.json`
>>>> > > > >>> > >>> > >> files that were generated without this field?
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >> Best,
>>>> > > > >>> > >>> > >> Kevin Liu
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >> [1]
>>>> > > > >>> https://iceberg.apache.org/spec/#table-metadata-and-snapshots
>>>> > > > >>> > >>> > >> [2]
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/8e2eb9ac2e33ce4bac8956d4e2f099444d03c0e3/open-api/rest-catalog-open-api.yaml#L2057-L2060
>>>> > > > >>> > >>> > >> [3]
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/64b36999d7ff716ae2534fb0972fcc10d22a64c2/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L124
>>>> > > > >>> > >>> > >> [4]
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg-python/blob/7cf0c225c3cdb32ac5e390de06b7b0e4fe7de92e/pyiceberg/table/snapshots.py#L182
>>>> > > > >>> > >>> > >> [5]
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>>
>>>> > > > >>>
>>>> > >
>>>> https://github.com/apache/iceberg/blob/22a6b19c2e226eacc0aa78c1f2ffbdbb168b13be/core/src/test/java/org/apache/iceberg/TestSnapshotJson.java#L52
>>>> > > > >>> > >>> > >> [6]
>>>> https://github.com/apache/iceberg-python/issues/1106
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> > >>
>>>> > > > >>> > >>> >
>>>> > > > >>> > >>>
>>>> > > > >>> > >>
>>>> > > > >>> >
>>>> > > > >>>
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>

Reply via email to