I agree with Fokko to add the current behavior to the Implementation Notes
section. With schemas vs schema and partitions_specs vs partition_spec, it
is the logical fallback when the required field for the V1 table is
missing.

> Read liberally and write strictly.

This has been mentioned a couple times on the devlist. It's the
general approach the java (reference) implementation has taken. I'm in
favor of following this approach in other implementations. In order to fix
the invalid table metadata, we'd need to be able to read it first.

Best,
Kevin Liu

On Tue, Mar 18, 2025 at 7:02 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Our general policy in the past has been to accept potentially invalid
> metadata as long as we can reason about it or until we get to a point where
> we have to throw an error.
>
> Read liberally and write strictly.
>
> I wouldn’t want to change the spec to match an incorrect writer but there
> is nothing wrong with changing a reader imho if it’s primarily to fix
> tables. I think the only thing we really want to avoid is making implicit
> changes to the spec by normalizing completely out of spec metadata.
>
> On Tue, Mar 18, 2025 at 4:46 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
>> I think there are two problems here:
>>
>> 1. As Fokko and Eduard mentioned, Glue has produced invalid v1 tables.
>> 2. The java implementation didn't do a sanity check about table metadata
>> for required fields.
>>
>> We could follow java implementation in other languages, but this seems a
>> workaround for dealing with invalid table metadata. I'm not sure if this is
>> desired approach.
>>
>>
>> On Tue, Mar 18, 2025 at 5:26 PM Eduard Tudenhöfner <
>> etudenhoef...@apache.org> wrote:
>>
>>> I also believe that the Java impl and the spec are correct for V1.
>>> The TableMetadataParser reads *schemas
>>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L356-L380>*
>>> / *partition_specs
>>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L393-L409>*
>>> first. If those are present, then it must be a V2 table and therefore 
>>> *schema
>>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L382-L387>*
>>> / *partition_spec
>>> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L412-L421>*
>>>  don't
>>> have to be read.
>>> That being said, *schema* / *partition_spec* are still required fields
>>> for V1, so I wouldn't change anything in the spec other than maybe adding
>>> that implementation note of the Java impl.
>>>
>>> As Fokko already mentioned and looking at
>>> https://github.com/apache/iceberg-rust/issues/1088 it seems that the
>>> problem is with how Glue produces V1 metadata.
>>>
>>>
>>> Eduard
>>>
>>> On Tue, Mar 18, 2025 at 9:53 AM Fokko Driesprong <fo...@apache.org>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Thanks for raising this. I believe the Java implementation and the spec
>>>> are still in sync since for V1 we always write schema
>>>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L176-L182>
>>>> and partition-spec
>>>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L192-L196>.
>>>> According to the PR <https://github.com/apache/iceberg-rust/pull/1087>,
>>>> it looks like the problem lies in Glue. I believe making the fields in the
>>>> spec optional instead of required isn't the right approach, since this
>>>> leads us to a path where we would update the spec for an
>>>> implementation that is not adhering to it. Since the field is deprecated, I
>>>> think this is a special case and we could add a section to Implementation
>>>> Notes
>>>> <https://iceberg.apache.org/spec/#appendix-f-implementation-notes> that
>>>> states that the schemas and current-schema-id take priority over schema
>>>> .
>>>>
>>>> Curious to hear what others think.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> Op di 18 mrt 2025 om 04:20 schreef Kevin Liu <kevinjq...@apache.org>:
>>>>
>>>>> Hi Renjie,
>>>>>
>>>>> Thanks for raising this! For context, here's the iceberg-rust PR
>>>>> <https://github.com/apache/iceberg-rust/pull/1087> where we found the
>>>>> inconsistency.
>>>>> I am also in favor of approach 2 for backwards compatibility reasons.
>>>>> Perhaps, we can include this behavior in "Appendix E: Format version
>>>>> changes" <https://iceberg.apache.org/spec/#version-2>, under the "Reading
>>>>> v1 metadata for v2" section.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>>
>>>>> On Mon, Mar 17, 2025 at 7:17 PM Renjie Liu <liurenjie2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi:
>>>>>>
>>>>>> We found an inconsistency between java implementation and spec about
>>>>>> partition-spec and schema in v1 table.
>>>>>>
>>>>>> In spec, it says in v1 table partition-spec and schema are required
>>>>>> but deprecated: https://iceberg.apache.org/spec/#table-metadata
>>>>>>
>>>>>> While in java implementation, they are both optional. For schema, it
>>>>>> checks for `schemas` first, and looks up the current table schema by
>>>>>> `schema-id`.  Otherwise it looks for a `schema` field. Similar things
>>>>>> happen for `partition spec`.
>>>>>>
>>>>>> We found this problem when implementing iceberg-rust. To resolve this
>>>>>> inconsistency, there are two approaches:
>>>>>>
>>>>>> 1. Modify java implementation to match spec, e.g. force checking
>>>>>> `schema` and `partition spec` field.
>>>>>> 2. Update spec to claim that both `schema` and `partition spec` are
>>>>>> optional for v1 table.
>>>>>>
>>>>>> Personally I'm in favor of approach 2 as it keeps backward
>>>>>> compatibility and seems a more reasonable solution to me.
>>>>>>
>>>>>> Looking forward to hearing from you!
>>>>>>
>>>>>

Reply via email to