I think there are two problems here:

1. As Fokko and Eduard mentioned, Glue has produced invalid v1 tables.
2. The java implementation didn't do a sanity check about table metadata
for required fields.

We could follow java implementation in other languages, but this seems a
workaround for dealing with invalid table metadata. I'm not sure if this is
desired approach.


On Tue, Mar 18, 2025 at 5:26 PM Eduard Tudenhöfner <etudenhoef...@apache.org>
wrote:

> I also believe that the Java impl and the spec are correct for V1.
> The TableMetadataParser reads *schemas
> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L356-L380>*
> / *partition_specs
> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L393-L409>*
> first. If those are present, then it must be a V2 table and therefore *schema
> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L382-L387>*
> / *partition_spec
> <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L412-L421>*
>  don't
> have to be read.
> That being said, *schema* / *partition_spec* are still required fields
> for V1, so I wouldn't change anything in the spec other than maybe adding
> that implementation note of the Java impl.
>
> As Fokko already mentioned and looking at
> https://github.com/apache/iceberg-rust/issues/1088 it seems that the
> problem is with how Glue produces V1 metadata.
>
>
> Eduard
>
> On Tue, Mar 18, 2025 at 9:53 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hi everyone,
>>
>> Thanks for raising this. I believe the Java implementation and the spec
>> are still in sync since for V1 we always write schema
>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L176-L182>
>> and partition-spec
>> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L192-L196>.
>> According to the PR <https://github.com/apache/iceberg-rust/pull/1087>,
>> it looks like the problem lies in Glue. I believe making the fields in the
>> spec optional instead of required isn't the right approach, since this
>> leads us to a path where we would update the spec for an
>> implementation that is not adhering to it. Since the field is deprecated, I
>> think this is a special case and we could add a section to Implementation
>> Notes <https://iceberg.apache.org/spec/#appendix-f-implementation-notes> that
>> states that the schemas and current-schema-id take priority over schema.
>>
>> Curious to hear what others think.
>>
>> Kind regards,
>> Fokko
>>
>> Op di 18 mrt 2025 om 04:20 schreef Kevin Liu <kevinjq...@apache.org>:
>>
>>> Hi Renjie,
>>>
>>> Thanks for raising this! For context, here's the iceberg-rust PR
>>> <https://github.com/apache/iceberg-rust/pull/1087> where we found the
>>> inconsistency.
>>> I am also in favor of approach 2 for backwards compatibility reasons.
>>> Perhaps, we can include this behavior in "Appendix E: Format version
>>> changes" <https://iceberg.apache.org/spec/#version-2>, under the "Reading
>>> v1 metadata for v2" section.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>>
>>> On Mon, Mar 17, 2025 at 7:17 PM Renjie Liu <liurenjie2...@gmail.com>
>>> wrote:
>>>
>>>> Hi:
>>>>
>>>> We found an inconsistency between java implementation and spec about
>>>> partition-spec and schema in v1 table.
>>>>
>>>> In spec, it says in v1 table partition-spec and schema are required but
>>>> deprecated: https://iceberg.apache.org/spec/#table-metadata
>>>>
>>>> While in java implementation, they are both optional. For schema, it
>>>> checks for `schemas` first, and looks up the current table schema by
>>>> `schema-id`.  Otherwise it looks for a `schema` field. Similar things
>>>> happen for `partition spec`.
>>>>
>>>> We found this problem when implementing iceberg-rust. To resolve this
>>>> inconsistency, there are two approaches:
>>>>
>>>> 1. Modify java implementation to match spec, e.g. force checking
>>>> `schema` and `partition spec` field.
>>>> 2. Update spec to claim that both `schema` and `partition spec` are
>>>> optional for v1 table.
>>>>
>>>> Personally I'm in favor of approach 2 as it keeps backward
>>>> compatibility and seems a more reasonable solution to me.
>>>>
>>>> Looking forward to hearing from you!
>>>>
>>>

Reply via email to