I think there are two problems here: 1. As Fokko and Eduard mentioned, Glue has produced invalid v1 tables. 2. The java implementation didn't do a sanity check about table metadata for required fields.
We could follow java implementation in other languages, but this seems a workaround for dealing with invalid table metadata. I'm not sure if this is desired approach. On Tue, Mar 18, 2025 at 5:26 PM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote: > I also believe that the Java impl and the spec are correct for V1. > The TableMetadataParser reads *schemas > <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L356-L380>* > / *partition_specs > <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L393-L409>* > first. If those are present, then it must be a V2 table and therefore *schema > <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L382-L387>* > / *partition_spec > <https://github.com/apache/iceberg/blob/f3b3ee40871d38083ed095215fffa91acb2c8a45/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L412-L421>* > don't > have to be read. > That being said, *schema* / *partition_spec* are still required fields > for V1, so I wouldn't change anything in the spec other than maybe adding > that implementation note of the Java impl. > > As Fokko already mentioned and looking at > https://github.com/apache/iceberg-rust/issues/1088 it seems that the > problem is with how Glue produces V1 metadata. > > > Eduard > > On Tue, Mar 18, 2025 at 9:53 AM Fokko Driesprong <fo...@apache.org> wrote: > >> Hi everyone, >> >> Thanks for raising this. I believe the Java implementation and the spec >> are still in sync since for V1 we always write schema >> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L176-L182> >> and partition-spec >> <https://github.com/apache/iceberg/blob/8f6ebb5b36a0263edfcb04e0c104b26225f95b07/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L192-L196>. >> According to the PR <https://github.com/apache/iceberg-rust/pull/1087>, >> it looks like the problem lies in Glue. I believe making the fields in the >> spec optional instead of required isn't the right approach, since this >> leads us to a path where we would update the spec for an >> implementation that is not adhering to it. Since the field is deprecated, I >> think this is a special case and we could add a section to Implementation >> Notes <https://iceberg.apache.org/spec/#appendix-f-implementation-notes> that >> states that the schemas and current-schema-id take priority over schema. >> >> Curious to hear what others think. >> >> Kind regards, >> Fokko >> >> Op di 18 mrt 2025 om 04:20 schreef Kevin Liu <kevinjq...@apache.org>: >> >>> Hi Renjie, >>> >>> Thanks for raising this! For context, here's the iceberg-rust PR >>> <https://github.com/apache/iceberg-rust/pull/1087> where we found the >>> inconsistency. >>> I am also in favor of approach 2 for backwards compatibility reasons. >>> Perhaps, we can include this behavior in "Appendix E: Format version >>> changes" <https://iceberg.apache.org/spec/#version-2>, under the "Reading >>> v1 metadata for v2" section. >>> >>> Best, >>> Kevin Liu >>> >>> >>> On Mon, Mar 17, 2025 at 7:17 PM Renjie Liu <liurenjie2...@gmail.com> >>> wrote: >>> >>>> Hi: >>>> >>>> We found an inconsistency between java implementation and spec about >>>> partition-spec and schema in v1 table. >>>> >>>> In spec, it says in v1 table partition-spec and schema are required but >>>> deprecated: https://iceberg.apache.org/spec/#table-metadata >>>> >>>> While in java implementation, they are both optional. For schema, it >>>> checks for `schemas` first, and looks up the current table schema by >>>> `schema-id`. Otherwise it looks for a `schema` field. Similar things >>>> happen for `partition spec`. >>>> >>>> We found this problem when implementing iceberg-rust. To resolve this >>>> inconsistency, there are two approaches: >>>> >>>> 1. Modify java implementation to match spec, e.g. force checking >>>> `schema` and `partition spec` field. >>>> 2. Update spec to claim that both `schema` and `partition spec` are >>>> optional for v1 table. >>>> >>>> Personally I'm in favor of approach 2 as it keeps backward >>>> compatibility and seems a more reasonable solution to me. >>>> >>>> Looking forward to hearing from you! >>>> >>>