Hi everyone,

I’d like to get feedback on two small spec clarification PRs that update
the schema JSON type string serialization table:

* https://github.com/apache/iceberg/pull/16798
  Clarifies that the canonical schema JSON decimal type string is
`decimal(P, S)`, matching current writer output, and notes that readers
should accept optional whitespace for compatibility with non-canonical type
strings such as `decimal(9,2)`.

* https://github.com/apache/iceberg/pull/16799
  Clarifies that the canonical schema JSON geography type string is
`geography(C, A)`, including the space after the comma and the parameter
name `A`, matching current writer output.

The motivation came from https://github.com/apache/iceberg-rust/issues/2534.
iceberg-rust was writing decimal types as `decimal(P,S)` (without space),
while Java and Python write `decimal(P, S)` (with space). That exposed
ambiguity in the spec because strict downstream parsers may accept only one
form. The intended behavior is that writers produce the canonical form,
while readers remain compatible with existing metadata by accepting both
spacing variants.
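
As a rough illustration of that split, here is a minimal Python sketch. It is not taken from any of the implementations: the helper names `parse_decimal` and `format_decimal` are hypothetical, and the choice to tolerate whitespace around both parameters is my assumption about what "optional whitespace" would cover.

```python
import re

# Hypothetical helpers for illustration only; not part of any Iceberg library.
# Assumption: readers tolerate optional whitespace around both parameters.
_DECIMAL_RE = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_decimal(type_str):
    """Reader side: return (precision, scale) from either spacing variant."""
    match = _DECIMAL_RE.fullmatch(type_str)
    if match is None:
        raise ValueError(f"not a decimal type string: {type_str!r}")
    return int(match.group(1)), int(match.group(2))


def format_decimal(precision, scale):
    """Writer side: always emit the canonical form, with a space after the comma."""
    return f"decimal({precision}, {scale})"
```

With this shape, `decimal(9,2)` and `decimal(9, 2)` parse to the same precision and scale, and re-serializing either one yields the canonical `decimal(9, 2)`.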

These are intended as spec clarifications only. The goal is to document the
canonical serialized form produced by implementations while preserving
reader-side compatibility for existing metadata.

One point I’d like explicit feedback on is the reader compatibility wording
in #16798. The PR currently says readers “should” accept optional
whitespace, and I think “should” is the right choice over “must”. “Must”
would make accepting optional whitespace a hard conformance requirement,
whereas the intent is only reader-side compatibility with non-canonical
type strings.

Would love to hear your thoughts!

Thanks,
Kevin
