Hi all,
The Iceberg spec says that the non-zone timestamp and timestamp_ns Iceberg
types are stored in Avro as timestamp-{micros,nanos} plus the
Iceberg-specific property adjust-to-utc=false.
This does not conform to the Avro spec, which reserves timestamp-* for
instants on the global timeline and provides local-timestamp-* for the
wall-clock case.
The mismatch causes cross-engine bugs (#12751): Spark, Avro tooling, and other
conformant Avro readers ignore adjust-to-utc and read Iceberg's "non-zone" data
as TZ-adjusted, shifting values silently.
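To make the silent shift concrete, here is a minimal stdlib-only Python sketch. The helper names are hypothetical (not from Iceberg, pyiceberg, or any Avro library), and it assumes a reader whose session time zone is a fixed UTC-8 offset. A wall-clock value is encoded as microseconds from 1970-01-01T00:00:00, then decoded once as a local timestamp (what adjust-to-utc=false intends) and once as a UTC instant rendered in the session zone (what a conformant timestamp-micros reader does).

```python
from datetime import datetime, timedelta, timezone

# Naive epoch: wall-clock values carry no zone.
EPOCH = datetime(1970, 1, 1)


def encode_local_micros(wall_clock: datetime) -> int:
    """Encode a naive wall-clock datetime as micros since 1970-01-01T00:00:00."""
    return (wall_clock - EPOCH) // timedelta(microseconds=1)


def decode_as_local(micros: int) -> datetime:
    """Reader that honours the wall-clock meaning (local-timestamp-micros)."""
    return EPOCH + timedelta(microseconds=micros)


def decode_as_instant(micros: int, session_tz: timezone) -> datetime:
    """Conformant timestamp-micros reader: a UTC instant shown in the session zone."""
    instant = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)
    return instant.astimezone(session_tz)
```

With wall = datetime(2024, 3, 1, 12, 0), decode_as_local returns 12:00 again, while decode_as_instant with timezone(timedelta(hours=-8)) returns 04:00 on the same day: the stored number is identical, only the interpretation differs.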
PR #16577 proposes updating the Avro mapping in format/spec.md to use
local-timestamp-{micros,nanos} for the two non-zone types, while leaving the
zone-adjusted types unchanged and keeping the existing reader-side backward
compatibility for legacy files.
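A rough sketch of the proposed writer mapping together with a reader that still accepts legacy files, in stdlib Python. The Iceberg type names come from the spec; the function names are hypothetical. Treating a timestamp-* schema with no adjust-to-utc property as zone-adjusted is an assumption based on the Avro meaning of timestamp-*, not necessarily what existing Iceberg readers do.

```python
from typing import Optional

# Proposed writer mapping: Iceberg primitive type -> Avro logicalType.
_WRITER_LOGICAL_TYPES = {
    "timestamp": "local-timestamp-micros",    # wall-clock, no zone
    "timestamp_ns": "local-timestamp-nanos",  # wall-clock, no zone
    "timestamptz": "timestamp-micros",        # instant (unchanged)
    "timestamptz_ns": "timestamp-nanos",      # instant (unchanged)
}


def avro_logical_type_for(iceberg_type: str) -> str:
    """Avro logicalType a writer would emit under the proposed mapping."""
    return _WRITER_LOGICAL_TYPES[iceberg_type]


def iceberg_type_for(logical_type: str, adjust_to_utc: Optional[bool]) -> str:
    """Map an Avro logicalType (plus the legacy adjust-to-utc prop) to an Iceberg type.

    Accepts both the new local-timestamp-* encoding and legacy files written
    as timestamp-* with adjust-to-utc=false.
    """
    if logical_type.startswith("local-timestamp-"):
        zoned = False
    elif logical_type.startswith("timestamp-"):
        # Legacy encoding: adjust-to-utc=false marks wall-clock values.
        # Assumption: true or a missing prop means an instant, per Avro's timestamp-*.
        zoned = adjust_to_utc is not False
    else:
        raise ValueError(f"not a timestamp logicalType: {logical_type!r}")

    unit = logical_type.rsplit("-", 1)[-1]
    if unit not in ("micros", "nanos"):
        raise ValueError(f"unsupported timestamp unit in {logical_type!r}")
    suffix = "" if unit == "micros" else "_ns"
    return ("timestamptz" if zoned else "timestamp") + suffix
```

Round-tripping each Iceberg type through avro_logical_type_for and iceberg_type_for returns the original type, and a legacy ("timestamp-micros", False) schema still reads back as timestamp.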
Open question for the community: is this the right level of change, or should
we:
(a) deprecate adjust-to-utc and require new readers to accept both encodings
indefinitely, or
(b) gate the writer behaviour on a table property until all client
implementations are updated?
Ticket: https://github.com/apache/iceberg/issues/12751
Implementation status across known clients:
• Java: PR https://github.com/apache/iceberg/pull/16577. It is a rough outline
of what we would like to change; we can split it into phased PRs.
• Python: TBD — needs confirmation that pyiceberg.avro recognises
local-timestamp-*.
• Rust: TBD — same.
• Go, C++: not yet on V3 timestamp_ns, so likely less affected.
• Spark (from_avro/to_avro): already correct; it would just work after the
writer change.
Discussion welcome.
Regards,
Shekhar Rajak