Hi all,
The Iceberg spec says that the timestamp and timestamp_ns types (those without 
a time zone) should be stored in Avro as timestamp-{micros,nanos} plus the 
Iceberg-private property adjust-to-utc=false.
This does not conform to the Avro spec, which reserves timestamp-* for 
instants on the global timeline and provides local-timestamp-* for the 
wall-clock case.
The mismatch causes cross-engine bugs (#12751). Spark, Avro tooling, and other 
conformant Avro readers ignore adjust-to-utc and read Iceberg's "non-zone" data 
as time-zone-adjusted instants, silently shifting the values.
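
The silent shift can be reproduced with a minimal sketch using only Python's 
standard library. It assumes a writer that stores a wall-clock value as 
microseconds since 1970-01-01T00:00:00 with no zone attached (what a non-zone 
Iceberg timestamp holds), and a reader that applies the Avro timestamp-micros 
rule, treating the long as a UTC instant and rendering it in a session time 
zone. The function names and the fixed UTC-8 session zone are illustrative 
assumptions, not code from Spark or any Avro library.

```python
from datetime import datetime, timedelta, timezone

NAIVE_EPOCH = datetime(1970, 1, 1)                     # wall-clock reference, no zone
UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # instant reference
ONE_MICRO = timedelta(microseconds=1)

def encode_wall_clock_micros(wall: datetime) -> int:
    """Store a naive (wall-clock) datetime as micros since the naive epoch."""
    return (wall - NAIVE_EPOCH) // ONE_MICRO

def read_as_local_timestamp(micros: int) -> datetime:
    """local-timestamp-micros: the long is a wall-clock value; no zone conversion."""
    return NAIVE_EPOCH + timedelta(microseconds=micros)

def read_as_utc_instant(micros: int, session_tz: timezone) -> datetime:
    """timestamp-micros: the long is a UTC instant; display it in the session zone."""
    return (UTC_EPOCH + timedelta(microseconds=micros)).astimezone(session_tz)

wall = datetime(2024, 1, 1, 12, 0, 0)
micros = encode_wall_clock_micros(wall)
session = timezone(timedelta(hours=-8))  # hypothetical session time zone (UTC-8)

print(read_as_local_timestamp(micros))       # 2024-01-01 12:00:00
print(read_as_utc_instant(micros, session))  # 2024-01-01 04:00:00-08:00
```

The same stored long comes back as 12:00 under local-timestamp-micros but as 
04:00 under timestamp-micros with a UTC-8 session, an eight-hour shift with no 
error raised.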

PR #16577 proposes updating the Avro mapping in format/spec.md to use 
local-timestamp-{micros,nanos} for the two non-zone variants, while leaving the 
zone-adjusted variants unchanged and keeping the existing reader-side backward 
compatibility for legacy files.
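
As a rough illustration, here is a hypothetical helper that emits the Avro 
schema fragment for each Iceberg timestamp type under the current spec mapping 
and under the mapping proposed in PR #16577. The function name is invented, 
and the sketch assumes the proposed non-zone encoding carries no adjust-to-utc 
property at all; whether writers should keep emitting adjust-to-utc=false 
alongside local-timestamp-* for older readers is an open detail.

```python
import json

NON_ZONE = {"timestamp", "timestamp_ns"}
WITH_ZONE = {"timestamptz", "timestamptz_ns"}

def avro_timestamp_schema(iceberg_type: str, proposed: bool) -> dict:
    """Hypothetical helper: Avro schema fragment for an Iceberg timestamp type."""
    if iceberg_type not in NON_ZONE | WITH_ZONE:
        raise ValueError(f"not an Iceberg timestamp type: {iceberg_type}")
    unit = "nanos" if iceberg_type.endswith("_ns") else "micros"
    if iceberg_type in WITH_ZONE:
        # Zone-adjusted variants are unchanged by the proposal.
        return {"type": "long", "logicalType": f"timestamp-{unit}", "adjust-to-utc": True}
    if proposed:
        # Proposed: Avro's own wall-clock logical type.
        return {"type": "long", "logicalType": f"local-timestamp-{unit}"}
    # Current spec: instant logical type plus the Iceberg-private flag.
    return {"type": "long", "logicalType": f"timestamp-{unit}", "adjust-to-utc": False}

for t in ("timestamp", "timestamp_ns", "timestamptz", "timestamptz_ns"):
    current = json.dumps(avro_timestamp_schema(t, proposed=False))
    new = json.dumps(avro_timestamp_schema(t, proposed=True))
    print(f"{t}: {current} -> {new}")
```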
Open question for the community: is this the right level of change, or should 
we
 (a) deprecate adjust-to-utc and require new readers to accept both encodings 
 indefinitely, or
 (b) gate the writer behaviour on a table property until all client 
 implementations are updated?
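
The two alternatives can be sketched as follows, again in Python with invented 
names. is_wall_clock illustrates option (a): a reader that treats both 
local-timestamp-* and the legacy timestamp-* plus adjust-to-utc=false as 
wall-clock values. non_zone_avro_schema illustrates option (b): the writer 
switches to local-timestamp-* only when a table property is set. The property 
name write.avro.local-timestamp.enabled is hypothetical, and treating a 
timestamp-* field with no adjust-to-utc property as an instant (per the Avro 
spec) is an assumption, not a description of any existing reader.

```python
from typing import Optional

# Hypothetical table property name; not an existing Iceberg property.
LOCAL_TS_PROP = "write.avro.local-timestamp.enabled"

def is_wall_clock(logical_type: str, adjust_to_utc: Optional[bool]) -> bool:
    """Option (a): accept both encodings of a non-zone timestamp on read."""
    if logical_type in ("local-timestamp-micros", "local-timestamp-nanos"):
        return True  # new, Avro-conformant encoding
    if logical_type in ("timestamp-micros", "timestamp-nanos"):
        # Legacy Iceberg files mark wall-clock values with adjust-to-utc=false;
        # a missing property is assumed to mean an instant, as in the Avro spec.
        return adjust_to_utc is False
    raise ValueError(f"not a timestamp logical type: {logical_type}")

def non_zone_avro_schema(unit: str, table_props: dict) -> dict:
    """Option (b): gate the new writer encoding on a (hypothetical) table property."""
    if table_props.get(LOCAL_TS_PROP, "false").lower() == "true":
        return {"type": "long", "logicalType": f"local-timestamp-{unit}"}
    return {"type": "long", "logicalType": f"timestamp-{unit}", "adjust-to-utc": False}

# Usage
print(is_wall_clock("local-timestamp-micros", None))  # True
print(is_wall_clock("timestamp-micros", False))       # True
print(is_wall_clock("timestamp-micros", True))        # False
print(non_zone_avro_schema("nanos", {LOCAL_TS_PROP: "true"})["logicalType"])  # local-timestamp-nanos
```

The two options are not mutually exclusive: even with a gated writer (b), 
readers still need to accept both encodings (a), because a table can hold 
files written before and after the property is switched on.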
Ticket: https://github.com/apache/iceberg/issues/12751

Implementation status across known clients:
•  Java: PR https://github.com/apache/iceberg/pull/16577. It is a rough sketch 
of what we would like to change; we can split it into phased changes.
•  Python: TBD; needs confirmation that pyiceberg.avro recognises 
local-timestamp-*.
•  Rust: TBD; needs the same confirmation.
•  Go, C++: not yet on V3 timestamp_ns, so likely less exposed.
•  Spark (from_avro/to_avro): already correct; should just work after the 
writer change.
Discussion welcome.


Regards,
Shekhar Rajak
