HuangZhenQiu commented on PR #23511: URL: https://github.com/apache/flink/pull/23511#issuecomment-1763633001
> Thank you @HuangZhenQiu a lot for contributing this.
>
> After reading the [Avro spec](https://avro.apache.org/docs/1.11.0/spec.html), I think we have wrongly mapped the Avro timestamp.
>
> Avro spec says:
>
> > Timestamp (millisecond precision)
> > The timestamp-millis logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.
> > A timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.
>
> [Consistent timestamp types in Hadoop SQL engines](https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit) also pointed out:
>
> > Timestamps in Avro, Parquet and RCFiles with a binary SerDe have Instant semantics
>
> So Avro Timestamp is a Java Instant semantic that should map to Flink TIMESTAMP_LTZ, but currently, it maps to TIMESTAMP_NTZ.
>
> On the contrary,
>
> > Local timestamp (millisecond precision)
> > The local-timestamp-millis logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond.
> > A local-timestamp-millis logical type annotates an Avro long, where the long stores the number of milliseconds, from 1 January 1970 00:00:00.000.
>
> Avro LocalTimestamp is a Java LocalDateTime semantic that should map to Flink TIMESTAMP_NTZ.
>
> If we agree with this behavior, we may need to open a discussion in the dev ML about how to correct the behavior in a backward-compatible or incompatible way.
@wuchong Thanks for the feedback and for the pointer to the Hadoop alignment doc. Besides this, I am also unclear on how to convert timestamp data to TimestampData, which is the RowData internal representation. A Flink user can define a dynamic table in Avro format with a timestamp field whose target type is timestamp with time zone, but we can't convert the Avro long-typed data to that type because the target Flink type is missing in Converters. I would like to open a discussion on the dev ML after our offline sync.
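
To make the two semantics from the Avro spec concrete, here is a minimal sketch that uses only `java.time`. The class and method names are hypothetical and are not part of Flink's Avro converters; it only illustrates how the same Avro long would be read under instant semantics (`timestamp-millis`) versus local semantics (`local-timestamp-millis`).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not Flink's actual converter code.
public class AvroTimestampSemantics {

    // timestamp-millis: the long counts milliseconds from the Unix epoch in UTC,
    // so it identifies one instant on the global timeline.
    static Instant readTimestampMillis(long avroLong) {
        return Instant.ofEpochMilli(avroLong);
    }

    // local-timestamp-millis: the long counts milliseconds from
    // 1970-01-01T00:00:00.000 with no time zone attached, so it decodes to a
    // wall-clock value. ZoneOffset.UTC here is only the arithmetic reference;
    // the result carries no zone.
    static LocalDateTime readLocalTimestampMillis(long avroLong) {
        long seconds = Math.floorDiv(avroLong, 1000L);
        int nanos = (int) Math.floorMod(avroLong, 1000L) * 1_000_000;
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        long avroLong = 0L;

        // Instant semantics: the displayed wall-clock value depends on the zone
        // used to render it.
        Instant instant = readTimestampMillis(avroLong);
        System.out.println(instant.atOffset(ZoneOffset.UTC).toLocalDateTime());
        System.out.println(instant.atOffset(ZoneOffset.ofHours(8)).toLocalDateTime());

        // Local semantics: the wall-clock value is the same in every environment.
        System.out.println(readLocalTimestampMillis(avroLong));
    }
}
```

Under the mapping proposed above, the first reader corresponds to what TIMESTAMP_LTZ would need and the second to TIMESTAMP_NTZ. How each result is then turned into TimestampData is the open question for the dev ML discussion.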
