zjjiang created FLINK-38024:
-------------------------------

             Summary: Unify the timestamp types to use TimestampData with time zone information.
                 Key: FLINK-38024
                 URL: https://issues.apache.org/jira/browse/FLINK-38024
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.4.0, cdc-3.5.0
            Reporter: zjjiang

In the current Flink CDC implementation, the definitions of the TIMESTAMP and TIMESTAMP_LTZ types do not match the semantics of the internal CDC classes that carry them. This mismatch makes the types harder for users to understand, develop against, debug, and maintain. In addition, because time zone information is missing, a time offset can appear when data is synchronized from one of these types to the other, as described in [FLINK-36806|https://issues.apache.org/jira/browse/FLINK-36806].

*1. Semantic deviation of the TimestampData class*

* Semantics: `TimestampData` was originally designed to represent a “timestamp of UTC+0”.
* Practical use: in code, it is widely used as the carrier structure for TIMESTAMP WITHOUT TIME ZONE, which is supposed to represent a local time with no time zone semantics and is not equivalent to UTC.
* Confusion points:
** Users may mistakenly believe that TimestampData represents UTC time;
** What is actually stored is a timestamp with no time zone (e.g. “2025-06-27 10:00:00”), not UTC;
** Ambiguity may occur during time zone conversion or cross-system synchronization (e.g. Kafka -> Iceberg).

*2. Usage deviation of the LocalZonedTimestampData class*

* Semantics: in theory, it should represent an arbitrary epoch timestamp, i.e., a local time interpreted according to a specified time zone.
* Practical use: it actually carries a value of type TIMESTAMP_LTZ, which represents a timestamp that has already been converted to UTC.
* Confusion points:
** “LocalZoned” in the name is easily misread as “local time with time zone”, but the value is in fact already UTC;
** The semantics of the class are inconsistent with how TIMESTAMP_LTZ is used in the Flink CDC type system;
** Type mismatches arise easily when interacting with the Flink Planner or the Table API.

Both classes therefore show clear inconsistencies between their type design and their actual semantics.
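
The offset risk described above can be illustrated with plain `java.time`, without touching any Flink CDC class. This is only a minimal sketch: `TimestampOffsetDemo` and `toInstant` are hypothetical names chosen for illustration, and Asia/Shanghai is an assumed source time zone, not something taken from the issue.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class; it is not part of Flink CDC.
final class TimestampOffsetDemo {

    // Turns a zone-less wall-clock value into an absolute instant by
    // interpreting it in an explicitly supplied time zone.
    static Instant toInstant(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        // A TIMESTAMP WITHOUT TIME ZONE value: wall-clock fields only.
        LocalDateTime wallClock = LocalDateTime.of(2025, 6, 27, 10, 0, 0);

        // Reading the same fields as UTC versus as the (assumed) source zone
        // yields two different points on the timeline.
        Instant readAsUtc = toInstant(wallClock, ZoneOffset.UTC);
        Instant readAsShanghai = toInstant(wallClock, ZoneId.of("Asia/Shanghai"));

        System.out.println(readAsUtc);      // 2025-06-27T10:00:00Z
        System.out.println(readAsShanghai); // 2025-06-27T02:00:00Z
        System.out.println(Duration.between(readAsShanghai, readAsUtc).toHours()); // 8
    }
}
```

The conversion is only well defined once a zone is supplied, which is why a carrier class that holds neither a zone nor a clear UTC contract leaves the result open to an offset of this kind.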

To avoid time offsets and cross-system ambiguity, all timestamp types should be unified into representations that carry time zone information.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)