zjjiang created FLINK-38024:
-------------------------------

             Summary: Unify the timestamp types to use TimestampData with time 
zone information.
                 Key: FLINK-38024
                 URL: https://issues.apache.org/jira/browse/FLINK-38024
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.4.0, cdc-3.5.0
            Reporter: zjjiang


In the current Flink CDC implementation of timestamp types, the definitions of 
the TIMESTAMP and TIMESTAMP_LTZ types and their corresponding internal CDC 
implementation classes are semantically confusing and are used in ways that 
deviate from their intended meaning. This makes the code harder for users to 
understand, develop against, debug and maintain.

Meanwhile, because these values carry no time zone information, converting 
between the two types during synchronization can shift the time by the zone 
offset, as described in 
[FLINK-36806|https://issues.apache.org/jira/browse/FLINK-36806].

*1. Semantic deviation of the TimestampData class*
 * Semantics: `TimestampData` was originally designed to represent a “timestamp 
in UTC+0”.
 * Practical use: In the code, it is widely used as the carrier for 
TIMESTAMP WITHOUT TIME ZONE values, which represent local time, carry no time 
zone semantics, and are not equivalent to UTC.
 * Confusion points:
 ** Users may mistakenly believe that TimestampData represents UTC time;
 ** What is actually stored is a timestamp with no time zone (e.g. “2025-06-27 
10:00:00”), not a UTC value;
 ** Ambiguity may arise during time zone conversion or cross-system 
synchronization (e.g. Kafka -> Iceberg).
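
The ambiguity in the last point can be illustrated with plain `java.time`, 
without any Flink CDC classes. This is only a sketch: the class name 
`LocalTimestampAmbiguity`, the helper `interpret`, and the choice of 
Asia/Shanghai as the source zone are assumptions made for illustration. The 
same zone-less value “2025-06-27 10:00:00” maps to two different instants 
depending on which zone the reader assumes.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration class; not part of Flink CDC.
class LocalTimestampAmbiguity {

    /** Interprets a zone-less wall-clock value as if it had been recorded in the given zone. */
    static Instant interpret(LocalDateTime wallClock, ZoneId assumedZone) {
        return wallClock.atZone(assumedZone).toInstant();
    }

    public static void main(String[] args) {
        // The value as a TIMESTAMP WITHOUT TIME ZONE column would hold it.
        LocalDateTime wallClock = LocalDateTime.of(2025, 6, 27, 10, 0, 0);

        // Reader A assumes the value is UTC; reader B assumes the source ran in Asia/Shanghai.
        Instant asUtc = interpret(wallClock, ZoneOffset.UTC);
        Instant asShanghai = interpret(wallClock, ZoneId.of("Asia/Shanghai"));

        System.out.println(asUtc);                               // 2025-06-27T10:00:00Z
        System.out.println(asShanghai);                          // 2025-06-27T02:00:00Z
        System.out.println(Duration.between(asShanghai, asUtc)); // PT8H
    }
}
```

The eight-hour gap is the kind of offset a downstream system sees when it 
assumes UTC for a value that was really local time.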

*2. Usage deviation of the LocalZonedTimestampData class*
 * Semantics: In theory, it should represent an arbitrary epoch timestamp, 
i.e., a local time interpreted according to a specified time zone.
 * Practical use: In practice, it carries values of type TIMESTAMP_LTZ, which 
represent timestamps that have already been converted to UTC.
 * Confusion points:
 ** “LocalZoned” in the name is easily misread as “local time with time zone”, 
but the value is in fact already in UTC;
 ** The semantics of the class are inconsistent with how TIMESTAMP_LTZ is used 
in the Flink CDC type system;
 ** Type mismatches arise easily when interacting with the Flink planner or 
Table API.
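
To make the TIMESTAMP_LTZ side concrete, the sketch below again uses only 
`java.time`. The class `LtzRendering` and its `render` helper are 
hypothetical names, and epoch milliseconds are assumed as the stored form 
purely for illustration. The stored value is a single UTC-based instant, and 
the "local" part only appears when it is rendered in some session time zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration class; not part of Flink CDC.
class LtzRendering {

    /** Renders a UTC-based epoch value as wall-clock time in the given session zone. */
    static LocalDateTime render(long epochMillis, ZoneId sessionZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), sessionZone);
    }

    public static void main(String[] args) {
        // One stored instant, already normalized to UTC.
        long epochMillis = Instant.parse("2025-06-27T02:00:00Z").toEpochMilli();

        System.out.println(render(epochMillis, ZoneOffset.UTC));              // 2025-06-27T02:00
        System.out.println(render(epochMillis, ZoneId.of("Asia/Shanghai"))); // 2025-06-27T10:00
    }
}
```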

Both the type design and the actual semantics are inconsistent and confusing. 
To avoid time offsets and cross-system ambiguity, all timestamp types should 
be unified into representations that carry time zone information.
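
As a rough sketch of what "carrying time zone information" could look like, 
the class below pairs an instant with the zone it was observed in. 
`ZoneAwareTimestamp` and all of its methods are hypothetical and are not the 
design proposed for Flink CDC; they only show that once the zone travels with 
the value, converting between a local view and an instant view no longer 
requires guessing.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical value class for illustration only; not an existing Flink CDC type.
final class ZoneAwareTimestamp {
    private final Instant instant; // unambiguous point on the timeline
    private final ZoneId zone;     // zone the value was observed in

    private ZoneAwareTimestamp(Instant instant, ZoneId zone) {
        this.instant = instant;
        this.zone = zone;
    }

    /** Builds a value from wall-clock time plus the zone it belongs to. */
    static ZoneAwareTimestamp ofLocal(LocalDateTime wallClock, ZoneId zone) {
        return new ZoneAwareTimestamp(wallClock.atZone(zone).toInstant(), zone);
    }

    /** Builds a value from an instant plus the zone used to display it. */
    static ZoneAwareTimestamp ofInstant(Instant instant, ZoneId zone) {
        return new ZoneAwareTimestamp(instant, zone);
    }

    /** Instant view, as a TIMESTAMP_LTZ-style consumer would read it. */
    Instant toInstant() {
        return instant;
    }

    /** Wall-clock view in the carried zone, as a TIMESTAMP-style consumer would read it. */
    LocalDateTime toLocalDateTime() {
        return LocalDateTime.ofInstant(instant, zone);
    }
}
```

For example, `ZoneAwareTimestamp.ofLocal(LocalDateTime.of(2025, 6, 27, 10, 0), 
ZoneId.of("Asia/Shanghai"))` yields the instant `2025-06-27T02:00:00Z` and 
still returns the original wall-clock value from `toLocalDateTime()`.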



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
