[ https://issues.apache.org/jira/browse/FLINK-38024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zjjiang updated FLINK-38024:
----------------------------
Description:
In the current Flink CDC implementation of timestamp types, the definitions of
the TIMESTAMP and TIMESTAMP_LTZ types and their corresponding internal CDC
implementation classes are semantically confused and deviate from how they are
used in practice, which makes them harder for users to understand, develop
against, debug and maintain.
Meanwhile, because no time zone information is carried, values may be shifted
by a time zone offset when synchronizing between these two types, as described
in FLINK-36806.
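The offset problem can be reproduced with plain `java.time`, without any Flink CDC classes. This is a hypothetical illustration only: the class name `TimestampOffsetDemo`, the writer that assumes UTC and the reader that assumes `Asia/Shanghai` are assumptions chosen for the example and are not taken from FLINK-36806. It shows that once a TIMESTAMP_LTZ instant is stored as a zone-less TIMESTAMP, reading it back in a different zone shifts it by that zone's offset.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/**
 * Hypothetical illustration using plain java.time (not Flink CDC classes):
 * a TIMESTAMP_LTZ instant is copied into a zone-less TIMESTAMP value by a
 * writer that assumes UTC, then read back by a reader that assumes another zone.
 */
class TimestampOffsetDemo {

    /** Writer side (assumed to use UTC): drops the zone and keeps only the wall clock. */
    static LocalDateTime toWallClockUtc(Instant ltzValue) {
        return ltzValue.atZone(ZoneOffset.UTC).toLocalDateTime();
    }

    /** Reader side: reinterprets the zone-less value in its own session time zone. */
    static Instant reinterpret(LocalDateTime wallClock, ZoneId sessionZone) {
        return wallClock.atZone(sessionZone).toInstant();
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2025-06-27T02:00:00Z");
        LocalDateTime stored = toWallClockUtc(original);
        Instant readBack = reinterpret(stored, ZoneId.of("Asia/Shanghai"));
        // The zone was lost in between, so the round trip is off by the UTC+8 offset.
        System.out.println(stored);                               // 2025-06-27T02:00
        System.out.println(Duration.between(readBack, original)); // PT8H
    }
}
```

If both sides agreed on the zone (for example, reading back with `ZoneOffset.UTC`), the round trip would be lossless; the shift only appears because the zone is not carried with the value.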
*1. Semantic deviation of TimestampData*
* Semantics: `TimestampData` was originally designed to represent a “timestamp
of UTC+0”.
* Practical use: in code, it is widely used as the carrier of TIMESTAMP WITHOUT
TIME ZONE values, which are supposed to represent local time, i.e., a value
without time zone semantics that is not equivalent to UTC.
* Confusion points:
** Users may mistakenly believe that TimestampData represents UTC time;
** What is actually stored is a timestamp without a time zone (e.g. “2025-06-27
10:00:00”), not a UTC instant;
** Ambiguity may arise during time zone conversion or cross-system
synchronization (e.g. Kafka -> Iceberg).
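A minimal sketch of why a zone-less value such as “2025-06-27 10:00:00” is ambiguous: it only becomes a point on the timeline once some zone is chosen. It uses plain `java.time` rather than the Flink CDC `TimestampData` API; the class name `WallClockAmbiguityDemo` and the three zones are illustrative assumptions.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/**
 * Hypothetical illustration: a TIMESTAMP WITHOUT TIME ZONE value is a bare
 * wall-clock reading, so the instant it denotes depends on the zone assumed.
 */
class WallClockAmbiguityDemo {

    /** Epoch milliseconds obtained by interpreting the wall-clock value in the given zone. */
    static long epochMillisIn(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime value = LocalDateTime.of(2025, 6, 27, 10, 0, 0);
        long asUtc = epochMillisIn(value, ZoneOffset.UTC);
        long asShanghai = epochMillisIn(value, ZoneId.of("Asia/Shanghai"));
        long asNewYork = epochMillisIn(value, ZoneId.of("America/New_York"));
        // Same stored value, three different instants.
        System.out.println((asUtc - asShanghai) / 3_600_000L); // 8 (Shanghai is UTC+8)
        System.out.println((asNewYork - asUtc) / 3_600_000L);  // 4 (New York is UTC-4 in June)
    }
}
```

Treating the stored value as UTC is just one of these interpretations, which is why assuming “TimestampData means UTC” can silently shift data.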
*2. Usage deviation of LocalZonedTimestampData*
* Semantics: in theory, it should represent an arbitrary epoch timestamp, i.e.,
local time interpreted according to a specified time zone.
* Practical use: it actually carries TIMESTAMP_LTZ values, which represent
timestamps that have already been converted to UTC.
* Confusion points:
** The “LocalZoned” in the name is easily misread as “local time with a time
zone”, whereas the stored value is in fact already UTC;
** The class is semantically inconsistent with how TIMESTAMP_LTZ is used in the
Flink CDC type system;
** Type mismatches easily arise when interacting with the Flink planner or the
Table API.
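The TIMESTAMP_LTZ behaviour described here can be sketched with plain `java.time`: the stored value is a fixed epoch-based instant (effectively UTC), and the “local zone” only comes into play when the value is rendered for a session. The class name `LocalZonedRenderingDemo` and the chosen zones are hypothetical; this is not the `LocalZonedTimestampData` API.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

/**
 * Hypothetical illustration: a TIMESTAMP_LTZ value is a fixed instant; only its
 * wall-clock rendering depends on the session time zone.
 */
class LocalZonedRenderingDemo {

    /** Renders the stored epoch value as wall-clock time in the given session zone. */
    static LocalDateTime render(long epochMillis, ZoneId sessionZone) {
        return Instant.ofEpochMilli(epochMillis).atZone(sessionZone).toLocalDateTime();
    }

    public static void main(String[] args) {
        long epochMillis = Instant.parse("2025-06-27T02:00:00Z").toEpochMilli();
        // One stored value, two displays: the instant itself never changes.
        System.out.println(render(epochMillis, ZoneId.of("UTC")));           // 2025-06-27T02:00
        System.out.println(render(epochMillis, ZoneId.of("Asia/Shanghai"))); // 2025-06-27T10:00
    }
}
```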
There are clear inconsistencies and confusion in both the type design and the
actual semantics. To avoid time offset problems and cross-system ambiguity, all
timestamp types should be unified into representations that carry time zone
information.
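One possible shape for such a unified representation is sketched below. It is a hypothetical value class, not the Flink CDC API and not necessarily the design this issue will adopt: the name `ZoneAwareTimestamp`, its fields (epoch milliseconds, nano-of-millisecond, `ZoneId`) and its methods are assumptions for illustration. The point it shows is that keeping the zone alongside the instant lets any consumer recover both the unambiguous instant and the original wall-clock reading.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Objects;

/**
 * Hypothetical sketch of a timestamp carrier that always keeps its time zone.
 * Not the Flink CDC API; names and fields are illustrative assumptions.
 */
final class ZoneAwareTimestamp {
    private final long epochMillis;      // milliseconds since 1970-01-01T00:00:00Z
    private final int nanoOfMillisecond; // extra precision, 0..999_999
    private final ZoneId zone;           // zone the value belongs to

    ZoneAwareTimestamp(long epochMillis, int nanoOfMillisecond, ZoneId zone) {
        if (nanoOfMillisecond < 0 || nanoOfMillisecond > 999_999) {
            throw new IllegalArgumentException("nanoOfMillisecond out of range: " + nanoOfMillisecond);
        }
        this.epochMillis = epochMillis;
        this.nanoOfMillisecond = nanoOfMillisecond;
        this.zone = Objects.requireNonNull(zone, "zone");
    }

    /** Builds a value from a zone-less wall-clock reading plus the zone it belongs to. */
    static ZoneAwareTimestamp of(LocalDateTime wallClock, ZoneId zone) {
        Instant instant = wallClock.atZone(zone).toInstant();
        // toEpochMilli() floors to the millisecond; the remainder goes into nanoOfMillisecond.
        return new ZoneAwareTimestamp(instant.toEpochMilli(), instant.getNano() % 1_000_000, zone);
    }

    /** The unambiguous point on the timeline. */
    Instant toInstant() {
        return Instant.ofEpochMilli(epochMillis).plusNanos(nanoOfMillisecond);
    }

    /** Wall-clock reading in the zone this value carries. */
    LocalDateTime toLocalDateTime() {
        return toInstant().atZone(zone).toLocalDateTime();
    }

    /** Same instant re-labelled with another zone; the instant itself never shifts. */
    ZoneAwareTimestamp withZone(ZoneId other) {
        return new ZoneAwareTimestamp(epochMillis, nanoOfMillisecond, other);
    }

    public static void main(String[] args) {
        ZoneAwareTimestamp t = of(LocalDateTime.of(2025, 6, 27, 10, 0), ZoneId.of("Asia/Shanghai"));
        System.out.println(t.toInstant());                              // 2025-06-27T02:00:00Z
        System.out.println(t.withZone(ZoneId.of("UTC")).toLocalDateTime()); // 2025-06-27T02:00
    }
}
```

Because the instant and the zone travel together, a sink such as Iceberg could choose either representation explicitly instead of guessing which zone a bare wall-clock value was meant to be in.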
> Unify the timestamp types to use TimestampData with time zone information.
> --------------------------------------------------------------------------
>
> Key: FLINK-38024
> URL: https://issues.apache.org/jira/browse/FLINK-38024
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.4.0, cdc-3.5.0
> Reporter: zjjiang
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)