[ https://issues.apache.org/jira/browse/FLINK-38310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergei Morozov updated FLINK-38310:
-----------------------------------
Description:

When a FinishedSnapshotSplitInfo is represented as a string, its start and end offsets are represented as arrays. Each element of these arrays itself can be an array of bytes, if the split key type contains a binary field (see [apache/flink-cdc#879|https://github.com/apache/flink-cdc/pull/879]).

If such a finished snapshot split info is logged, the log doesn't contain the value of the binary key, it contains its address in memory:

{quote}splitStart=[[B@2c35e847], splitEnd=[[B@21e360a], highWatermark=...
{quote}

Additionally, if such objects are compared or hashed, they won't be equal and won't have equal hashes.

Currently, there's no practical impact of this issue, but I'm working on a fix for FLINK-38270 where I will calculate the hash of the finished snapshot split infos on the enumerator and the source reader. This issue affects hashing.

was:

When a FinishedSnapshotSplitInfo is represented as a string, its start and end offsets are represented as arrays. Each element of these arrays itself can be an array of bytes, if the split key type contains a binary field (see [apache/flink-cdc#879|https://github.com/apache/flink-cdc/pull/879]).

If such a finished snapshot split info is logged, the log doesn't contain the value of the binary key, it contains its address in memory:

{quote}splitStart=[[B@2c35e847], splitEnd=[[B@21e360a], highWatermark=...
{quote}

Additionally, if such objects are compared or hashed, they won't be equal and won't have equal hashes.
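
For readers who want to reproduce the behaviour described in the current description, here is a minimal sketch in plain Java. It is not the Flink CDC code: `SplitBoundsDemo` and its fields are hypothetical stand-ins for a class that holds split boundaries as `Object[]`, and the use of the `Arrays.deep*` helpers is one possible remedy assumed for illustration, not necessarily the fix that will be merged. The `[B@2c35e847` form in the log is what `Object.toString()` yields for a `byte[]`: the type tag `[B` followed by the identity hash code in hex.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not the Flink CDC class.
final class SplitBoundsDemo {
    private final Object[] splitStart;
    private final Object[] splitEnd;

    SplitBoundsDemo(Object[] splitStart, Object[] splitEnd) {
        this.splitStart = splitStart;
        this.splitEnd = splitEnd;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SplitBoundsDemo)) {
            return false;
        }
        SplitBoundsDemo that = (SplitBoundsDemo) o;
        // deepEquals compares nested byte[] elements by content.
        return Arrays.deepEquals(splitStart, that.splitStart)
                && Arrays.deepEquals(splitEnd, that.splitEnd);
    }

    @Override
    public int hashCode() {
        // deepHashCode hashes nested byte[] elements by content.
        return 31 * Arrays.deepHashCode(splitStart) + Arrays.deepHashCode(splitEnd);
    }

    @Override
    public String toString() {
        // deepToString prints nested byte[] elements as their values.
        return "splitStart=" + Arrays.deepToString(splitStart)
                + ", splitEnd=" + Arrays.deepToString(splitEnd);
    }

    public static void main(String[] args) {
        Object[] a = {new byte[] {1, 2}};
        Object[] b = {new byte[] {1, 2}};

        // Shallow helpers fall back to byte[]'s identity-based methods.
        System.out.println(Arrays.toString(a));  // something like [[B@2c35e847]; varies per run
        System.out.println(Arrays.equals(a, b)); // false

        // Deep helpers look inside the nested arrays.
        SplitBoundsDemo x = new SplitBoundsDemo(a, a);
        SplitBoundsDemo y = new SplitBoundsDemo(b, b);
        System.out.println(x);                              // splitStart=[[1, 2]], splitEnd=[[1, 2]]
        System.out.println(x.equals(y));                    // true
        System.out.println(x.hashCode() == y.hashCode());   // true
    }
}
```

With content-based hashing like this, two instances built from equal key bytes in different JVMs (for example on the enumerator and on a source reader) would produce the same hash, which is the property the planned FLINK-38270 work relies on.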
> Binary keys in a FinishedSnapshotSplitInfo are incorrectly compared and hashed
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-38310
>                 URL: https://issues.apache.org/jira/browse/FLINK-38310
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.4.0
>            Reporter: Sergei Morozov
>            Priority: Major
>
> When a FinishedSnapshotSplitInfo is represented as a string, its start and
> end offsets are represented as arrays. Each element of these arrays itself
> can be an array of bytes, if the split key type contains a binary field (see
> [apache/flink-cdc#879|https://github.com/apache/flink-cdc/pull/879]).
> If such a finished snapshot split info is logged, the log doesn't contain the
> value of the binary key, it contains its address in memory:
> {quote}splitStart=[[B@2c35e847], splitEnd=[[B@21e360a], highWatermark=...
> {quote}
> Additionally, if such objects are compared or hashed, they won't be equal and
> won't have equal hashes.
> Currently, there's no practical impact of this issue, but I'm working on a
> fix for FLINK-38270 where I will calculate the hash of the finished snapshot
> split infos on the enumerator and the source reader. This issue affects
> hashing.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)