[ 
https://issues.apache.org/jira/browse/FLINK-38310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergei Morozov updated FLINK-38310:
-----------------------------------
    Description: 
When a FinishedSnapshotSplitInfo is represented as a string, its start and end 
offsets are represented as arrays. Each element of these arrays itself can be 
an array of bytes, if the split key type contains a binary field (see 
[apache/flink-cdc#879|https://github.com/apache/flink-cdc/pull/879]).

If such a finished snapshot split info is logged, the log doesn't contain the 
value of the binary key. It contains the default toString() output of the byte 
array, that is, its type tag and identity hash code:
{quote}splitStart=[[B@2c35e847], splitEnd=[[B@21e360a], highWatermark=...
{quote}
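Additionally, if two such objects with identical binary keys are compared or 
hashed, they won't be equal and generally won't have equal hashes, because Java 
arrays use identity-based equals() and hashCode().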
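
The behavior can be reproduced outside Flink CDC with plain JDK calls. The 
sketch below is not the FinishedSnapshotSplitInfo code. It assumes a split key 
is an Object[] holding a single byte[] field, and the class name BinaryKeyDemo 
and its helper methods are hypothetical. It contrasts the shallow helpers of 
java.util.Arrays with their deep counterparts:

```java
import java.util.Arrays;

public class BinaryKeyDemo {

    // Shallow rendering: a nested byte[] falls back to Object.toString(),
    // which yields a type tag and identity hash code such as "[B@2c35e847".
    public static String shallowString(Object[] key) {
        return Arrays.toString(key);
    }

    // Deep rendering: nested arrays are expanded element by element.
    public static String deepString(Object[] key) {
        return Arrays.deepToString(key);
    }

    public static void main(String[] args) {
        // Two split keys with identical content, each holding one binary field
        // (an assumption made for illustration).
        Object[] a = {new byte[] {1, 2, 3}};
        Object[] b = {new byte[] {1, 2, 3}};

        System.out.println(shallowString(a)); // "[[B@" followed by a hash
        System.out.println(deepString(a));    // [[1, 2, 3]]

        // Shallow equality compares the nested byte[] by identity.
        System.out.println(Arrays.equals(a, b));     // false
        // Deep equality compares the nested byte[] by content.
        System.out.println(Arrays.deepEquals(a, b)); // true

        // Deep hashes are derived from content, so they match.
        System.out.println(
                Arrays.deepHashCode(a) == Arrays.deepHashCode(b)); // true
    }
}
```

Using the deep variants in toString(), equals() and hashCode() is one possible 
direction for a fix, not necessarily the one that will be taken.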

Currently, there's no practical impact of this issue, but I'm working on a fix 
for FLINK-38270 where I will calculate the hash of the finished snapshot split 
infos on the enumerator and the source reader. This issue affects hashing.

  was:
When a FinishedSnapshotSplitInfo is represented as a string, its start and end 
offsets are represented as arrays. Each element of these arrays itself can be 
an array of bytes, if the split key type contains a binary field (see 
[apache/flink-cdc#879|https://github.com/apache/flink-cdc/pull/879]).

If such a finished snapshot split info is logged, the log doesn't contain the 
value of the binary key, it contains its address in memory:
{quote}splitStart=[[B@2c35e847], splitEnd=[[B@21e360a], highWatermark=...
{quote}
Additionally, if such objects are compared or hashed, they won't be equal and 
won't have equal hashes.


> Binary keys in a FinishedSnapshotSplitInfo are incorrectly compared and hashed
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-38310
>                 URL: https://issues.apache.org/jira/browse/FLINK-38310
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.4.0
>            Reporter: Sergei Morozov
>            Priority: Major
>
> When a FinishedSnapshotSplitInfo is represented as a string, its start and 
> end offsets are represented as arrays. Each element of these arrays itself 
> can be an array of bytes, if the split key type contains a binary field (see 
> [apache/flink-cdc#879|https://github.com/apache/flink-cdc/pull/879]).
> If such a finished snapshot split info is logged, the log doesn't contain the 
> value of the binary key. It contains the default toString() output of the 
> byte array, that is, its type tag and identity hash code:
> {quote}splitStart=[[B@2c35e847], splitEnd=[[B@21e360a], highWatermark=...
> {quote}
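> Additionally, if two such objects with identical binary keys are compared or 
> hashed, they won't be equal and generally won't have equal hashes, because 
> Java arrays use identity-based equals() and hashCode().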
> Currently, there's no practical impact of this issue, but I'm working on a 
> fix for FLINK-38270 where I will calculate the hash of the finished snapshot 
> split infos on the enumerator and the source reader. This issue affects 
> hashing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
