[
https://issues.apache.org/jira/browse/FLINK-28674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weike Dong updated FLINK-28674:
-------------------------------
Description:
Hi Devs,
Recently I discovered that the _equaliser.equals_ call in
_org.apache.flink.table.runtime.operators.sink.SinkUpsertMaterializer#removeFirst_
returns the wrong result when the two binary rows being compared are the same, like
!image-2022-07-25-20-56-14-111.png!
After digging through the generated code for this equaliser, I found that
when both input RowData are instances of {_}BinaryRowData{_}, the
_BinaryRowData#equals_ method is called directly to produce the comparison result.
!image-2022-07-25-20-59-31-933.png!
However, as you can see in the first snapshot, _BinaryRowData#equals_ cannot
properly handle complex data types like {_}Timestamp{_}, so it returns _false_
even when the actual timestamp values are the same. This causes
SinkUpsertMaterializer to wrongly conclude that there are no matches in the
state and to print errors like "The state is cleared because of state ttl",
which eventually leads to the loss of -U data in the final results.
P.S. The equals method of BinaryRowData actually compares the underlying
MemorySegments, which is not suitable for types like Timestamp.
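To illustrate the P.S. above, here is a minimal, self-contained sketch of how a byte-wise comparison can disagree with a field-wise one. It does not use Flink classes: the class name RowEqualitySketch, the 16-byte layout and the differing padding bytes are all assumptions made up for illustration, and they are not claimed to be the actual layout of BinaryRowData or the confirmed root cause.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RowEqualitySketch {
    // Hypothetical layout (NOT Flink's): 8 bytes millis, 4 bytes nanos-of-milli,
    // 4 bytes of unused padding that carries no logical meaning.
    static byte[] encode(long millis, int nanosOfMilli, int padding) {
        return ByteBuffer.allocate(16)
                .putLong(millis)
                .putInt(nanosOfMilli)
                .putInt(padding)
                .array();
    }

    // Byte-wise comparison, analogous to comparing the backing memory directly.
    static boolean bytesEqual(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // Field-wise comparison: decode the logical value and ignore the padding.
    static boolean fieldsEqual(byte[] a, byte[] b) {
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        return x.getLong(0) == y.getLong(0) && x.getInt(8) == y.getInt(8);
    }

    public static void main(String[] args) {
        // Same logical timestamp (arbitrary values), different padding bytes.
        byte[] left = encode(1658753774111L, 500_000, 0);
        byte[] right = encode(1658753774111L, 500_000, 0xCAFE);
        System.out.println("byte-wise:  " + bytesEqual(left, right));  // false
        System.out.println("field-wise: " + fieldsEqual(left, right)); // true
    }
}
```

If something similar happens inside the equaliser, a field-by-field comparison in the generated code would avoid the false mismatch, at the cost of decoding each field.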
> EqualiserCodeGenerator generates wrong equaliser for Timestamp fields in
> BinaryRowData
> --------------------------------------------------------------------------------------
>
> Key: FLINK-28674
> URL: https://issues.apache.org/jira/browse/FLINK-28674
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.13.6, 1.14.5, 1.15.1
> Environment: Flink 1.13.6
> Reporter: Weike Dong
> Priority: Major
> Attachments: image-2022-07-25-20-56-14-111.png,
> image-2022-07-25-20-59-31-933.png, image-2022-07-25-21-17-33-608.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)