[
https://issues.apache.org/jira/browse/FLINK-35826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866220#comment-17866220
]
xuyang commented on FLINK-35826:
--------------------------------
I believe this is a more general problem, that is "how does an operator
determine if a piece of data should be considered as late data".
Currently, we determine it on an operator level by comparing with the minimum
of multiple input watermarks, but in a distributed system, the watermarks from
multiple upstream operators or multiple parallelism of one upstream operator
might be delayed.
One speculative possibility: could the data discarding be unified in
WatermarkAssigner?
> [SQL] Sliding window may produce unstable calculations when processing
> changelog data.
> --------------------------------------------------------------------------------------
>
> Key: FLINK-35826
> URL: https://issues.apache.org/jira/browse/FLINK-35826
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.20.0
> Environment: flink with release-1.20
> Reporter: Yuan Kui
> Assignee: xuyang
> Priority: Major
> Attachments: image-2024-07-12-14-27-58-061.png
>
>
> Calculation results may be unstable when using a sliding window to process
> changelog data. Repeat the execution 10 times, the test results are partial
> success and partial failure:
> !image-2024-07-12-14-27-58-061.png!
> See the documentation and code for more details.
> [https://docs.google.com/document/d/1JmwSLs4SJvZKe7kqALqVBZ-1F1OyPmiWw8J6Ug6vqW0/edit?usp=sharing]
> code:
> [[BUG] Reproduce the issue of unstable sliding window calculation results ·
> yuchengxin/flink@c003e45
> (github.com)|https://github.com/yuchengxin/flink/commit/c003e45082e0d1464111c286ac9c7abb79527492]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)