[
https://issues.apache.org/jira/browse/FLINK-36664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903042#comment-17903042
]
Thomas Cooper commented on FLINK-36664:
---------------------------------------
Instead of pasting screen shots could you not have used {{{{code:java}}}}
blocks? It makes this issue very hard to parse.
> [Window]The window with window_offset will lose data.
> -----------------------------------------------------
>
> Key: FLINK-36664
> URL: https://issues.apache.org/jira/browse/FLINK-36664
> Project: Flink
> Issue Type: Bug
> Reporter: kaitian
> Assignee: kaitian
> Priority: Major
> Labels: pull-request-available, window
> Attachments: image-2024-11-06-20-20-34-562.png,
> image-2024-11-06-20-23-53-184.png
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When setting the offset for the window, data is lost because the triggering
> window time is not aligned.
>
> When SlicingWindowOperator processesWatermarker, it records the time when the
> next window is triggered (nextTriggerWatermark):
> !image-2024-11-06-20-20-34-562.png!
> The calculation method is to first calculate the begin time of the window
> where the watermark is located, but the offset passed in during the
> calculation is 0:
> !image-2024-11-06-20-23-53-184.png!
> That is to say, the window triggering time does not take into account the
> window offset.
> It is OK if the nextTriggerWatermark is too small. When watermark-offset>0,
> if (watermark-offset)%window_size > watermark%window_size is satisfied, the
> nextTriggerWatermark will be too large, where offset and window_size are
> constants. If the watermark is completely random, it is easy to prove that
> there is a 50% probability that the nextTriggerWatermark will be too large.
> When the nextTriggerWatermark is too large, a processWatermark should have
> flushWindowBuffer but was not triggered, resulting in less data in the
> currently triggered window (assuming it is key1). When the next
> processWatermark triggers flushWindowBuffer, since the Watermark has moved
> forward, key1 will be regarded as expired data and the timer will not be
> registered. That is to say, the subsequent processWatermark will no longer
> calculate key1, and data will be lost.
>
> I have writen a UT to prove this bug:
> window_size 3000, offset 1000, tumping window
> window_size 3000, offset 1000. When processWatermark(3000), the normally
> calculated nextTriggerProgress = 3000 - (3000-1000)%3000 + 3000-1 = 3999, but
> because the code does not consider the offset, nextTriggerProgress = 3000 -
> (3000-0)%3000 + 3000-1 = 5999, which is too large.We will lose data.
> !https://intranetproxy.alipay.com/skylark/lark/0/2024/png/101856358/1730893082766-4ea5d356-2b19-436d-be46-093f70e445cd.png!
> This wrong UT will pass. You can see that the data in the UT is lost and will
> not be calculated in any subsequent processWatermark.
>
> Repair suggestion:
> pass window_offset when calculating nextTriggerWatermark
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)