[ https://issues.apache.org/jira/browse/FLINK-36663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuyang Zhong closed FLINK-36663. -------------------------------- Resolution: Fixed > [Window]The first processWatermark() after Sliding Window restore may have > extra expired data. > ---------------------------------------------------------------------------------------------- > > Key: FLINK-36663 > URL: https://issues.apache.org/jira/browse/FLINK-36663 > Project: Flink > Issue Type: Bug > Reporter: kaitian > Assignee: kaitian > Priority: Major > Labels: pull-request-available, window > Fix For: 2.1.0 > > Attachments: image-2024-11-06-19-43-58-569.png, > image-2024-11-06-19-47-37-225.png, image-2024-11-06-19-50-35-362.png > > Original Estimate: 1h > Remaining Estimate: 1h > > The root cause is that the currentWatermark of TimerService is not restored > after restore. > !image-2024-11-06-19-43-58-569.png! > This will cause problems when using the sliding window, such as hopping > window. Because when flushing WindowBuffer, it is necessary to determine > whether the current <key, slice_end> is expired. Here, the currentWatermark > in the TimerService is used, and the currentWatermark in the TimerService > will only be updated to the <currentWatermark saved in the window> when > advance. > !image-2024-11-06-19-47-37-225.png! > This will cause that after restore and before the first advance, if > processElement(<key, slice_end> ), the slice_end has been triggered, but > because this slice is included in other untriggered windows, the data <key, > slice_end> will be written to the windowBuffer. (At this time, <the > currentWatermark in the window> is used to determine whether the data is > expired, and <the currentWatermark in the window> is restored): > !image-2024-11-06-19-50-35-362.png! > However, because <the currentWatermark in timeService> is used when > flushWindowBuffer, it is determined that the slice has not expired and the > timer is registered. > This will cause an expired slice_end to be registered, resulting in an > expired window being triggered and output extra data. > > > I have writen a UT to prove this problem: > !https://aone.alibaba-inc.com/v2/api/workitem/adapter/file/url?fileIdentifier=workitem%2Falibaba%2F875140%2F1730810319419%E6%88%AA%E5%B1%8F2024-11-05%2020.38.19.png! > Triggering restore will have different results. You can remove the restore > part in the red box and the test will pass. > use hopping window, window size = 3000, we have window range:[-1000,2000), > [0, 3000] > processWatermark (currentWatermark = 2001) > processElement: the expired record <"key2", 1, fromEpochMillis(1500L)> > if no restore after processWatermark (currentWatermark = 2001): > the expired record will not output in window range [-1000,2000). > > if restore after processWatermark (currentWatermark = 2001): > the expired record will not output in window range [-1000,2000). > > > Repair suggestions: > While restoring the window currentWatermark, restore the currentWatermark in > the timeService > -- This message was sent by Atlassian Jira (v8.20.10#820010)