[ 
https://issues.apache.org/jira/browse/FLINK-36663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuyang Zhong closed FLINK-36663.
--------------------------------
    Resolution: Fixed

> [Window]The first processWatermark() after Sliding Window restore may have 
> extra expired data.
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36663
>                 URL: https://issues.apache.org/jira/browse/FLINK-36663
>             Project: Flink
>          Issue Type: Bug
>            Reporter: kaitian
>            Assignee: kaitian
>            Priority: Major
>              Labels: pull-request-available, window
>             Fix For: 2.1.0
>
>         Attachments: image-2024-11-06-19-43-58-569.png, 
> image-2024-11-06-19-47-37-225.png, image-2024-11-06-19-50-35-362.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The root cause is that the currentWatermark of TimerService is not restored 
> after restore.
> !image-2024-11-06-19-43-58-569.png!
> This will cause problems when using the sliding window, such as hopping 
> window. Because when flushing WindowBuffer, it is necessary to determine 
> whether the current <key, slice_end> is expired. Here, the currentWatermark 
> in the TimerService is used, and the currentWatermark in the TimerService 
> will only be updated to the <currentWatermark saved in the window> when 
> advance.
> !image-2024-11-06-19-47-37-225.png!
> This will cause that after restore and before the first advance, if 
> processElement(<key, slice_end> ), the slice_end has been triggered, but 
> because this slice is included in other untriggered windows, the data <key, 
> slice_end> will be written to the windowBuffer. (At this time, <the 
> currentWatermark in the window> is used to determine whether the data is 
> expired, and <the currentWatermark in the window> is restored):
> !image-2024-11-06-19-50-35-362.png!
> However, because <the currentWatermark in timeService> is used when 
> flushWindowBuffer, it is determined that the slice has not expired and the 
> timer is registered.
> This will cause an expired slice_end to be registered, resulting in an 
> expired window being triggered and output extra data.
>  
>  
> I have writen a UT to prove this problem:
> !https://aone.alibaba-inc.com/v2/api/workitem/adapter/file/url?fileIdentifier=workitem%2Falibaba%2F875140%2F1730810319419%E6%88%AA%E5%B1%8F2024-11-05%2020.38.19.png!
> Triggering restore will have different results. You can remove the restore 
> part in the red box and the test will pass.
> use hopping window, window size = 3000, we have window range:[-1000,2000), 
> [0, 3000]
> processWatermark (currentWatermark = 2001)
> processElement:     the expired record <"key2", 1, fromEpochMillis(1500L)>
> if no restore after processWatermark (currentWatermark = 2001):
> the expired record will not output in window range [-1000,2000).
>  
> if restore after processWatermark (currentWatermark = 2001):
> the expired record will not output in window range [-1000,2000).
>  
>  
> Repair suggestions:
> While restoring the window currentWatermark, restore the currentWatermark in 
> the timeService
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to