[
https://issues.apache.org/jira/browse/FLINK-26062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490549#comment-17490549
]
Roman Khachatryan commented on FLINK-26062:
-------------------------------------------
Fix for poll() (replacement with remove()) merged into master as
9034b3c32875536b8b7d99aa31c7a2e6c942c811.
I'm going to close the ticket.
[~yunta] please reopen it if you think we should also approach peek(); or there
are some other issues.
> [Changelog] Non-deterministic recovery of PriorityQueue states
> --------------------------------------------------------------
>
> Key: FLINK-26062
> URL: https://issues.apache.org/jira/browse/FLINK-26062
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.15.0
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Currently, InternalPriorityQueue.poll() is logged as a separate operation,
> without specifying the element that has been polled. On recovery, this
> recorded poll() is replayed.
> However, this is not deterministic because the order of PQ elements with
> equal priorityis not specified. For example, TimerHeapInternalTimer only
> compares timestamps, which are often equal. This results in polling timers
> from queue in wrong order => dropping timers => and not firing timers.
>
> ProcessingTimeWindowCheckpointingITCase.testAggregatingSlidingProcessingTimeWindow
> fails with materialization enabled and using heap state backend (both
> in-memory and fs-based implementations).
>
> Proposed solution is to replace poll with remove operation (which is based on
> equality).
>
> cc: [~masteryhx], [~ym], [~yunta]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)