Roman Khachatryan created FLINK-26062:
-----------------------------------------
Summary: [Changelog] Non-deterministic recovery of PriorityQueue states
Key: FLINK-26062
URL: https://issues.apache.org/jira/browse/FLINK-26062
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.15.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.15.0
Currently, InternalPriorityQueue.poll() is logged as a separate operation,
without specifying the element that has been polled. On recovery, this recorded
poll() is replayed.
However, this replay is not deterministic because the order of PQ elements with
equal priority is not specified. For example, TimerHeapInternalTimer compares
only timestamps, which are often equal. As a result, a replayed poll() can
remove a different timer than the one originally polled, so timers are dropped
and never fire.
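The tie-breaking problem can be illustrated with a minimal sketch. It uses
java.util.PriorityQueue as a stand-in for the state backend's queue, and
DemoTimer, BY_TIMESTAMP and pollFirstKey are hypothetical names introduced only
for this example (this is not Flink code). The Javadoc of
java.util.PriorityQueue states that ties for the least value are broken
arbitrarily, so two queues holding the same elements may poll different ones:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TiedPriorityDemo {
    // Hypothetical stand-in for a timer; not Flink's TimerHeapInternalTimer.
    static final class DemoTimer {
        final long timestamp;
        final String key;

        DemoTimer(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }
    }

    // Orders timers by timestamp only, so timers with equal timestamps tie.
    static final Comparator<DemoTimer> BY_TIMESTAMP =
            Comparator.comparingLong((DemoTimer t) -> t.timestamp);

    // Builds a queue in the given insertion order and returns the key poll() picks.
    static String pollFirstKey(String... insertionOrder) {
        PriorityQueue<DemoTimer> queue = new PriorityQueue<>(BY_TIMESTAMP);
        for (String key : insertionOrder) {
            queue.add(new DemoTimer(100L, key));
        }
        return queue.poll().key;
    }

    public static void main(String[] args) {
        // Same elements, different internal layout: the polled key may differ.
        System.out.println(pollFirstKey("a", "b"));
        System.out.println(pollFirstKey("b", "a"));
    }
}
```

If the queue restored on recovery has a different internal layout than the
original one, a replayed poll() has the same freedom to pick a different
element among the tied ones.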
ProcessingTimeWindowCheckpointingITCase.testAggregatingSlidingProcessingTimeWindow
fails when materialization is enabled and the heap state backend is used (both
in-memory and fs-based implementations).
The proposed solution is to replace the poll operation with a remove operation,
which identifies the element by equality.
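A minimal sketch of why an equality-based remove replays deterministically,
again using java.util.PriorityQueue as a stand-in. DemoTimer, restore and
replayRemove are hypothetical names for this example and do not reflect the
actual changelog format or the eventual fix:

```java
import java.util.Comparator;
import java.util.Objects;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class RemoveReplayDemo {
    // Hypothetical timer: ordered by timestamp only, but equal only if the key matches too.
    static final class DemoTimer {
        final long timestamp;
        final String key;

        DemoTimer(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DemoTimer)) {
                return false;
            }
            DemoTimer other = (DemoTimer) o;
            return timestamp == other.timestamp && key.equals(other.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(timestamp, key);
        }
    }

    // Rebuilds a queue of tied timers in the given insertion order.
    static PriorityQueue<DemoTimer> restore(String... insertionOrder) {
        PriorityQueue<DemoTimer> queue =
                new PriorityQueue<>(Comparator.comparingLong((DemoTimer t) -> t.timestamp));
        for (String key : insertionOrder) {
            queue.add(new DemoTimer(100L, key));
        }
        return queue;
    }

    // Replays a logged removal of a specific element and returns the surviving keys.
    static Set<String> replayRemove(DemoTimer logged, String... insertionOrder) {
        PriorityQueue<DemoTimer> queue = restore(insertionOrder);
        queue.remove(logged); // matches by equals(), independent of heap layout
        Set<String> keys = new TreeSet<>();
        for (DemoTimer timer : queue) {
            keys.add(timer.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        DemoTimer logged = new DemoTimer(100L, "a");
        System.out.println(replayRemove(logged, "a", "b", "c")); // prints [b, c]
        System.out.println(replayRemove(logged, "c", "b", "a")); // prints [b, c]
    }
}
```

Because the logged entry names the element, the outcome of the replay does not
depend on how ties happen to be ordered inside the restored queue.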
cc: [~masteryhx], [~ym], [~yunta]