[
https://issues.apache.org/jira/browse/FLINK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489794#comment-17489794
]
Roman Khachatryan edited comment on FLINK-26019 at 2/10/22, 10:09 PM:
----------------------------------------------------------------------
Closing as superceded by FLINK-26062 which replaced poll() with remove() in
changelog.
was (Author: roman_khachatryan):
Closing as superceded by FLINK-26062.
> [Changelog] PriorityQueue elements recovered out-of-order
> ---------------------------------------------------------
>
> Key: FLINK-26019
> URL: https://issues.apache.org/jira/browse/FLINK-26019
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.15.0
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> StateChangeFormat is the class responsible for writing out changelog data.
> Each chunk of data is sorted by: logId -> sequenceNumber -> keyGroup.
> Sorting by sequenceNumber preserves temporal order.
> Sorting by keyGroup a) puts metadata (group -1) at the beginning and b)
> allows to write KG only once.
> However, the assumption that the order of changes across groups currently
> doesn't hold: poll operation of InternalPriorityQueue may affect any group
> (the smaller item across groups so far will be polled).
> This results in wrong processing time timers being removed on recovery in
> ProcessingTimeWindowCheckpointingITCase#testAggregatingSlidingProcessingTimeWindow
> One way to solve this probelm is to simply disable KG-sorting and grouping
> (only output metadata at the beginning).
> The other one is to associate polled element with the correct key group while
> logging changes.
> Both ways should work with re-scaling.
> cc: [~masteryhx], [~ym], [~yunta]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)