[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296789#comment-17296789 ]
Piotr Nowojski commented on FLINK-20654:
----------------------------------------
Hi [~lwlin]. We are not aware of any issues with unaligned checkpoints on the
1.11.x branch. Most of the problems in this ticket with 1.12.x were caused by
some preliminary changes for FLINK-19681, so 1.11.x shouldn't be affected by
them. As of 1.12.2, we are not aware of any problems with unaligned
checkpoints; the last one, as you can see, was reported/merged over a month
ago. So both 1.11.3 and 1.12.2 should be stable in this regard.
Having said that, as a result of those bugs, the 1.12.x and 1.13.x branches
are more thoroughly tested, so if you can choose between 1.11.3 and 1.12.2, I
would suggest 1.12.2.
> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
> Key: FLINK-20654
> URL: https://issues.apache.org/jira/browse/FLINK-20654
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.0, 1.12.1
> Reporter: Arvid Heise
> Assignee: Piotr Nowojski
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.12.2, 1.13.0
>
>
> The fix for FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple hundred times. The issue showed up for me
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be
> in [0;100000] has been mis-deserialized)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)