[ https://issues.apache.org/jira/browse/FLINK-17918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123385#comment-17123385 ]
Arvid Heise commented on FLINK-17918:
-------------------------------------

Yes, the root cause is {{LIMIT}} swallowing some tail records because it effectively saw the first records twice. Since {{LIMIT}} is used in many tests (I guess so that retract streams work correctly, but I haven't understood it entirely), many tests are currently prone to fail.

On a side note, I'd keep the change in my commit to the {{FailingDataSource}} that makes it fail after at least one element has been checkpointed (of course, remove all timing-related changes). The current approach effectively drains all records before triggering the crash, which will hide quite a lot of failures like this one.

> Blink Jobs are loosing data on recovery
> ---------------------------------------
>
>                 Key: FLINK-17918
>                 URL: https://issues.apache.org/jira/browse/FLINK-17918
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Table SQL / Runtime
>    Affects Versions: 1.11.0
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.11.0
>
>
> After trying to enable unaligned checkpoints by default, a lot of Blink
> streaming SQL/Table API tests containing joins or set operations are throwing
> errors indicating that we are losing some data (full records, without
> deserialisation errors).
> Example errors:
> {noformat}
> [ERROR] Failures:
> [ERROR]   JoinITCase.testFullJoinWithEqualPk:775 expected:<List(1,1, 2,2, 3,3, null,4, null,5)> but was:<List(2,2, 3,3, null,1, null,4, null,5)>
> [ERROR]   JoinITCase.testStreamJoinWithSameRecord:391 expected:<List(1,1,1,1, 1,1,1,1, 2,2,2,2, 2,2,2,2, 3,3,3,3, 3,3,3,3, 4,4,4,4, 4,4,4,4, 5,5,5,5, 5,5,5,5)> but was:<List()>
> [ERROR]   SemiAntiJoinStreamITCase.testAntiJoin:352 expected:<0> but was:<1>
> [ERROR]   SetOperatorsITCase.testIntersect:55 expected:<MutableList(1,1,Hi, 2,2,Hello, 3,2,Hello world)> but was:<List()>
> [ERROR]   JoinITCase.testJoinPushThroughJoin:1272 expected:<List(1,0,Hi, 2,1,Hello, 2,1,Hello world)> but was:<List(2,1,Hello, 2,1,Hello world)>
> {noformat}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
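
For illustration only, the root cause described in the comment above (a {{LIMIT}} that effectively sees the first records twice and therefore drops tail records) can be reproduced with a toy model. This is not Flink code: the class {{LimitReplaySketch}}, the {{Limit}} counter, and the {{replayed}} parameter are hypothetical names, and the sketch simply assumes that the operator's count survives a failure while the source starts again from the beginning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LimitReplaySketch {

    /** Toy LIMIT operator (hypothetical): forwards records until `limit` have been counted. */
    static final class Limit {
        private final int limit;
        private int seen;

        Limit(int limit) {
            this.limit = limit;
        }

        /** Returns true if the record may pass, and counts it. */
        boolean accept() {
            if (seen >= limit) {
                return false;
            }
            seen++;
            return true;
        }
    }

    /**
     * Feeds `input` through the limit. Before that, the first `replayed`
     * records are counted once already, modelling (as an assumption) a failed
     * attempt whose output is discarded while the operator's count is kept.
     */
    static List<Integer> run(List<Integer> input, int limit, int replayed) {
        Limit op = new Limit(limit);

        // Failed attempt: the head of the input is counted, its output is lost.
        for (int i = 0; i < replayed; i++) {
            op.accept();
        }

        // After recovery: the source starts over, the count does not.
        List<Integer> out = new ArrayList<>();
        for (int record : input) {
            if (op.accept()) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(run(input, 5, 0)); // prints [1, 2, 3, 4, 5]
        System.out.println(run(input, 5, 2)); // prints [1, 2, 3]
    }
}
```

In the second call the two double-counted head records use up the limit early, so records 4 and 5 never reach the output. A source that drains all records before crashing never exercises this path, which is presumably why failing after the first checkpointed element exposes the problem.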