[ https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743799#comment-16743799 ]
Etienne Chauchot commented on BEAM-1582:
----------------------------------------

[~kenn] I took a look at the PostCommit history. The last flake of ResumeFromCheckpointStreamingTest was on 04/24/18. There were some failures at the end of April, and they are all related to
{code:java}
ERROR org.apache.zookeeper.server.ZooKeeperServer - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes
{code}
IMHO we should not close this ticket because, as [~amitsela] and [~aviemzur] said, there is no fix right now. I think that postponing this ticket as you did is the right choice, as it is not a release blocker IMHO (it seems related to load). We could take care of it in the future.

> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-1582
>                 URL: https://issues.apache.org/jira/browse/BEAM-1582
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Priority: Minor
>              Labels: flake
>             Fix For: 2.11.0
>
>
> See:
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging in it appears that a second firing occurs (though only one
> is expected) but it doesn't come from a stale state (state is empty before it
> fires).
> Might be a retry happening for some reason, which is OK in terms of
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky
> tests.
> I'm looking into this hoping to fix this ASAP.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)