[ https://issues.apache.org/jira/browse/FLINK-21329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315571#comment-17315571 ]
Seth Wiesman commented on FLINK-21329:
--------------------------------------

I looked at the test and I don't see anything obvious that would point to FLINK-19467. Just to be safe I kicked off a build that reverts the test to use the old RocksDBStateBackend class. If this passes without issue then my change was the problem. You can follow the build here.

https://dev.azure.com/sjwiesman/Flink/_build/results?buildId=56&view=results

> "Local recovery and sticky scheduling end-to-end test" does not finish within 600 seconds
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-21329
> URL: https://issues.apache.org/jira/browse/FLINK-21329
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.13.0
> Reporter: Robert Metzger
> Assignee: Matthias
> Priority: Critical
> Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=13118&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=38515
> {code}
> Feb 08 22:25:46 ==============================================================================
> Feb 08 22:25:46 Running 'Local recovery and sticky scheduling end-to-end test'
> Feb 08 22:25:46 ==============================================================================
> Feb 08 22:25:46 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-46881214821
> Feb 08 22:25:47 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.13-SNAPSHOT-bin/flink-1.13-SNAPSHOT
> Feb 08 22:25:47 Running local recovery test with configuration:
> Feb 08 22:25:47 parallelism: 4
> Feb 08 22:25:47 max attempts: 10
> Feb 08 22:25:47 backend: rocks
> Feb 08 22:25:47 incremental checkpoints: false
> Feb 08 22:25:47 kill JVM: false
> Feb 08 22:25:47 Starting zookeeper daemon on host fv-az127-394.
> Feb 08 22:25:47 Starting HA cluster with 1 masters.
> Feb 08 22:25:48 Starting standalonesession daemon on host fv-az127-394.
> Feb 08 22:25:49 Starting taskexecutor daemon on host fv-az127-394.
> Feb 08 22:25:49 Waiting for Dispatcher REST endpoint to come up...
> Feb 08 22:25:50 Waiting for Dispatcher REST endpoint to come up...
> Feb 08 22:25:51 Waiting for Dispatcher REST endpoint to come up...
> Feb 08 22:25:53 Waiting for Dispatcher REST endpoint to come up...
> Feb 08 22:25:54 Dispatcher REST endpoint is up.
> Feb 08 22:25:54 Started TM watchdog with PID 28961.
> Feb 08 22:25:58 Job has been submitted with JobID e790e85a39040539f9386c0df7ca4812
> Feb 08 22:35:47 Test (pid: 27970) did not finish after 600 seconds.
> Feb 08 22:35:47 Printing Flink logs and killing it:
> {code}
> and
> {code}
>     at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver.unhandledError(ZooKeeperLeaderRetrievalDriver.java:184)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
>     at org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:874)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>     at org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
>     ... 10 more
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)