[ https://issues.apache.org/jira/browse/FLINK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Hogan updated FLINK-4117:
------------------------------
    Description: 
Received the following error when running {{mvn verify}} locally. Searching for the error suggests that we are not waiting for the ZooKeeper connection to be established, which happens asynchronously. In ZooKeeperUtils.java:98 we call {{CuratorFramework.start()}}; we could then call {{CuratorFramework.blockUntilConnected}} with the same timeout.

{code}
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 323.326 sec <<< FAILURE! - in org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase
testConcurrentGetAndIncrement(org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase)  Time elapsed: 266.521 sec  <<< ERROR!
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink/checkpoint-id-counter
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest.testConcurrentGetAndIncrement(CheckpointIDCounterTest.java:129)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink/checkpoint-id-counter
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
	at org.apache.curator.framework.recipes.shared.SharedValue.readValue(SharedValue.java:244)
	at org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:177)
	at org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:111)
	at org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter.getAndIncrement(ZooKeeperCheckpointIDCounter.java:121)
	at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$Incrementer.call(CheckpointIDCounterTest.java:201)
	at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$Incrementer.call(CheckpointIDCounterTest.java:178)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 375.259 sec - in org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase

Results :

Tests in error: 
  CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase>CheckpointIDCounterTest.testConcurrentGetAndIncrement:129 » Execution
{code}

> Wait for CuratorFramework connection to be established
> ------------------------------------------------------
>
>                 Key: FLINK-4117
>                 URL: https://issues.apache.org/jira/browse/FLINK-4117
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
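The change proposed in the description (call {{CuratorFramework.start()}}, then block until the connection exists, bounded by a timeout) can be sketched with the JDK alone. This is a minimal model under stated assumptions, not Flink or Curator code: {{AsyncClient}} and {{ConnectWaitSketch}} are hypothetical names, the "connection" is simulated by a delayed task, and throwing an exception on timeout is an assumption of the sketch, since the issue does not say how a failed wait should be handled. The method {{blockUntilConnected(int, TimeUnit)}} mirrors the shape of the Curator call of the same name, which returns {{false}} when the wait times out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an asynchronously connecting client such as
// CuratorFramework: start() returns immediately and the "connection" is
// established later on a background thread after connectDelayMs.
class AsyncClient {
    private final CountDownLatch connected = new CountDownLatch(1);
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "async-connect");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });
    private final long connectDelayMs;

    AsyncClient(long connectDelayMs) {
        this.connectDelayMs = connectDelayMs;
    }

    // Begins connecting in the background and returns without waiting.
    void start() {
        Runnable markConnected = connected::countDown;
        executor.schedule(markConnected, connectDelayMs, TimeUnit.MILLISECONDS);
    }

    // Same shape as Curator's blockUntilConnected(int, TimeUnit):
    // true if connected within the timeout, false if the timeout elapsed.
    boolean blockUntilConnected(int maxWaitTime, TimeUnit units) throws InterruptedException {
        return connected.await(maxWaitTime, units);
    }

    boolean isConnected() {
        return connected.getCount() == 0;
    }

    void close() {
        executor.shutdownNow();
    }
}

class ConnectWaitSketch {
    // Starts the client, then waits up to timeoutMs for the connection so the
    // first operation does not race the asynchronous connect. Throwing on
    // timeout is an assumption of this sketch, not part of the issue.
    static AsyncClient startAndWait(AsyncClient client, int timeoutMs) {
        client.start();
        boolean connected;
        try {
            connected = client.blockUntilConnected(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            connected = false;
        }
        if (!connected) {
            client.close();
            throw new IllegalStateException(
                    "Not connected to ZooKeeper within " + timeoutMs + " ms");
        }
        return client;
    }

    public static void main(String[] args) {
        AsyncClient client = startAndWait(new AsyncClient(10), 2000);
        System.out.println("connected: " + client.isConnected());
        client.close();
    }
}
```

With the real client, the equivalent would be a call such as {{client.blockUntilConnected(timeout, TimeUnit.MILLISECONDS)}} right after {{client.start()}}. Whether a {{false}} result should fail fast, log a warning, or fall back to Curator's retry policy is a design choice the issue leaves open.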