     [ https://issues.apache.org/jira/browse/FLINK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Hogan updated FLINK-4117:
------------------------------
    Description: 
Received the following error when running {{mvn verify}} locally. Searching 
for the error suggests that we are not waiting for the ZooKeeper connection 
to be established, which happens asynchronously. In 
{{ZookeeperUtils.java:98}} we call {{CuratorFramework.start()}}; we could 
then call {{CuratorFramework.blockUntilConnected}} with the same timeout.
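
The start-then-block pattern proposed above can be sketched with the JDK 
alone. This is only a model under stated assumptions: {{AsyncClient}} and 
{{StartAndWaitDemo}} are hypothetical stand-ins, not Curator or Flink classes, 
and the delay and timeout values are made up. With Curator the blocking step 
would presumably be {{CuratorFramework.blockUntilConnected(int, TimeUnit)}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a client whose start() connects asynchronously. */
final class AsyncClient {
    private final CountDownLatch connected = new CountDownLatch(1);
    private final long connectDelayMs;

    AsyncClient(long connectDelayMs) {
        this.connectDelayMs = connectDelayMs;
    }

    /** Returns immediately; the "connection" completes on a background thread. */
    void start() {
        Thread connector = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMs); // simulated connection handshake
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            connected.countDown();
        });
        connector.setDaemon(true);
        connector.start();
    }

    /** Waits up to the given timeout; returns true once connected. */
    boolean blockUntilConnected(long maxWait, TimeUnit unit) throws InterruptedException {
        return connected.await(maxWait, unit);
    }

    boolean isConnected() {
        return connected.getCount() == 0;
    }

    /** Fails when called before the connection exists, like the ConnectionLoss above. */
    String getData(String path) {
        if (!isConnected()) {
            throw new IllegalStateException("ConnectionLoss for " + path);
        }
        return "data@" + path;
    }
}

final class StartAndWaitDemo {
    /** Starts the client, then blocks until it is connected or the timeout expires. */
    static AsyncClient startAndWait(long connectDelayMs, long timeoutMs) {
        AsyncClient client = new AsyncClient(connectDelayMs);
        client.start();
        try {
            if (!client.blockUntilConnected(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Not connected within " + timeoutMs + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for connection", e);
        }
        return client;
    }

    public static void main(String[] args) {
        // Illustrative values: 50 ms simulated connect delay, 5 s timeout.
        AsyncClient client = startAndWait(50, 5000);
        System.out.println(client.getData("/flink/checkpoint-id-counter"));
    }
}
```

Without the blocking step, a {{getData}} call issued right after {{start()}} 
can run before the connection exists, which is the race this issue suspects.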

{code}
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 323.326 sec <<< FAILURE! - in org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase
testConcurrentGetAndIncrement(org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase)  Time elapsed: 266.521 sec  <<< ERROR!
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink/checkpoint-id-counter
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest.testConcurrentGetAndIncrement(CheckpointIDCounterTest.java:129)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink/checkpoint-id-counter
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
        at org.apache.curator.framework.recipes.shared.SharedValue.readValue(SharedValue.java:244)
        at org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:177)
        at org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:111)
        at org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter.getAndIncrement(ZooKeeperCheckpointIDCounter.java:121)
        at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$Incrementer.call(CheckpointIDCounterTest.java:201)
        at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$Incrementer.call(CheckpointIDCounterTest.java:178)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 375.259 sec - in org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase

Results :

Tests in error: 
  CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase>CheckpointIDCounterTest.testConcurrentGetAndIncrement:129 » Execution
{code}

  was:
Received the following error when locally running {{mvn verify}}. Searching on 
the error it looks like we are not waiting for the Zookeeper connection to be 
established as this occurs asynchronously. In ZookeeperUtils.java:98 we call 
{{CuratorFramework.start()}} and we could then call 
{{CuratorFramework.blockUntilConnected}} with the same timeout.



> Wait for CuratorFramework connection to be established
> ------------------------------------------------------
>
>                 Key: FLINK-4117
>                 URL: https://issues.apache.org/jira/browse/FLINK-4117
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>



