[ 
https://issues.apache.org/jira/browse/FLINK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402249#comment-17402249
 ] 

Till Rohrmann commented on FLINK-23323:
---------------------------------------

Looking into this issue, it appears that we are running into FLINK-22893.

{code}
21:39:19,967 [mini-cluster-io-thread-8] INFO  
org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServices [] - 
Finished cleaning up the high availability data for job 
8d8d05cfd1a5314e3379c31835060746.
21:39:19,967 [ Curator-Framework-0] ERROR 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl
 [] - Background exception was not retry-able or retry gave up
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$NoNodeException:
 KeeperErrorCode = NoNode for 
/flink/default/jobs/8d8d05cfd1a5314e3379c31835060746/leader
        at 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:792)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:308)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CreateBuilderImpl$9.performBackgroundOperation(CreateBuilderImpl.java:801)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:84)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:965)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_282]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_282]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 [?:1.8.0_282]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_282]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_282]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
21:39:19,969 [ Curator-Framework-0] WARN  
org.apache.flink.runtime.minicluster.MiniCluster             [] - Error in 
MiniCluster. Shutting the MiniCluster down.
org.apache.flink.util.FlinkException: Exception during leader election of 
DefaultDispatcherRunner occurred.
        at 
org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner.handleError(DefaultDispatcherRunner.java:199)
 [flink-runtime_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$LeaderElectionFatalErrorHandler.onFatalError(DefaultLeaderElectionService.java:315)
 [flink-runtime_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver.unhandledError(ZooKeeperLeaderElectionDriver.java:297)
 [flink-runtime_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:713)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$6.apply(CuratorFrameworkImpl.java:709)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:708)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.handleBackgroundOperationException(CuratorFrameworkImpl.java:924)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:1001)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
 [flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_282]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_282]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 [?:1.8.0_282]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_282]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_282]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: org.apache.flink.runtime.leaderelection.LeaderElectionException: 
Unhandled error in ZooKeeperLeaderElectionDriver: Background exception was not 
retry-able or retry gave up
        ... 18 more
Caused by: 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$NoNodeException:
 KeeperErrorCode = NoNode for 
/flink/default/jobs/8d8d05cfd1a5314e3379c31835060746/leader
        at 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:792)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:308)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CreateBuilderImpl$9.performBackgroundOperation(CreateBuilderImpl.java:801)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:84)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        at 
org.apache.flink.shaded.curator4.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:965)
 ~[flink-shaded-zookeeper-3-3.4.14-13.0.jar:3.4.14-13.0]
        ... 9 more
21:39:19,973 [ Curator-Framework-0] INFO  
org.apache.flink.runtime.minicluster.MiniCluster             [] - Shutting down 
Flink Mini Cluster
21:39:19,973 [flink-akka.actor.default-dispatcher-4] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Stopping 
TaskExecutor akka://flink/user/rpc/taskmanager_1.
21:39:19,973 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Stopping 
TaskExecutor akka://flink/user/rpc/taskmanager_0.
21:39:19,973 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close 
ResourceManager connection db9ef8078dbfe684507ea68391c114ae.
21:39:19,973 [flink-akka.actor.default-dispatcher-4] INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close 
ResourceManager connection db9ef8078dbfe684507ea68391c114ae.
21:39:19,973 [flink-akka.actor.default-dispatcher-3] INFO  
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Closing 
TaskExecutor connection 5c835ca8-f734-4724-b781-a2c5d4de44dd because: The 
TaskExecutor is shutting down.
21:39:19,973 [ Curator-Framework-0] INFO  
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Shutting down 
rest endpoint.
21:39:19,974 [flink-akka.actor.default-dispatcher-2] INFO  
org.apache.flink.runtime.state.TaskExecutorStateChangelogStoragesManager [] - 
Shutting down TaskExecutorStateChangelogStoragesManager.
21:39:19,974 [flink-akka.actor.default-dispatcher-4] INFO  
org.apache.flink.runtime.state.TaskExecutorStateChangelogStoragesManager [] - 
Shutting down TaskExecutorStateChangelogStoragesManager.
21:39:19,977 [flink-akka.actor.default-dispatcher-6] INFO  
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Added 
JobGraph(jobId: 7a75c3344542508a40abbf20f5773ac6) to 
ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'}.
21:39:19,978 [flink-akka.actor.default-dispatcher-6] INFO  
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Starting DefaultLeaderElectionService with 
ZooKeeperLeaderElectionDriver{leaderPath='/jobs/7a75c3344542508a40abbf20f5773ac6/leader/connection_info'}.
21:39:19,980 [Flink Queryable State Proxy Server Thread 0] ERROR 
org.apache.flink.queryablestate.network.AbstractServerHandler [] - Error while 
handling request with ID 154
org.apache.flink.queryablestate.exceptions.UnknownLocationException: Could not 
retrieve location of state=jungle of job=7a75c3344542508a40abbf20f5773ac6. 
Potential reasons are: i) the state is not ready, or ii) the job does not exist.
        at 
org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getKvStateLookupInfo(KvStateClientProxyHandler.java:247)
 ~[classes/:?]
        at 
org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getState(KvStateClientProxyHandler.java:164)
 ~[classes/:?]
        at 
org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.executeActionAsync(KvStateClientProxyHandler.java:131)
 ~[classes/:?]
        at 
org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.handleRequest(KvStateClientProxyHandler.java:121)
 ~[classes/:?]
        at 
org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.handleRequest(KvStateClientProxyHandler.java:63)
 ~[classes/:?]
        at 
org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:258)
 ~[flink-queryable-state-client-java-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_282]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_282]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_282]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_282]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
{code}

This does not explain the OOM, but it does explain all the subsequent test 
failures, because the {{MiniCluster}} has already been shut down by then.
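
To illustrate why a single fatal error can fail every later test, here is a minimal, self-contained sketch of the pattern. It is not Flink or Curator code: {{SharedCluster}}, {{createLeaderNode}} and {{runTests}} are hypothetical names invented for this example, and the assumption is only that a background operation which hits a missing parent node reports a fatal error, which in turn stops the shared cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FatalErrorSketch {

    /** Hypothetical stand-in for a cluster shared by all tests in a class. */
    static final class SharedCluster {
        private boolean running = true;

        /** A fatal error is not retried; it stops the whole cluster. */
        void onFatalError(Throwable cause) {
            running = false;
        }

        /** Runs a test against the cluster, or fails if the cluster is down. */
        String submit(String testName) {
            if (!running) {
                throw new IllegalStateException(
                        "cluster is shut down; cannot run " + testName);
            }
            return testName + ": ok";
        }

        boolean isRunning() {
            return running;
        }
    }

    /**
     * Hypothetical background operation: if the parent node has already been
     * cleaned up, the failure is handed to the fatal error handler.
     */
    static void createLeaderNode(
            boolean parentExists, Consumer<Throwable> fatalErrorHandler) {
        if (!parentExists) {
            fatalErrorHandler.accept(
                    new IllegalStateException("parent node no longer exists"));
        }
    }

    /** Runs each test and records whether it passed or errored. */
    static List<String> runTests(SharedCluster cluster, List<String> tests) {
        List<String> results = new ArrayList<>();
        for (String test : tests) {
            try {
                results.add(cluster.submit(test));
            } catch (IllegalStateException e) {
                results.add(test + ": error");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SharedCluster cluster = new SharedCluster();
        System.out.println(cluster.submit("testA"));

        // The cleanup of the previous job races with the background create.
        createLeaderNode(false, cluster::onFatalError);

        // Every test that runs afterwards fails for the same reason.
        System.out.println(runTests(cluster, Arrays.asList("testB", "testC")));
    }
}
```

In this sketch, the first test passes, and every test submitted after the fatal error reports an error, even though none of them is at fault.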

Since FLINK-22893 has been fixed and I couldn't reproduce the OOM, I will close 
this ticket as "Cannot Reproduce".

> HAQueryableStateRocksDBBackendITCase failed due to heap OOM
> -----------------------------------------------------------
>
>                 Key: FLINK-23323
>                 URL: https://issues.apache.org/jira/browse/FLINK-23323
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Queryable State
>    Affects Versions: 1.14.0
>            Reporter: Xintong Song
>            Priority: Major
>              Labels: auto-deprioritized-critical, test-stability
>             Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=20195&view=logs&j=c91190b6-40ae-57b2-5999-31b869b0a7c1&t=43529380-51b4-5e90-5af4-2dccec0ef402&l=14431
> {code}
> Jul 08 21:43:22 [ERROR] Tests run: 12, Failures: 0, Errors: 9, Skipped: 1, 
> Time elapsed: 246.345 s <<< FAILURE! - in 
> org.apache.flink.queryablestate.itcases.HAQueryableStateRocksDBBackendITCase
> Jul 08 21:43:22 [ERROR] 
> testReducingState(org.apache.flink.queryablestate.itcases.HAQueryableStateRocksDBBackendITCase)
>   Time elapsed: 241.454 s  <<< ERROR!
> Jul 08 21:43:22 java.lang.OutOfMemoryError: Java heap space
> {code}



