John Sirois created AURORA-1729:
-----------------------------------
Summary: Unclean curator teardown when scheduler fails a log write.
Key: AURORA-1729
URL: https://issues.apache.org/jira/browse/AURORA-1729
Project: Aurora
Issue Type: Bug
Components: Scheduler, Service Discovery
Reporter: John Sirois
As discovered by [~StephanErb] [here|https://www.irccloud.com/pastebin/E9ExthBO/]:
{noformat}
W0701 12:14:53.070 [Thread-11, GuiceUtils$4:163] Trapped uncaught exception:
org.apache.aurora.scheduler.storage.Storage$StorageException: There was a problem committing the transaction to the log.
org.apache.aurora.scheduler.storage.Storage$StorageException: There was a problem committing the transaction to the log.
    at org.apache.aurora.scheduler.storage.log.LogStorage.lambda$doInTransaction$85(LogStorage.java:524) ~[aurora-0.14.0.jar:na]
    at org.apache.aurora.scheduler.storage.log.LogStorage$$Lambda$184/796968060.apply(Unknown Source) ~[na:na]
    at org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:161) ~[aurora-0.14.0.jar:na]
    at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) ~[mybatis-guice-3.7.jar:3.7]
...skipping...
I0701 12:15:08.627 [Thread-11, StateMachine$Builder:389] SchedulerLifecycle state machine transition LEADER_AWAITING_REGISTRATION -> DEAD
I0701 12:15:08.632922 10792 sched.cpp:1907] Asked to stop the driver
I0701 12:15:08.633 [Thread-11, StateMachine$Builder:389] storage state machine transition READY -> STOPPED
I0701 12:15:08.636 [Curator-Framework-0, CuratorFrameworkImpl:821] backgroundOperationsLoop exiting
I0701 12:15:08.657 [main-EventThread, ClientCnxn$EventThread:519] EventThread shut down for session: 0x253eae5a8df106a
I0701 12:15:08.658 [Thread-11, ZooKeeper:684] Session: 0x253eae5a8df106a closed
I0701 12:15:08.660 [main, SchedulerMain:98] Stopping scheduler services.
E0701 12:15:08.661 [Curator-PathChildrenCache-0, PathChildrenCache:571]
java.lang.IllegalStateException: instance must be started before calling this method
    at com.google.common.base.Preconditions.checkState(Preconditions.java:174) ~[guava-19.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getChildren(CuratorFrameworkImpl.java:391) ~[curator-framework-2.10.0.jar:na]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:508) ~[curator-recipes-2.10.0.jar:na]
    at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35) ~[curator-recipes-2.10.0.jar:na]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:772) ~[curator-recipes-2.10.0.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45-internal]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_45-internal]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45-internal]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_45-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]
{noformat}