[
https://issues.apache.org/jira/browse/AURORA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360689#comment-15360689
]
John Sirois commented on AURORA-1729:
-------------------------------------
It may be possible to use the {{ServiceManagerIface}} intermediary to the Guava
{{ServiceManager}} to hook {{startAsync}} and {{stopAsync}} and handle service
dependencies there. The simple approach relies on the {{Multibinder}} guarantee
that Set iteration order reflects binding order, combined with a serialized
start and stop (stop would iterate the set in reverse order). A more optimal
approach would introduce an interface for Services to implement that exposes
their dependencies, allowing parallel start and stop where possible, with
blocking reduced to where dependencies require it.
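
A minimal sketch of the serialized variant, to make the ordering concrete. It is
not Aurora code and does not use Guava: {{SimpleService}} and
{{SerialServiceRunner}} are hypothetical stand-ins, and the sketch assumes the
Set handed in already iterates in binding order (as the {{Multibinder}} set
would).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a blocking service; not Guava's Service API. */
interface SimpleService {
  void start();
  void stop();
}

/** Starts services one at a time in set iteration order; stops them in reverse. */
final class SerialServiceRunner {
  private final List<SimpleService> ordered;
  private final List<SimpleService> started = new ArrayList<>();

  SerialServiceRunner(Set<SimpleService> servicesInBindingOrder) {
    // Copy so the iteration order is frozen at construction time.
    this.ordered = new ArrayList<>(servicesInBindingOrder);
  }

  void startAll() {
    for (SimpleService service : ordered) {
      service.start();
      // Track only services that started, so a failed start leaves a correct stop list.
      started.add(service);
    }
  }

  void stopAll() {
    // Walk backwards so dependents stop before the services they depend on.
    for (int i = started.size() - 1; i >= 0; i--) {
      started.remove(i).stop();
    }
  }
}
```

With a curator-like service bound before a storage-like service, this would
start curator first and stop it last, which is the ordering the teardown in
this issue lacks.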
> Unclean curator teardown when scheduler fails a log write.
> ----------------------------------------------------------
>
> Key: AURORA-1729
> URL: https://issues.apache.org/jira/browse/AURORA-1729
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler, Service Discovery
> Reporter: John Sirois
> Assignee: John Sirois
>
> As discovered by [~StephanErb]
> [here|https://www.irccloud.com/pastebin/E9ExthBO/]:
> {noformat}
> W0701 12:14:53.070 [Thread-11, GuiceUtils$4:163] Trapped uncaught exception:
> org.apache.aurora.scheduler.storage.Storage$StorageException: There was a
> problem committing the transaction to the log.
> org.apache.aurora.scheduler.storage.Storage$StorageException: There was a
> problem committing the transaction to the log.
> at
> org.apache.aurora.scheduler.storage.log.LogStorage.lambda$doInTransaction$85(LogStorage.java:524)
> ~[aurora-0.14.0.jar:na]
> at
> org.apache.aurora.scheduler.storage.log.LogStorage$$Lambda$184/796968060.apply(Unknown
> Source) ~[na:na]
> at
> org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:161)
> ~[aurora-0.14.0.jar:na]
> at
> org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
> ~[mybatis-guice-3.7.jar:3.7]
> ...skipping...
> I0701 12:15:08.627 [Thread-11, StateMachine$Builder:389] SchedulerLifecycle
> state machine transition LEADER_AWAITING_REGISTRATION -> DEAD
> I0701 12:15:08.632922 10792 sched.cpp:1907] Asked to stop the driver
> I0701 12:15:08.633 [Thread-11, StateMachine$Builder:389] storage state
> machine transition READY -> STOPPED
> I0701 12:15:08.636 [Curator-Framework-0, CuratorFrameworkImpl:821]
> backgroundOperationsLoop exiting
> I0701 12:15:08.657 [main-EventThread, ClientCnxn$EventThread:519] EventThread
> shut down for session: 0x253eae5a8df106a
> I0701 12:15:08.658 [Thread-11, ZooKeeper:684] Session: 0x253eae5a8df106a
> closed
> I0701 12:15:08.660 [main, SchedulerMain:98] Stopping scheduler services.
> E0701 12:15:08.661 [Curator-PathChildrenCache-0, PathChildrenCache:571]
> java.lang.IllegalStateException: instance must be started before calling this
> method
> at
> com.google.common.base.Preconditions.checkState(Preconditions.java:174)
> ~[guava-19.0.jar:na]
> at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.getChildren(CuratorFrameworkImpl.java:391)
> ~[curator-framework-2.10.0.jar:na]
> at
> org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:508)
> ~[curator-recipes-2.10.0.jar:na]
> at
> org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)
> ~[curator-recipes-2.10.0.jar:na]
> at
> org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:772)
> ~[curator-recipes-2.10.0.jar:na]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_45-internal]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_45-internal]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_45-internal]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_45-internal]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_45-internal]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_45-internal]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45-internal]
> {noformat}