[
https://issues.apache.org/jira/browse/AURORA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Renan DelValle closed AURORA-1993.
----------------------------------
Resolution: Fixed
Fix Version/s: 0.17.0
This was fixed in 0.17.0:
https://github.com/apache/aurora/commit/4797dfe33ba08183fa9596a46ac8be51a64e08bb
> Aurora crashes when handling an unknown custom resource
> -------------------------------------------------------
>
> Key: AURORA-1993
> URL: https://issues.apache.org/jira/browse/AURORA-1993
> Project: Aurora
> Issue Type: Bug
> Affects Versions: 0.16.0
> Reporter: Clément Michaud
> Priority: Major
> Fix For: 0.17.0
>
>
> When we declared network bandwidth as a custom resource in Mesos, Aurora
> crashed with the following stack trace:
> {code:java}
> Jul 18, 2018 1:35:19 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "network_bandwidth"
> type: SCALAR
> scalar {
> value: 2000.0
> }
> role: "*"
> 11: "\n\adefault"
> at java.util.Objects.requireNonNull(Objects.java:228)
> at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
> at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
> at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at java.util.Iterator.forEachRemaining(Iterator.java:115)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
> at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
> at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
> at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
> at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
> at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
> at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> E0718 13:35:19.240 [SlotSizeCounterService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService [FAILED] faile
> I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting down application
> I0718 13:35:19.240 [SlotSizeCounterService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
> I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] SchedulerLifecycle state machine transition ACTIVE -> DEAD
> I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver
> I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework 2a905643-b76f-4f17-a406-524d406f49f8-0000
> I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] storage state machine transition READY -> STOPPED
> I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver exited, terminating lifecycle.
> I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] SchedulerLifecycle state machine transition DEAD -> DEAD
> I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown already invoked, ignoring extra call.
> I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down Quartz cron scheduler.
> I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:694] Scheduler QuartzScheduler_$_aurora-cron-1 shutting down.
> I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:613] Scheduler QuartzScheduler_$_aurora-cron-1 paused.
> I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:771] Scheduler QuartzScheduler_$_aurora-cron-1 shutdown complete.
> E0718 13:35:19.945 [AsyncProcessor-0, AsyncUtil:159] java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Driver is no
> {code}
> It would be great if Aurora could handle custom resources, or at least not
> crash when it encounters one.
> We are using version 0.16.0.
>
> https://mesos.slack.com/archives/C1KR1PRP1/p1532013001000626
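
For readers following the trace: the NullPointerException comes from an
Objects.requireNonNull call reached through ResourceType.fromResource, and it
propagates until the SlotSizeCounterService fails and the scheduler shuts down.
The following is only a minimal sketch of the "at least not crash" behaviour
requested above, not the change made in the linked commit. Every name in it
(ResourceLookupSketch, KnownResource, Offered, bagFrom) is hypothetical and
does not come from the Aurora code base; it assumes that skipping unrecognized
resource names is acceptable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Aurora code base.
final class ResourceLookupSketch {

  // Illustrative subset of resource names the scheduler understands.
  enum KnownResource {
    CPUS("cpus"), MEM("mem"), DISK("disk");

    private final String mesosName;

    KnownResource(String mesosName) {
      this.mesosName = mesosName;
    }

    // Returns an empty Optional for an unrecognized name instead of throwing.
    static Optional<KnownResource> fromName(String name) {
      return Arrays.stream(values())
          .filter(r -> r.mesosName.equals(name))
          .findFirst();
    }
  }

  // Minimal stand-in for a scalar resource in an offer (name plus value).
  static final class Offered {
    final String name;
    final double value;

    Offered(String name, double value) {
      this.name = name;
      this.value = value;
    }
  }

  // Sums the known resources and skips unknown ones such as "network_bandwidth".
  static Map<KnownResource, Double> bagFrom(List<Offered> offered) {
    Map<KnownResource, Double> bag = new LinkedHashMap<>();
    for (Offered o : offered) {
      KnownResource.fromName(o.name)
          .ifPresent(type -> bag.merge(type, o.value, Double::sum));
    }
    return bag;
  }

  public static void main(String[] args) {
    List<Offered> offer = Arrays.asList(
        new Offered("cpus", 4.0),
        new Offered("network_bandwidth", 2000.0),
        new Offered("mem", 1024.0));
    System.out.println(bagFrom(offer)); // prints {CPUS=4.0, MEM=1024.0}
  }
}
```

Returning an Optional keeps the decision about unknown names with the caller,
which could also log the skipped resource rather than drop it silently.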