[ 
https://issues.apache.org/jira/browse/AURORA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle closed AURORA-1993.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.17.0

This was fixed in 0.17.0 by the following commit:
https://github.com/apache/aurora/commit/4797dfe33ba08183fa9596a46ac8be51a64e08bb

> Aurora crashes when handling an unknown custom resource
> -------------------------------------------------------
>
>                 Key: AURORA-1993
>                 URL: https://issues.apache.org/jira/browse/AURORA-1993
>             Project: Aurora
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Clément Michaud
>            Priority: Major
>             Fix For: 0.17.0
>
>
> When we tried to declare network bandwidth as a custom resource in Mesos, 
> Aurora crashed with the following stack trace:
> {code:java}
> Jul 18, 2018 1:35:19 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "network_bandwidth"
> type: SCALAR
> scalar {
> value: 2000.0
> }
> role: "*"
> 11: "\n\adefault"
> at java.util.Objects.requireNonNull(Objects.java:228)
> at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
> at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
> at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at java.util.Iterator.forEachRemaining(Iterator.java:115)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
> at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
> at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
> at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
> at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
> at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
> at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> E0718 13:35:19.240 [SlotSizeCounterService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService [FAILED] faile
> I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting down application
> I0718 13:35:19.240 [SlotSizeCounterService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
> I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] SchedulerLifecycle state machine transition ACTIVE -> DEAD
> I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver
> I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework 2a905643-b76f-4f17-a406-524d406f49f8-0000
> I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] storage state machine transition READY -> STOPPED
> I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver exited, terminating lifecycle.
> I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] SchedulerLifecycle state machine transition DEAD -> DEAD
> I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown already invoked, ignoring extra call.
> I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down Quartz cron scheduler.
> I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:694] Scheduler QuartzScheduler_$_aurora-cron-1 shutting down.
> I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:613] Scheduler QuartzScheduler_$_aurora-cron-1 paused.
> I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:771] Scheduler QuartzScheduler_$_aurora-cron-1 shutdown complete.
> E0718 13:35:19.945 [AsyncProcessor-0, AsyncUtil:159] java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Driver is no
> {code}
> It would be great if Aurora could handle custom resources, or at least 
> not crash when it encounters one.
> We are using version 0.16.0.
>  
> https://mesos.slack.com/archives/C1KR1PRP1/p1532013001000626
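
For readers of this thread, the failure mode in the trace above can be
illustrated with a small sketch. The trace shows Objects.requireNonNull
throwing from ResourceType.fromResource when the resource name
"network_bandwidth" is not recognized. The code below is NOT Aurora's code and
is not taken from the fix commit: the class ResourceLookupSketch, the enum Kind
and all method names are hypothetical, and the only real names used are the
standard Mesos resource names (cpus, mem, disk, ports). It contrasts a strict
lookup that throws on an unknown name with a lenient lookup that skips it, which
is one possible way to "at least not crash".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for illustration only; not Aurora's ResourceType.
class ResourceLookupSketch {
    // Hypothetical resource kinds known to the scheduler.
    enum Kind { CPUS, RAM_MB, DISK_MB, PORTS }

    // Maps standard Mesos resource names to the kinds above.
    private static final Map<String, Kind> BY_MESOS_NAME = new HashMap<>();
    static {
        BY_MESOS_NAME.put("cpus", Kind.CPUS);
        BY_MESOS_NAME.put("mem", Kind.RAM_MB);
        BY_MESOS_NAME.put("disk", Kind.DISK_MB);
        BY_MESOS_NAME.put("ports", Kind.PORTS);
    }

    // Strict lookup: throws NullPointerException for an unknown name,
    // which is the same kind of failure seen in the stack trace.
    static Kind strictLookup(String mesosName) {
        return Objects.requireNonNull(
            BY_MESOS_NAME.get(mesosName),
            "Unknown Mesos resource: " + mesosName);
    }

    // Lenient lookup: an unknown name yields an empty Optional instead.
    static Optional<Kind> lenientLookup(String mesosName) {
        return Optional.ofNullable(BY_MESOS_NAME.get(mesosName));
    }

    // Converts the names in an offer to known kinds, skipping unknown ones.
    static List<Kind> knownKinds(List<String> mesosNames) {
        List<Kind> kinds = new ArrayList<>();
        for (String name : mesosNames) {
            lenientLookup(name).ifPresent(kinds::add);
        }
        return kinds;
    }

    public static void main(String[] args) {
        List<String> offer = Arrays.asList("cpus", "mem", "network_bandwidth");
        // The custom resource is ignored rather than failing the whole offer.
        System.out.println(knownKinds(offer));
        try {
            strictLookup("network_bandwidth");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether silently skipping an unknown resource is acceptable depends on the
caller; the commit linked at the top of this message is the authoritative
description of what was actually changed in 0.17.0.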



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
