[
https://issues.apache.org/jira/browse/AURORA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zameer Manji updated AURORA-1580:
---------------------------------
Description:
I have discovered the following exception from a scheduler that is running off
master with the beta task store enabled.
{noformat}
E0113 22:51:55.941 [AsyncProcessor-2, AsyncUtil:123]
java.util.concurrent.ExecutionException: java.util.NoSuchElementException
java.util.concurrent.ExecutionException: java.util.NoSuchElementException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_66-Tw8r9b1]
    at org.apache.aurora.scheduler.base.AsyncUtil.evaluateResult(AsyncUtil.java:118) [aurora-110.jar:na]
    at org.apache.aurora.scheduler.base.AsyncUtil.access$000(AsyncUtil.java:32) [aurora-110.jar:na]
    at org.apache.aurora.scheduler.base.AsyncUtil$1.afterExecute(AsyncUtil.java:59) [aurora-110.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150) [na:1.8.0_66-Tw8r9b1]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66-Tw8r9b1]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66-Tw8r9b1]
Caused by: java.util.NoSuchElementException: null
    at com.google.common.collect.Iterables.getLast(Iterables.java:784) ~[guava-19.0.jar:na]
    at org.apache.aurora.scheduler.base.Tasks.getLatestEvent(Tasks.java:149) ~[aurora-110.jar:na]
    at org.apache.aurora.scheduler.base.Tasks$1.apply(Tasks.java:156) ~[aurora-110.jar:na]
    at org.apache.aurora.scheduler.base.Tasks$1.apply(Tasks.java:153) ~[aurora-110.jar:na]
    at com.google.common.collect.ByFunctionOrdering.compare(ByFunctionOrdering.java:45) ~[guava-19.0.jar:na]
    at java.util.TimSort.binarySort(TimSort.java:296) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.TimSort.sort(TimSort.java:239) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.Arrays.sort(Arrays.java:1438) ~[na:1.8.0_66-Tw8r9b1]
    at com.google.common.collect.Ordering.sortedCopy(Ordering.java:860) ~[guava-19.0.jar:na]
    at org.apache.aurora.scheduler.pruning.TaskHistoryPruner.lambda$registerInactiveTask$20(TaskHistoryPruner.java:156) ~[aurora-110.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_66-Tw8r9b1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66-Tw8r9b1]
    ... 2 common frames omitted
{noformat}
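The trace above bottoms out in Guava's Iterables.getLast, which throws NoSuchElementException when the iterable is empty, so it looks as though at least one stored task had no task events when the pruner sorted tasks by latest event. The following is a minimal sketch of that failure mode and one possible defensive accessor, using only the JDK. The names LatestEventSketch, Task, latestTimestampStrict, latestTimestamp and sortByLatestEvent are hypothetical stand-ins for illustration, not Aurora's classes or the eventual fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are not Aurora's classes.
class LatestEventSketch {

  // A task reduced to an id and the timestamps of its recorded events, oldest first.
  static final class Task {
    final String id;
    final List<Long> eventTimestamps;

    Task(String id, Long... eventTimestamps) {
      this.id = id;
      this.eventTimestamps = Arrays.asList(eventTimestamps);
    }
  }

  // Mirrors the failing pattern: asking for the last element of an empty list throws.
  static long latestTimestampStrict(Task task) {
    List<Long> events = task.eventTimestamps;
    if (events.isEmpty()) {
      throw new NoSuchElementException();
    }
    return events.get(events.size() - 1);
  }

  // Defensive variant: a task without events yields Optional.empty() instead of throwing.
  static Optional<Long> latestTimestamp(Task task) {
    List<Long> events = task.eventTimestamps;
    return events.isEmpty() ? Optional.empty() : Optional.of(events.get(events.size() - 1));
  }

  // Sorts a copy by latest event; tasks without events sort first (assumed policy).
  static List<Task> sortByLatestEvent(List<Task> tasks) {
    List<Task> sorted = new ArrayList<>(tasks);
    sorted.sort(Comparator.comparingLong((Task t) -> latestTimestamp(t).orElse(Long.MIN_VALUE)));
    return sorted;
  }

  public static void main(String[] args) {
    List<Task> tasks =
        Arrays.asList(new Task("a", 10L, 30L), new Task("b"), new Task("c", 20L));

    try {
      // The comparator calls the strict accessor on task "b", which has no events.
      List<Task> copy = new ArrayList<>(tasks);
      copy.sort(Comparator.comparingLong(LatestEventSketch::latestTimestampStrict));
    } catch (NoSuchElementException e) {
      System.out.println("strict comparator failed: " + e);
    }

    for (Task t : sortByLatestEvent(tasks)) {
      System.out.println(t.id);
    }
  }
}
```

Whether a task with no events should sort first, be skipped, or be treated as a storage bug is a design decision the sketch does not settle; it only shows that the comparator itself need not throw.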
A similar exception occurs within the preemptor and causes the scheduler to crash:
{noformat}
E0113 01:43:06.242 THREAD5037 com.google.common.util.concurrent.ServiceManager$ServiceListener.failed: Service PreemptorService [FAILED] has failed in the RUNNING state.
java.util.NoSuchElementException
    at com.google.common.collect.Iterables.getLast(Iterables.java:784)
    at org.apache.aurora.scheduler.base.Tasks.getLatestEvent(Tasks.java:149)
    at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor$3.apply(PendingTaskProcessor.java:237)
    at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor$3.apply(PendingTaskProcessor.java:234)
    at com.google.common.base.Predicates$AndPredicate.apply(Predicates.java:374)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:43)
    at com.google.common.collect.Iterators.addAll(Iterators.java:364)
    at com.google.common.collect.Iterables.addAll(Iterables.java:352)
    at com.google.common.collect.HashMultiset.create(HashMultiset.java:66)
    at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.fetchIdlePendingGroups(PendingTaskProcessor.java:183)
    at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$140(PendingTaskProcessor.java:144)
    at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:138)
    at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
    at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
    at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:570)
    at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
    at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:119)
    at org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:145)
    at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
    at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
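In the preemptor trace the same exception escapes runOneIteration, and the log line reports PreemptorService moving to FAILED while RUNNING. As a rough illustration of why one bad record can stop a periodic loop, here is a JDK-only sketch that contrasts an unguarded iteration with one wrapped in a catch. GuardedIterationSketch, guarded and drive are hypothetical names, and drive is a simplified stand-in for whatever drives the periodic service; it is not Guava's AbstractScheduledService or Aurora's PreemptorService.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not Aurora's PreemptorService or Guava's AbstractScheduledService.
class GuardedIterationSketch {

  // Wraps one periodic iteration so a RuntimeException is counted instead of
  // propagating to the code that drives the loop.
  static Runnable guarded(Runnable iteration, AtomicInteger failures) {
    return () -> {
      try {
        iteration.run();
      } catch (RuntimeException e) {
        failures.incrementAndGet();
      }
    };
  }

  // Simplified driver: starts up to `rounds` iterations and stops for good as soon
  // as an exception escapes one of them. Returns the number of iterations started.
  static int drive(Runnable iteration, int rounds) {
    int started = 0;
    try {
      for (int i = 0; i < rounds; i++) {
        started++;
        iteration.run();
      }
    } catch (RuntimeException e) {
      // An escaping exception ends the loop, analogous to the service failing.
    }
    return started;
  }

  public static void main(String[] args) {
    // Every iteration hits the empty-events case.
    Runnable failing = () -> {
      throw new NoSuchElementException();
    };
    AtomicInteger failures = new AtomicInteger();

    System.out.println("unguarded iterations started: " + drive(failing, 5));
    System.out.println("guarded iterations started: " + drive(guarded(failing, failures), 5));
    System.out.println("failures recorded: " + failures.get());
  }
}
```

Swallowing exceptions can hide real storage corruption, so a guard like this would normally be paired with logging or a metric; fixing the empty-events condition at its source is a separate question.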
Background:
I discovered this while running a load test on a small test cluster. The
exception kept recurring, which prevented any tasks from being pruned from the
DbTaskStore. It also kept killing the preemptor, which caused a new scheduler
to be elected. Eventually the number of tasks stored for this job reached 8k+
and a general slowdown was observed across the entire system. After a certain
point, the scheduler could not register within the 1-minute registration
timeout, which caused the entire cluster to stop working until I raised the
timeout to 5 minutes and then killed the job to prevent more tasks from being
created.
> java.util.NoSuchElementException from Tasks.getLatestEvent with DbTaskStore
> ---------------------------------------------------------------------------
>
> Key: AURORA-1580
> URL: https://issues.apache.org/jira/browse/AURORA-1580
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler
> Reporter: Zameer Manji