[
https://issues.apache.org/jira/browse/AURORA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111070#comment-14111070
]
Kevin Sweeney commented on AURORA-667:
--------------------------------------
According to the docs for synchronizedMultimap, we need to synchronize on the
multimap itself when accessing its collection views, not on the returned
collection.
So the MVP fix would be:
{code}
 for (K key : keys) {
-  builder.addAll(index.get(key));
+  synchronized (index) {
+    builder.addAll(index.get(key));
+  }
 }
 return builder.build();
{code}
If we wanted a fully consistent lookup, we could synchronize on index outside
the for-loop, so that all keys are read under a single lock acquisition.
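A minimal sketch of that fully consistent variant, for illustration only. It
uses plain java.util collections as a stand-in for the Guava multimap, and the
class name SecondaryIndexSketch and its methods are hypothetical, not the
actual MemTaskStore code. The point it shows is that writers and the whole
read loop hold the same monitor, so a lookup never iterates a collection that
is being modified and the result reflects one point in time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the secondary index; not the MemTaskStore code. */
final class SecondaryIndexSketch<K> {
  // Every read and write below holds the monitor of this map.
  private final Map<K, Set<String>> index = new HashMap<>();

  void insert(K key, String taskId) {
    synchronized (index) {
      index.computeIfAbsent(key, k -> new HashSet<>()).add(taskId);
    }
  }

  void remove(K key, String taskId) {
    synchronized (index) {
      Set<String> ids = index.get(key);
      if (ids != null && ids.remove(taskId) && ids.isEmpty()) {
        index.remove(key);
      }
    }
  }

  /** Copies the matches for all keys under one lock acquisition. */
  Set<String> getMatches(Iterable<K> keys) {
    Set<String> matches = new LinkedHashSet<>();
    synchronized (index) {
      for (K key : keys) {
        // The copy happens while the lock is held, so no writer can
        // modify the per-key set during iteration.
        matches.addAll(index.getOrDefault(key, Collections.emptySet()));
      }
    }
    return Collections.unmodifiableSet(matches);
  }
}
```

The trade-off is that writers are blocked for the duration of the whole
lookup rather than for one key at a time, which is why the per-key lock is the
smaller change.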
Of course, this is one of the many reasons we're continuing to get out of the
business of implementing a database, per AURORA-286.
> aurora ConcurrentModificationException if specific job is PENDING/THROTTLED
> ---------------------------------------------------------------------------
>
> Key: AURORA-667
> URL: https://issues.apache.org/jira/browse/AURORA-667
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler
> Reporter: Bhuvan Arumugam
>
> I'm running into this issue when a specific job {{armijo-prod-passive-check}}
> is THROTTLED or PENDING. Other jobs do not hit this exception when they go to
> THROTTLED or PENDING.
> Can you let me know why we hit this exception only for this specific job? I
> can replicate it in one of my clusters. Let me know if you need the aurora
> config.
> We are running a week old scheduler, as of this commit:
> https://github.com/apache/incubator-aurora/commit/20bb549ba3bd2fe0aeafab4275bd3b701c1b46f6
> {code}
> I0826 17:15:52.392 THREAD969679
> com.twitter.common.util.StateMachine$Builder$1.execute:
> 1409073352392-armijo-prod-passive-check-424-d8e3c9ed-4017-41b9-b495-953891b000d2
> state machine transition INIT -> THROTTLED
> I0826 17:15:52.392 THREAD969679
> org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work
> command SAVE_STATE for
> 1409073352392-armijo-prod-passive-check-424-d8e3c9ed-4017-41b9-b495-953891b000d2
> E0826 17:15:52.392 THREAD125
> org.apache.aurora.scheduler.base.AsyncUtil$1.afterExecute:
> java.util.concurrent.ExecutionException:
> java.util.ConcurrentModificationException
> java.util.concurrent.ExecutionException:
> java.util.ConcurrentModificationException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at
> org.apache.aurora.scheduler.base.AsyncUtil$1.afterExecute(AsyncUtil.java:66)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
> at java.util.HashMap$KeyIterator.next(HashMap.java:960)
> at
> com.google.common.collect.AbstractMapBasedMultimap$WrappedCollection$WrappedIterator.next(AbstractMapBasedMultimap.java:486)
> at
> com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:281)
> at
> com.google.common.collect.ImmutableCollection$ArrayBasedBuilder.addAll(ImmutableCollection.java:360)
> at
> com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:508)
> at
> org.apache.aurora.scheduler.storage.mem.MemTaskStore$SecondaryIndex$1.apply(MemTaskStore.java:421)
> at
> org.apache.aurora.scheduler.storage.mem.MemTaskStore$SecondaryIndex$1.apply(MemTaskStore.java:415)
> at com.google.common.base.Present.transform(Present.java:71)
> at
> org.apache.aurora.scheduler.storage.mem.MemTaskStore$SecondaryIndex.getMatches(MemTaskStore.java:428)
> at
> org.apache.aurora.scheduler.storage.mem.MemTaskStore.matches(MemTaskStore.java:292)
> at
> org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks(MemTaskStore.java:122)
> at
> com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
> at
> org.apache.aurora.scheduler.storage.Storage$Util$2.apply(Storage.java:300)
> at
> org.apache.aurora.scheduler.storage.Storage$Util$2.apply(Storage.java:297)
> at
> org.apache.aurora.scheduler.storage.mem.MemStorage.weaklyConsistentRead(MemStorage.java:204)
> at
> com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
> at
> org.apache.aurora.scheduler.storage.log.LogStorage.weaklyConsistentRead(LogStorage.java:587)
> at
> org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.weaklyConsistentRead(CallOrderEnforcingStorage.java:123)
> at
> org.apache.aurora.scheduler.storage.Storage$Util.weaklyConsistentFetchTasks(Storage.java:297)
> at
> org.apache.aurora.scheduler.async.HistoryPruner$3.run(HistoryPruner.java:154)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ... 2 more
> {code}