[ 
https://issues.apache.org/jira/browse/AURORA-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277887#comment-14277887
 ] 

Zameer Manji commented on AURORA-1016:
--------------------------------------

Closer inspection points to suspicious code in {{CachedClusterState.java}}:
{noformat}
  private final Multimap<String, PreemptionVictim> victims =
      Multimaps.synchronizedMultimap(HashMultimap.<String, PreemptionVictim>create());

  @Override
  public Multimap<String, PreemptionVictim> getSlavesToActiveTasks() {
    return Multimaps.unmodifiableMultimap(victims);
  }

  @Subscribe
  public void taskChangedState(TaskStateChange stateChange) {
    synchronized (victims) {
      String slaveId = stateChange.getTask().getAssignedTask().getSlaveId();
      PreemptionVictim victim =
          PreemptionVictim.fromTask(stateChange.getTask().getAssignedTask());
      if (Tasks.SLAVE_ASSIGNED_STATES.contains(stateChange.getNewState())) {
        victims.put(slaveId, victim);
      } else {
        victims.remove(slaveId, victim);
      }
    }
  }
{noformat}

In the code above, the {{HashMultimap}} accepts null keys, and a null key can 
propagate to the preemptor code and trigger the observed NPE. A null key is 
possible if a task is in one of the {{SLAVE_ASSIGNED_STATES}} but has no slave 
ID assigned.
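
To illustrate the suspected failure mode, here is a minimal sketch that uses 
only the JDK. It is not the Aurora code: {{HashMap}} stands in for Guava's 
{{HashMultimap}} (both accept null keys) and {{Set.copyOf}} stands in for 
{{ImmutableSet.Builder.addAll}} (both reject null elements). The class name, 
method names, slave IDs and task IDs are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class NullKeyPropagation {
  // Stand-in for the victims multimap: slave ID -> task IDs.
  static Map<String, Set<String>> victimsWithMissingSlave() {
    Map<String, Set<String>> victims = new HashMap<>();
    victims.computeIfAbsent("slave-1", k -> new HashSet<>()).add("task-a");
    // A task whose slave ID is null: the hash-based map accepts the key silently.
    victims.computeIfAbsent(null, k -> new HashSet<>()).add("task-b");
    return victims;
  }

  // Stand-in for a consumer that copies the keys into a null-hostile set.
  static boolean copyingKeysThrowsNpe(Map<String, Set<String>> victims) {
    try {
      Set.copyOf(victims.keySet());
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Set<String>> victims = victimsWithMissingSlave();
    System.out.println("contains null key: " + victims.containsKey(null));
    System.out.println("copy throws NPE: " + copyingKeysThrowsNpe(victims));
  }
}
```

The point of the sketch is that the write succeeds and the failure only shows 
up later, at whichever reader first copies the keys into a null-rejecting 
collection.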

I think the first step in fixing this bug is to enforce, wherever possible, the 
invariant that a task in one of the {{SLAVE_ASSIGNED_STATES}} always has a 
slave ID.
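
One possible shape for such a check, sketched with JDK types only. This is an 
assumption about how the guard could look, not the actual fix: the class 
{{GuardedVictims}}, its method signatures and the plain-string task IDs are 
hypothetical stand-ins for {{CachedClusterState}}, {{TaskStateChange}} and 
{{PreemptionVictim}}.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class GuardedVictims {
  // Slave ID -> task IDs; a stand-in for the synchronized multimap.
  private final Map<String, Set<String>> victims = new HashMap<>();

  // Hypothetical handler. Returns false when the event is ignored because
  // the task has no slave ID, so a null key can never enter the map.
  synchronized boolean taskChangedState(String slaveId, String taskId, boolean slaveAssigned) {
    if (slaveId == null) {
      return false;
    }
    if (slaveAssigned) {
      victims.computeIfAbsent(slaveId, k -> new HashSet<>()).add(taskId);
    } else {
      Set<String> tasks = victims.get(slaveId);
      if (tasks != null) {
        tasks.remove(taskId);
        if (tasks.isEmpty()) {
          victims.remove(slaveId);
        }
      }
    }
    return true;
  }

  // Null-hostile snapshot of the keys; safe because of the guard above.
  synchronized Set<String> slaveIds() {
    return Set.copyOf(victims.keySet());
  }
}
```

Whether a violation should be skipped, logged, or rejected with an exception 
is a separate design choice. Skipping keeps the scheduler running, while 
failing fast would surface the upstream state that produced a slave-assigned 
task without a slave.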

> NullPointerException in PreemptorImpl
> -------------------------------------
>
>                 Key: AURORA-1016
>                 URL: https://issues.apache.org/jira/browse/AURORA-1016
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Zameer Manji
>            Assignee: Zameer Manji
>              Labels: twitter
>
> This appears in the logs of a scheduler that does not appear to be 
> preempting tasks.
> {noformat}
> W0114 20:57:59.565 THREAD149 org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.schedule: Task scheduling unexpectedly failed, will be retried
> java.lang.NullPointerException
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
>         at com.google.common.collect.ImmutableCollection$ArrayBasedBuilder.add(ImmutableCollection.java:339)
>         at com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:480)
>         at com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:456)
>         at com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:282)
>         at com.google.common.collect.ImmutableCollection$ArrayBasedBuilder.addAll(ImmutableCollection.java:360)
>         at com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:508)
>         at org.apache.aurora.scheduler.async.preemptor.PreemptorImpl.findPreemptionSlotFor(PreemptorImpl.java:321)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.maybePreemptFor(TaskScheduler.java:249)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.scheduleTask(TaskScheduler.java:220)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl$3.apply(TaskScheduler.java:192)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl$3.apply(TaskScheduler.java:189)
>         at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:608)
>         at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:605)
>         at org.apache.aurora.scheduler.storage.mem.MemStorage$3.apply(MemStorage.java:147)
>         at org.apache.aurora.scheduler.storage.mem.MemStorage$3.apply(MemStorage.java:144)
>         at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:137)
>         at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
>         at org.apache.aurora.scheduler.storage.mem.MemStorage.write(MemStorage.java:144)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:605)
>         at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:638)
>         at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:122)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.schedule(TaskScheduler.java:189)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.async.TaskGroups$1.schedule(TaskGroups.java:136)
>         at org.apache.aurora.scheduler.async.TaskGroups$2.run(TaskGroups.java:158)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
