Maxim Khutornenko created AURORA-1615:
-----------------------------------------

             Summary: Preemptor crashes scheduler during host maintenance
                 Key: AURORA-1615
                 URL: https://issues.apache.org/jira/browse/AURORA-1615
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Maxim Khutornenko
            Assignee: Maxim Khutornenko


We have noticed an occasional scheduler failover when host maintenance is in 
effect:
{noformat}
To index multiple values under a key, use Multimaps.index.
        at com.google.common.collect.Maps.uniqueIndex(Maps.java:1215)
        at com.google.common.collect.Maps.uniqueIndex(Maps.java:1173)
        at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$224(PendingTaskProcessor.java:130)
        at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:138)
        at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
        at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
        at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:570)
        at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
        at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:119)
{noformat}

Diffing the colliding HostOffer objects revealed that the only difference is 
the HostAttributes maintenance mode value: 
mode=NONE vs. mode=DRAINING

Upon examination, it appears quite possible to have duplicate HostOffer 
instances (same offer, same slave, different maintenance mode) because 
[offers are accessed|https://github.com/apache/aurora/blob/9ed81a7db58f6a7cb308c8ac6a545705351c8c0e/src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java#L223-L226] 
as an unmodifiable view over the underlying ConcurrentSkipListSet. Here is a 
possible sequence:
# The pending task processor starts [building a unique index|https://github.com/apache/aurora/blob/2e2371481d9aaccd6a45ad0f442d963d5ae7a3c8/src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L128-L130] and the offers iterator pulls OfferA with mode=NONE
# A host drain operation is initiated and a HostAttributesChanged event is raised
# OfferManager [processes|https://github.com/apache/aurora/blob/9ed81a7db58f6a7cb308c8ac6a545705351c8c0e/src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java#L243-L246] the HostAttributesChanged event and atomically [swaps|https://github.com/apache/aurora/blob/9ed81a7db58f6a7cb308c8ac6a545705351c8c0e/src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java#L315-L322] OfferA with OfferA' (mode=DRAINING)
# iterator.next() inside the uniqueIndex routine pulls OfferA' and the error is raised
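
The sequence above can be replayed on a single thread. This is a minimal sketch, not Aurora code: the Offer class, the Mode enum, the ORDER comparator (maintenance mode first, then offer ID) and the second offer are all assumptions made for illustration. ConcurrentSkipListSet iterators are weakly consistent, so an iterator that is already past OfferA may still reach the replacement once it has been re-inserted further along in the ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical, simplified model of the race; not Aurora code. */
final class OfferRaceSketch {
  enum Mode { NONE, DRAINING }

  /** Stand-in for HostOffer: an offer ID plus the host's maintenance mode. */
  static final class Offer {
    final String offerId;
    final Mode mode;

    Offer(String offerId, Mode mode) {
      this.offerId = offerId;
      this.mode = mode;
    }
  }

  // Assumed ordering: by maintenance mode first, then by offer ID.
  static final Comparator<Offer> ORDER =
      Comparator.comparing((Offer o) -> o.mode).thenComparing(o -> o.offerId);

  /** Replays steps 1-4 and returns the offer IDs the iterator produced. */
  static List<String> idsSeenDuringSwap() {
    ConcurrentSkipListSet<Offer> offers = new ConcurrentSkipListSet<>(ORDER);
    Offer offerA = new Offer("offerA", Mode.NONE);
    offers.add(offerA);
    offers.add(new Offer("offerB", Mode.NONE));

    // Live, unmodifiable view: no copy is made here.
    Iterator<Offer> it = Collections.unmodifiableSet(offers).iterator();
    List<String> seen = new ArrayList<>();

    // Step 1: the index build pulls OfferA (mode=NONE).
    seen.add(it.next().offerId);

    // Steps 2-3: the host starts draining and OfferA is swapped for OfferA'.
    offers.remove(offerA);
    offers.add(new Offer("offerA", Mode.DRAINING));

    // Step 4: the same iterator keeps going over the modified set.
    while (it.hasNext()) {
      seen.add(it.next().offerId);
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(idsSeenDuringSwap());
  }
}
```

When the same offer ID shows up twice in the returned list, any index keyed by offer ID that requires unique keys, as Maps.uniqueIndex does, will reject the second entry.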

We should either copy inside a synchronized getOffers() implementation or deal 
with possible duplicates at the call site. I tend to think copying on access is 
the better approach. The only consumer of getOffers() is PendingTaskProcessor, 
which has a relatively infrequent run loop (1 minute), so the performance impact 
of copying all offers within a synchronized method should be acceptable. The 
alternative implies leaking the abstraction of host maintenance mode into the 
preemptor, which is less than ideal.
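
A minimal sketch of the copy-on-access option, under stated assumptions: the SnapshotOffers class, its add() and swap() methods, and the "offerId@mode" string encoding are hypothetical and only stand in for OfferManager and HostOffer. The point is that the swap and the copy take the same lock, so a caller always iterates over a point-in-time copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch of copy-on-access; names are illustrative, not Aurora's. */
final class SnapshotOffers {
  // Offers are modeled as "offerId@mode" strings to keep the sketch short.
  private final ConcurrentSkipListSet<String> offers = new ConcurrentSkipListSet<>();

  synchronized void add(String offer) {
    offers.add(offer);
  }

  /** Replaces an offer after a mode change; holds the same lock as getOffers(). */
  synchronized void swap(String oldOffer, String newOffer) {
    if (offers.remove(oldOffer)) {
      offers.add(newOffer);
    }
  }

  /** Returns an immutable point-in-time copy instead of a live view. */
  synchronized List<String> getOffers() {
    return Collections.unmodifiableList(new ArrayList<>(offers));
  }
}
```

A snapshot taken before a swap keeps the old offer and never contains both versions; only a later getOffers() call returns the replacement.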



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
