[
https://issues.apache.org/jira/browse/AURORA-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289729#comment-14289729
]
Maxim Khutornenko commented on AURORA-1023:
-------------------------------------------
Another nasty side-effect of having multiple active updates is failed quota
checks:
{noformat}
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150)
at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104)
at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:70)
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254)
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1166)
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1140)
at com.google.common.collect.FluentIterable.uniqueIndex(FluentIterable.java:424)
at org.apache.aurora.scheduler.quota.QuotaManager$QuotaManagerImpl$2.apply(QuotaManager.java:202)
at org.apache.aurora.scheduler.quota.QuotaManager$QuotaManagerImpl$2.apply(QuotaManager.java:196)
at org.apache.aurora.scheduler.storage.mem.MemStorage$2.apply(MemStorage.java:136)
at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:127)
at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
at org.apache.aurora.scheduler.storage.mem.MemStorage.read(MemStorage.java:133)
at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:646)
at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:115)
at org.apache.aurora.scheduler.quota.QuotaManager$QuotaManagerImpl.getQuotaInfo(QuotaManager.java:196)
at org.apache.aurora.scheduler.quota.QuotaManager$QuotaManagerImpl.getQuotaInfo(QuotaManager.java:151)
at org.apache.aurora.scheduler.thrift.SchedulerThriftInterface.getQuota(SchedulerThriftInterface.java:859)
{noformat}
We really need to enforce the "one active update per job" invariant instead of
patching every place that assumes it.
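
To make the invariant concrete, here is a minimal sketch of a guard that refuses to start a second update while one is still active for the same job. It is not Aurora code: the class ActiveUpdateRegistry and its methods are hypothetical names chosen for illustration, and the real scheduler would have to enforce this inside its storage transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry; the names here are illustrative, not Aurora's API.
final class ActiveUpdateRegistry {
    // jobKey -> updateId of the single active update for that job.
    private final Map<String, String> activeByJob = new HashMap<>();

    // Registers a new update, rejecting it if the job already has an active one.
    synchronized void start(String jobKey, String updateId) {
        String existing = activeByJob.putIfAbsent(jobKey, updateId);
        if (existing != null) {
            throw new IllegalStateException(
                "Job " + jobKey + " already has active update " + existing);
        }
    }

    // Marks the job's update as terminal so that a new one may start.
    synchronized void finish(String jobKey) {
        activeByJob.remove(jobKey);
    }

    synchronized boolean hasActiveUpdate(String jobKey) {
        return activeByJob.containsKey(jobKey);
    }

    public static void main(String[] args) {
        ActiveUpdateRegistry registry = new ActiveUpdateRegistry();
        registry.start("www-data/prod/hello", "update-1");
        try {
            // A second update for the same job is refused up front.
            registry.start("www-data/prod/hello", "update-2");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        registry.finish("www-data/prod/hello");
        registry.start("www-data/prod/hello", "update-2");
        System.out.println("Second update accepted after the first finished");
    }
}
```

With a check like this at the single point where updates are created, code that indexes active updates by job key (as the quota check in the trace above does via uniqueIndex) would never see two entries for the same key.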
> Releasing the update lock trips off scheduler updater
> -----------------------------------------------------
>
> Key: AURORA-1023
> URL: https://issues.apache.org/jira/browse/AURORA-1023
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler
> Reporter: Maxim Khutornenko
> Assignee: Bill Farner
> Priority: Critical
>
> Here is the faulty sequence:
> - User starts a scheduler job update and pauses it while it's still in progress
> - User runs the "aurora job cancel-update" command, thus releasing the update lock
> - User starts a new scheduler job update
> At this point, any attempt to abort or pause an active update results in the
> following error [1]:
> {noformat}
> vagrant@vagrant-ubuntu-trusty-64:~$ aurora beta-update abort devcluster/www-data/prod/hello
> INFO] Aborting update for: devcluster/www-data/prod/hello
> Failed to abort update due to error:
> expected one element but was:
> <JobUpdateSummary(updateId:4b7fdc14-428f-44e4-9261-908b606f47e2,
> jobKey:JobKey(role:www-data, environment:prod, name:hello), user:UNSECURE,
> state:JobUpdateState(status:ROLLING_FORWARD,
> createdTimestampMs:1421450382234, lastModifiedTimestampMs:1421450382234)),
> JobUpdateSummary(updateId:3c9c2fa2-8e51-4c13-8440-94364205a37b,
> jobKey:JobKey(role:www-data, environment:prod, name:hello), user:UNSECURE,
> state:JobUpdateState(status:ROLL_FORWARD_PAUSED,
> createdTimestampMs:1421450304935, lastModifiedTimestampMs:1421450324055))>
> {noformat}
> The only way to recover from this state is to either wait for the active job
> update to reach a terminal state or force it into one by running another
> cancel-update.
> While the "cancel-update" will eventually go away with the client updater, we
> do have a problem during the migration period. A possible (though ugly)
> short-term workaround could be calling "abortJobUpdate" from the
> "releaseLock" RPC.
> [1] -
> https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java#L295-L296
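
The short-term workaround quoted above (calling "abortJobUpdate" from the "releaseLock" RPC) could look roughly like the sketch below. Everything in it is an assumption made for illustration: UpdateController, LockStore and ReleaseLockWorkaround are hypothetical stand-ins, not the scheduler's actual interfaces or signatures.

```java
// Hypothetical sketch; none of these types are Aurora's real classes.
final class ReleaseLockWorkaround {
    // Stand-in for whatever component tracks and aborts job updates.
    interface UpdateController {
        boolean hasActiveUpdate(String jobKey);
        void abort(String jobKey);
    }

    // Stand-in for the component that holds the per-job update lock.
    interface LockStore {
        void release(String jobKey);
    }

    private final UpdateController updates;
    private final LockStore locks;

    ReleaseLockWorkaround(UpdateController updates, LockStore locks) {
        this.updates = updates;
        this.locks = locks;
    }

    // Aborts any active update before releasing the lock, so that the job
    // has no active update left when a new one is allowed to start.
    void releaseLock(String jobKey) {
        if (updates.hasActiveUpdate(jobKey)) {
            updates.abort(jobKey);
        }
        locks.release(jobKey);
    }
}
```

Aborting before releasing matters in this sketch: if the lock were released first, a new update could start in the gap and the job would briefly have two active updates again.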