[ https://issues.apache.org/jira/browse/AURORA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Khutornenko updated AURORA-1600:
--------------------------------------
Sprint: Twitter Aurora Q1'16 Sprint 18
> Job updates with large count of instance overrides halt scheduler perf
> ----------------------------------------------------------------------
>
> Key: AURORA-1600
> URL: https://issues.apache.org/jira/browse/AURORA-1600
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler
> Reporter: Maxim Khutornenko
> Assignee: Maxim Khutornenko
> Priority: Critical
> Fix For: 0.12.0
>
>
> We have observed a case where a user's job update with a large number of
> instance overrides (updateOnlyTheseInstances) degrades performance so
> severely that the scheduler processes almost no offers and schedules no
> pending tasks for long periods (minutes to hours).
> The culprit appears to be the {{selectInstructions}} query. It is unacceptably
> slow when the number of instanceConfigs and/or instance overrides approaches
> 100. Because it is called inside a write lock to guide individual instance
> updates, nothing else can proceed, including status updates and offer
> activities.
> I was able to reproduce this in a jmh benchmark. A fix is incoming.
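
The stall described above can be illustrated with a minimal sketch. It is not Aurora code: the class name, the lock, and the latch standing in for the slow query are all hypothetical. It only shows the general effect that while one thread holds a write lock for the duration of a slow call, another thread that needs the same lock (for example, to record a status update) cannot acquire it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriteLockStall {
    // Hypothetical stand-in for the scheduler's storage lock (an assumption,
    // not the actual Aurora implementation).
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    /**
     * Returns a two-element array:
     * [0] true if a second writer was locked out while the "query" ran,
     * [1] true if the lock could be acquired once the "query" finished.
     */
    public static boolean[] demo() {
        CountDownLatch queryStarted = new CountDownLatch(1);
        CountDownLatch queryMayFinish = new CountDownLatch(1);

        // Simulates the job updater: takes the write lock, then runs a slow query.
        Thread updater = new Thread(() -> {
            LOCK.writeLock().lock();
            try {
                queryStarted.countDown();
                // Placeholder for the slow query; a latch is used instead of
                // a sleep so the demonstration is deterministic.
                queryMayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                LOCK.writeLock().unlock();
            }
        });
        updater.start();

        try {
            queryStarted.await();

            // Simulates a status update arriving while the query is running.
            boolean acquiredDuring = LOCK.writeLock().tryLock();
            if (acquiredDuring) {
                LOCK.writeLock().unlock();
            }

            // Let the "query" complete and the updater release the lock.
            queryMayFinish.countDown();
            updater.join();

            boolean acquiredAfter = LOCK.writeLock().tryLock();
            if (acquiredAfter) {
                LOCK.writeLock().unlock();
            }
            return new boolean[] {!acquiredDuring, acquiredAfter};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("locked out during query: " + result[0]);
        System.out.println("acquired after query:    " + result[1]);
    }
}
```

In this sketch the time the second writer is locked out equals the time the slow call takes, which is why a query that slows down as overrides grow would translate directly into stalled status updates and offer handling.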