Maxim Khutornenko created AURORA-1600:
-----------------------------------------
Summary: Job updates with large count of instance overrides halt scheduler perf
Key: AURORA-1600
URL: https://issues.apache.org/jira/browse/AURORA-1600
Project: Aurora
Issue Type: Bug
Components: Scheduler
Reporter: Maxim Khutornenko
Assignee: Maxim Khutornenko
Priority: Critical
We have observed a case in which a user update with a large number of specified
instance overrides (updateOnlyTheseInstances) severely degraded scheduler
performance: the scheduler processed almost no offers and scheduled no pending
tasks for long periods (minutes to hours).
The culprit appears to be the {{selectInstructions}} query. It is unacceptably
slow when the number of instanceConfigs and/or instance overrides approaches
100. Because it is called inside a write lock to guide individual instance
updates, nothing else can proceed while it runs, including status updates and
offer processing.
I was able to reproduce this with JMH. A fix is incoming.
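To illustrate why a slow query under the write lock stalls unrelated work, here is a minimal Java sketch. It assumes a single ReentrantReadWriteLock guarding storage, which may not match Aurora's actual locking; the class and method names (WriteLockStall, slowUpdateStep, tryStatusUpdate, statusUpdateBlocked) are hypothetical, and a sleep stands in for the slow {{selectInstructions}} call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch (not Aurora code): a long-running step holds the write
 * lock, so any other operation needing that lock (e.g. a status update) waits.
 */
public class WriteLockStall {
    // Assumption: one lock guards all storage mutations.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Stand-in for an update step that runs a slow query under the write lock. */
    void slowUpdateStep(long millis, CountDownLatch lockHeld) throws InterruptedException {
        lock.writeLock().lock();
        try {
            lockHeld.countDown();   // signal that the write lock is now held
            Thread.sleep(millis);   // simulates the slow query
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Stand-in for a status update; returns true if it got the lock within the timeout. */
    boolean tryStatusUpdate(long timeoutMillis) throws InterruptedException {
        if (lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            try {
                return true;        // the real mutation would happen here
            } finally {
                lock.writeLock().unlock();
            }
        }
        return false;
    }

    /**
     * Returns true if a status update times out while the slow step holds the
     * lock, and then succeeds once the lock has been released.
     */
    static boolean statusUpdateBlocked() throws InterruptedException {
        WriteLockStall store = new WriteLockStall();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread updater = new Thread(() -> {
            try {
                store.slowUpdateStep(500, lockHeld);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        updater.start();
        lockHeld.await();                                   // write lock is held from here
        boolean duringSlowStep = store.tryStatusUpdate(100); // gives up after 100 ms
        updater.join();                                     // slow step finished, lock released
        boolean afterSlowStep = store.tryStatusUpdate(100);
        return !duringSlowStep && afterSlowStep;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("status update blocked: " + statusUpdateBlocked());
    }
}
```

Under these assumptions, the competing call times out while the slow step holds the lock and succeeds once it is released; in the scheduler the same effect applies to every status update and offer handler queued behind the lock, which is why shortening the work done under the lock matters more than anything else.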
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)