Maxim Khutornenko created AURORA-1549:
-----------------------------------------
Summary: Updater kills instances with scoped update
Key: AURORA-1549
URL: https://issues.apache.org/jira/browse/AURORA-1549
Project: Aurora
Issue Type: Bug
Components: Scheduler
Reporter: Maxim Khutornenko
Assignee: Maxim Khutornenko
Consider the following sequence for the hello_world job with 3 instances:
{noformat}
aurora job create devcluster/www-data/prod/hello aurora/examples/jobs/hello_world.aurora
<change config to trigger update, e.g. change RAM>
aurora update start devcluster/www-data/prod/hello/0 aurora/examples/jobs/hello_world.aurora
aurora job kill devcluster/www-data/prod/hello/1
aurora update start devcluster/www-data/prod/hello/0,1 aurora/examples/jobs/hello_world.aurora
{noformat}
The expectation is that all 3 instances end up on the same config. The actual
result: instance 0 is killed, leaving only instances 1 and 2.
The problem is that
[UpdateFactory|https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java#L95-L101]
iterates over the scoped instances, thereby overriding the JobDiff results. This
leads to
[InstanceUpdater|https://github.com/apache/aurora/blob/d7a1619fa85195937e74d1b09594909f0ed0ffd5/src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java#L102-L107]
killing any instances that are present in actual state but not present in the
desired state.
These are the (correct) results produced by the
[JobDiff|https://github.com/apache/aurora/blob/2e2371481d9aaccd6a45ad0f442d963d5ae7a3c8/src/main/java/org/apache/aurora/scheduler/updater/JobDiff.java#L185-L202]
that should be used to drive the update instead:
{noformat}
"Unscoped diff contents:"
Replaced: [2]
Replacements: [1, 2]
Unchanged: [0]
"Scoped (final) diff contents:"
Replaced: []
Replacements: [1]
Unchanged: [2, 0]
{noformat}
The current behavior appears to be a leftover that should have been removed in
this refactoring: https://reviews.apache.org/r/25969/.
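
To illustrate what "driving the update from the diff" could look like, here is a minimal, self-contained sketch. It is not the Aurora implementation: the class {{ScopedDiffSketch}}, its methods, and the use of plain strings for task configs are all hypothetical and chosen only for illustration. It assumes an instance is "unchanged" when it is running with the desired config, and that out-of-scope instances needing replacement are reported as unchanged in the scoped diff, which is what the diff contents above suggest.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Aurora code base.
public class ScopedDiffSketch {

  /** Instance IDs grouped the same way as the diff contents in the report. */
  public static final class Diff {
    public final Set<Integer> replaced;
    public final Set<Integer> replacements;
    public final Set<Integer> unchanged;

    Diff(Set<Integer> replaced, Set<Integer> replacements, Set<Integer> unchanged) {
      this.replaced = replaced;
      this.replacements = replacements;
      this.unchanged = unchanged;
    }
  }

  /** Assumption: configs are compared as plain strings; IDs run from 0 to desiredCount - 1. */
  public static Diff unscoped(Map<Integer, String> actual, String desiredConfig, int desiredCount) {
    Set<Integer> replaced = new TreeSet<>();
    Set<Integer> replacements = new TreeSet<>();
    Set<Integer> unchanged = new TreeSet<>();
    for (Map.Entry<Integer, String> e : actual.entrySet()) {
      boolean keep = e.getKey() < desiredCount && e.getValue().equals(desiredConfig);
      (keep ? unchanged : replaced).add(e.getKey());
    }
    // Every desired instance that is not already up to date needs a new task.
    for (int id = 0; id < desiredCount; id++) {
      if (!unchanged.contains(id)) {
        replacements.add(id);
      }
    }
    return new Diff(replaced, replacements, unchanged);
  }

  /** Restricts a diff to the scope; running out-of-scope instances are left alone. */
  public static Diff scoped(Diff full, Set<Integer> scope) {
    Set<Integer> replaced = new TreeSet<>(full.replaced);
    replaced.retainAll(scope);
    Set<Integer> replacements = new TreeSet<>(full.replacements);
    replacements.retainAll(scope);
    Set<Integer> unchanged = new TreeSet<>(full.unchanged);
    Set<Integer> outOfScope = new TreeSet<>(full.replaced);
    outOfScope.removeAll(scope);
    unchanged.addAll(outOfScope);
    return new Diff(replaced, replacements, unchanged);
  }

  /** Only these instances get an updater; unchanged instances are never touched. */
  public static Set<Integer> instancesToUpdate(Diff diff) {
    Set<Integer> ids = new TreeSet<>(diff.replaced);
    ids.addAll(diff.replacements);
    return ids;
  }

  public static void main(String[] args) {
    // State before the last "update start .../0,1": instance 0 already updated,
    // instance 1 killed, instance 2 still on the old config.
    Map<Integer, String> actual = new TreeMap<>();
    actual.put(0, "new");
    actual.put(2, "old");

    Diff full = unscoped(actual, "new", 3);
    Diff limited = scoped(full, new TreeSet<>(Arrays.asList(0, 1)));
    System.out.println("Replaced: " + limited.replaced);
    System.out.println("Replacements: " + limited.replacements);
    System.out.println("Unchanged: " + limited.unchanged);
    System.out.println("Instances to update: " + instancesToUpdate(limited));
  }
}
```

With the state from the sequence above, this sketch produces the same grouping as the scoped diff contents in this report, and only instance 1 is handed to an updater. Instance 0 stays in the unchanged set and is therefore never a candidate for a kill, unlike when the update iterates over every instance in the scope.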