"How is scaling down different from killing instances?" I found 'killTasks' syntax too different and way much more powerful to be used for scaling in. The TaskQuery allows killing instances across jobs/roles, whereas 'scaleIn' is narrowed down to just a single job. Additional benefit: it can be ACLed independently by allowing external process kill tasks only within a given job. We may also add rate limiting or backoff to it later.
As for Joshua's question, I feel it should be an operator's responsibility
to diff a job with its aurora config before applying an update. That said,
if there is enough demand we can definitely consider adding something
similar to what George suggested, or resurrecting the 'large change'
warning message we used to have in the client updater.

On Thu, Jan 14, 2016 at 12:06 PM, George Sirois <geo...@tellapart.com> wrote:

> As a point of reference, we solved this problem by adding a binding helper
> that queries the scheduler for the current number of instances and uses
> that number instead of a hardcoded config:
>
>     instances='{{scaling_instances[60]}}'
>
> In this example, instances will be set to the currently running number
> (unless there are none, in which case 60 instances will be created).
>
> On Thu, Jan 14, 2016 at 2:44 PM, Joshua Cohen <jco...@apache.org> wrote:
>
>> What happens if a job has been scaled out, but the underlying config is
>> not updated to take that scaling into account? Would the next update on
>> that job revert the number of instances (presumably, because what else
>> could we do)? Is there anything we can do, tooling-wise, to improve upon
>> this?
>>
>> On Thu, Jan 14, 2016 at 1:40 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>
>> > Our rolling update APIs can be quite inconvenient to work with when it
>> > comes to instance scaling [1]. It's especially frustrating when
>> > adding/removing instances has to be done in an automated fashion (e.g.
>> > by an external autoscaling process), as it requires holding on to the
>> > original aurora config at all times.
>> >
>> > I propose we add simple instance scaling APIs to address the above.
>> > Since an Aurora job may have instances at different configs at any
>> > moment, I propose we accept an InstanceKey as a reference point when
>> > scaling out. For example:
>> >
>> >     /** Scales out a given job by adding more instances with the task
>> >         config of the templateKey. */
>> >     Response scaleOut(1: InstanceKey templateKey, 2: i32 incrementCount)
>> >
>> >     /** Scales in a given job by removing existing instances. */
>> >     Response scaleIn(1: JobKey job, 2: i32 decrementCount)
>> >
>> > A corresponding client command could then look like:
>> >
>> >     aurora job scale-out devcluster/vagrant/test/hello/1 10
>> >
>> > For the above command, the scheduler would take the task config of
>> > instance 1 of the 'hello' job and replicate it 10 more times, thus
>> > adding 10 additional instances to the job.
>> >
>> > There are, of course, some details to work out, like making sure no
>> > active update is in flight, that scale-out does not violate quota, etc.
>> > I intend to address those during the implementation as things progress.
>> >
>> > Does the above make sense? Any concerns/suggestions?
>> >
>> > Thanks,
>> > Maxim
>> >
>> > [1] - https://issues.apache.org/jira/browse/AURORA-1258
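P.S. For anyone skimming the quoted proposal above, here is roughly how the example command maps onto the existing thrift structs. The struct shapes are sketched from memory of api.thrift (field names/ids may differ slightly), and the 'devcluster' prefix is resolved client-side to a scheduler endpoint, so it never appears in the RPC itself:

    /** Existing structs, sketched from memory of api.thrift. */
    struct JobKey {
      1: string role           // "vagrant"  from devcluster/vagrant/test/hello/1
      2: string environment    // "test"
      3: string name           // "hello"
    }

    struct InstanceKey {
      1: JobKey jobKey
      2: i32 instanceId        // 1 -> the instance whose task config is used as the template
    }

    // The client would build templateKey = {jobKey: vagrant/test/hello, instanceId: 1}
    // and call:  scaleOut(templateKey, 10)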