As a point of reference, we solved this problem by adding a binding helper that queries the scheduler for the current number of instances and uses that number instead of a value hardcoded in the config:
instances='{{scaling_instances[60]}}'

In this example, instances will be set to the number of instances currently running (unless there are none, in which case 60 instances will be created).

On Thu, Jan 14, 2016 at 2:44 PM, Joshua Cohen <jco...@apache.org> wrote:
> What happens if a job has been scaled out, but the underlying config is not
> updated to take that scaling into account? Would the next update on that
> job revert the number of instances (presumably, because what else could we
> do)? Is there anything we can do, tooling-wise, to improve upon this?
>
> On Thu, Jan 14, 2016 at 1:40 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
> > Our rolling update APIs can be quite inconvenient to work with when it
> > comes to instance scaling [1]. It's especially frustrating when
> > adding/removing instances has to be done in an automated fashion (e.g.
> > by an external autoscaling process), as it requires holding on to the
> > original Aurora config at all times.
> >
> > I propose we add simple instance scaling APIs to address the above. Since
> > an Aurora job may have instances at different configs at any moment, I
> > propose we accept an InstanceKey as a reference point when scaling out.
> > For example:
> >
> >   /** Scales out a given job by adding more instances with the task
> >       config of the templateKey. */
> >   Response scaleOut(1: InstanceKey templateKey, 2: i32 incrementCount)
> >
> >   /** Scales in a given job by removing existing instances. */
> >   Response scaleIn(1: JobKey job, 2: i32 decrementCount)
> >
> > A corresponding client command could then look like:
> >
> >   aurora job scale-out devcluster/vagrant/test/hello/1 10
> >
> > For the above command, the scheduler would take the task config of
> > instance 1 of the 'hello' job and replicate it 10 more times, thus adding
> > 10 additional instances to the job.
> >
> > There are, of course, some details to work out, like making sure no
> > active update is in flight, that scale-out does not violate quota, etc.
> > I intend to address those during the implementation as things progress.
> >
> > Does the above make sense? Any concerns/suggestions?
> >
> > Thanks,
> > Maxim
> >
> > [1] - https://issues.apache.org/jira/browse/AURORA-1258