"How is scaling down different from killing instances?" I found 'killTasks' syntax too different and way much more powerful to be used for scaling in. The TaskQuery allows killing instances across jobs/roles, whereas 'scaleIn' is narrowed down to just a single job. Additional benefit: it can be ACLed independently by allowing external process kill tasks only within a given job. We may also add rate limiting or backoff to it later.
As for Joshua's question, I feel it should be an operator's responsibility
to diff a job with its aurora config before applying an update. That said,
if there is enough demand we can definitely consider adding something
similar to what George suggested, or resurrecting the 'large change'
warning message we used to have in the client updater.

On Thu, Jan 14, 2016 at 12:06 PM, George Sirois <geo...@tellapart.com> wrote:

> As a point of reference, we solved this problem by adding a binding helper
> that queries the scheduler for the current number of instances and uses
> that number instead of a hardcoded config:
>
>     instances='{{scaling_instances[60]}}'
>
> In this example, instances will be set to the currently running number
> (unless there are none, in which case 60 instances will be created).
>
> On Thu, Jan 14, 2016 at 2:44 PM, Joshua Cohen <jco...@apache.org> wrote:
>
>> What happens if a job has been scaled out, but the underlying config is
>> not updated to take that scaling into account? Would the next update on
>> that job revert the number of instances (presumably, because what else
>> could we do)? Is there anything we can do, tooling-wise, to improve upon
>> this?
>>
>> On Thu, Jan 14, 2016 at 1:40 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>
>> > Our rolling update APIs can be quite inconvenient to work with when it
>> > comes to instance scaling [1]. It's especially frustrating when
>> > adding/removing instances has to be done in an automated fashion (e.g.
>> > by an external autoscaling process), as it requires holding on to the
>> > original aurora config at all times.
>> >
>> > I propose we add simple instance scaling APIs to address the above.
>> > Since an Aurora job may have instances at different configs at any
>> > moment, I propose we accept an InstanceKey as a reference point when
>> > scaling out. For example:
>> >
>> >     /** Scales out a given job by adding more instances with the task
>> >         config of the templateKey. */
>> >     Response scaleOut(1: InstanceKey templateKey, 2: i32 incrementCount)
>> >
>> >     /** Scales in a given job by removing existing instances. */
>> >     Response scaleIn(1: JobKey job, 2: i32 decrementCount)
>> >
>> > A corresponding client command could then look like:
>> >
>> >     aurora job scale-out devcluster/vagrant/test/hello/1 10
>> >
>> > For the above command, the scheduler would take the task config of
>> > instance 1 of the 'hello' job and replicate it 10 more times, thus
>> > adding 10 additional instances to the job.
>> >
>> > There are, of course, some details to work out, like making sure no
>> > active update is in flight, that scale-out does not violate quota, etc.
>> > I intend to address those during the implementation as things progress.
>> >
>> > Does the above make sense? Any concerns/suggestions?
>> >
>> > Thanks,
>> > Maxim
>> >
>> > [1] - https://issues.apache.org/jira/browse/AURORA-1258
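P.S. For anyone skimming the quoted proposal above, here is roughly how the example command maps onto the existing thrift structs. The struct shapes are sketched from memory of api.thrift (field names/ids may differ slightly), and the 'devcluster' prefix is resolved client-side to a scheduler endpoint, so it never appears in the RPC itself:

    /** Existing structs, sketched from memory of api.thrift. */
    struct JobKey {
      1: string role           // "vagrant"  from devcluster/vagrant/test/hello/1
      2: string environment    // "test"
      3: string name           // "hello"
    }

    struct InstanceKey {
      1: JobKey jobKey
      2: i32 instanceId        // 1 -> the instance whose task config is used as the template
    }

    // The client would build templateKey = {jobKey: vagrant/test/hello, instanceId: 1}
    // and call:  scaleOut(templateKey, 10)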