Hi David,

Thanks for your quick comments. Please see my responses inline.
On Fri, Aug 12, 2016 at 8:03 AM, David McLaughlin <dmclaugh...@apache.org> wrote:

> Hi Min,
>
> I'd prefer to add support for ad-hoc jobs to startJobUpdate and completely
> remove the notion of job create.
>

There are a few reasons why we cannot really use the StartJobUpdate API for ad hoc jobs with heterogeneous tasks:

(1) The tasks in an ad hoc job could have finished well before the StartJobUpdate call is invoked or applied to a task instance. This will always be racy and nondeterministic unless we create the job in paused mode right away.
(2) If we want to launch a job with N different task configs, we will have to invoke StartJobUpdate N times, which is very expensive when N is large.

> " Also, even the
> > StartJobUpdate API is not scalable to a job with 10K ~ 100K task instances
> > and each instance has different task config since we will have to invoke
> > StartJobUpdate for each distinct task config."
>
> What is the use case for that? Aurora was designed to have those as
> separate jobs.
>

We have considered creating one job for each task config. However, this solution has a few limitations too:

(1) It would produce a flood of jobs, each with only one task instance, which would make the Aurora UI almost unusable. The user would also have to track the grouping of jobs and introduce a "job group"-like concept on the client side.
(2) It is more expensive for Aurora to handle N job creation calls than a single job create call with N task instances. We could introduce a batch API for CreateJob, but that might not be simple either.
(3) Our use case needs to cancel the whole job once the number of failed task instances exceeds a threshold, for more efficient resource usage. This would be very difficult to support if we create one job for each task config.
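To make the intended semantics concrete, here is a minimal Python sketch of how the per-instance configs could be resolved at job creation time. It is an illustration only, not the scheduler code: the dataclasses are hypothetical stand-ins for the Thrift structs, `resolve_task_configs` is a made-up helper name, and I am assuming instance ID ranges are inclusive (first, last) pairs and that overlapping overrides should be rejected.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical stand-in for the Thrift TaskConfig (two fields only)."""
    num_cpus: float
    command: str


@dataclass(frozen=True)
class InstanceTaskConfig:
    """Hypothetical stand-in: one config plus inclusive (first, last) ID ranges."""
    task: TaskConfig
    instances: Tuple[Tuple[int, int], ...]


def resolve_task_configs(
    default: TaskConfig,
    instance_count: int,
    overrides: Iterable[InstanceTaskConfig] = (),
) -> Dict[int, TaskConfig]:
    """Return the effective TaskConfig for every instance ID in [0, instance_count)."""
    # Every instance starts with the job-level default config.
    configs = {i: default for i in range(instance_count)}
    overridden = set()
    for override in overrides:
        for first, last in override.instances:
            # Assumption: ranges must lie inside [0, instance_count).
            if first < 0 or last >= instance_count or first > last:
                raise ValueError(f"invalid instance range [{first}, {last}]")
            for instance_id in range(first, last + 1):
                # Assumption: two overrides may not claim the same instance.
                if instance_id in overridden:
                    raise ValueError(f"instance {instance_id} overridden twice")
                overridden.add(instance_id)
                configs[instance_id] = override.task
    return configs
```

With no overrides supplied, every instance gets the default config, which is the backwards compatible path: existing callers of CreateJob would see no change in behavior.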
Thanks,
- Min

> Thanks,
> David
>
> On Thu, Aug 11, 2016 at 2:56 PM, Min Cai <min...@gmail.com> wrote:
>
> > Hey fellow Aurora team:
> >
> > We would like to propose a simple and backwards compatible feature in the
> > CreateJob API so that we can support instance-specific TaskConfigs. The use
> > case here is an ad hoc job which has different resource settings as well
> > as different command line arguments for each task instance. Aurora today
> > already supports heterogeneous tasks for the same job via the StartJobUpdate
> > API, i.e. we can update the job instances to use different task configs. This
> > works reasonably well for long-running tasks like Services. However, it is
> > not feasible for ad hoc jobs where each task will finish right away, before
> > we even have a chance to invoke StartJobUpdate. Also, even the
> > StartJobUpdate API is not scalable to a job with 10K ~ 100K task instances
> > where each instance has a different task config, since we will have to invoke
> > StartJobUpdate for each distinct task config.
> >
> > The proposal we have is to add an optional field in JobConfiguration for
> > instance-specific task configs. It will override the default task config
> > for the given instance ID ranges if specified. Otherwise, everything will be
> > backwards compatible with the current API. The implementation of this change
> > also seems to be very simple. We only need to plumb the instance-specific task
> > configs through when we call stateManager.insertPendingTasks in the
> > SchedulerThriftInterface.createJob function.
> >
> > /**
> >  * Description of an Aurora job. One task will be scheduled for each instance within the job.
> >  */
> > @@ -328,13 +343,17 @@ struct JobConfiguration {
> >    4: string cronSchedule
> >    /** Collision policy to use when handling overlapping cron runs. Default is KILL_EXISTING. */
> >    5: CronCollisionPolicy cronCollisionPolicy
> > -  /** Task configuration for this job. */
> > +  /** Default task configuration for all instances of this job. */
> >    6: TaskConfig taskConfig
> >    /**
> >     * The number of instances in the job. Generated instance IDs for tasks will be in the range
> >     * [0, instances).
> >     */
> >    8: i32 instanceCount
> > +  /**
> > +   * The instance specific task configs that override the default task config for given
> > +   * instanceId ranges.
> > +   */
> > +  10: optional set<InstanceTaskConfig> instanceTaskConfigs
> >  }
> >
> > Please let us know your comments and suggestions.
> >
> > Thanks, - Min
> >
>