On Mon, Aug 15, 2016 at 2:49 PM, David McLaughlin <dmclaugh...@apache.org> wrote:
> On Mon, Aug 15, 2016 at 10:40 AM, Mauricio Garavaglia <
> mauriciogaravag...@gmail.com> wrote:
> >
> > Hi,
> >
> > WRT the constraint use case:
> >
> > Let's say I have two instances of a service; these instances would need
> > different Docker arguments:
>
> But that's less about using Docker and more about how you're using Docker
> in particular? Also about the features of your executor? For example, if
> you're using Thermos you can access the instance id in your process
> definition.

For example, using the ceph rbd volume plugin [1] to implement how each
instance stores its data, or the journald log driver [2] to later
centralize the logs related to an instance in logstash. I agree that those
are not the trivial Docker use cases, but I don't think they are way beyond
the limits of what we should be able to do.

Regarding accessing the instance id in the executor: when Thermos starts
it's a bit late, as the container has already been created.

[1] http://ceph.com/planet/getting-started-with-the-docker-rbd-volume-plugin/
[2] https://docs.docker.com/engine/admin/logging/journald/

> > - Labels. Log drivers use container labels to identify who is producing
> >   the logs. The container name doesn't work with Mesos, of course. So
> >   one instance has loglabel=serviceA-instance-0 and the other has
> >   loglabel=serviceA-instance-1.
> > - Volumes. Each instance must read/write on its own volume, in a
> >   similar way to how each Aurora instance writes to its own distributed
> >   log instance.
> > - IP addresses, etc.
> >
> > That means having separate jobs would be the thing to do right now. But
> > if for HA reasons we don't want to let Aurora schedule them on the same
> > rack, the solution would be to add constraints on both jobs, manually
> > assigning them to a predefined set of racks.
> >
> > Coping with constraints is not the only use case; the same goes for
> > rolling updates. Having different jobs means that the rolling update
> > process needs to be manually implemented on top of the Aurora API
> > instead of using the one provided out of the box.
> >
> > On Mon, Aug 15, 2016 at 2:05 PM, Maxim Khutornenko <ma...@apache.org>
> > wrote:
> >
> > > I would love to hear more about constraint use cases that don't work
> > > across jobs to see if/how we can extend Aurora to support them.
> > >
> > > As far as heterogeneous jobs go, that effort would require rethinking
> > > quite a few assumptions around fundamental Aurora principles to ensure
> > > we don't lock ourselves into a corner wrt future features by accepting
> > > an "easy to do" change short-term. I am -1 on supporting anything
> > > specific to adhoc jobs only. IMO, this has to be an all-or-nothing
> > > feature adding support for heterogeneous jobs across the stack.
> > >
> > > If you guys feel strongly about this idea, please craft a high-level
> > > design summary for the community to explore and review.
> > >
> > > On Sat, Aug 13, 2016 at 7:43 AM, Mauricio Garavaglia <
> > > mauriciogaravag...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have been experimenting with the idea of having heterogeneous
> > > > tasks in a job, mainly to support different Docker container
> > > > configurations (like volumes to let tasks have different storage,
> > > > different labels for logging purposes, or ip addresses).
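> > > >
> > > > To make this concrete, here is a rough sketch of the kind of
> > > > per-instance variation we mean, written against the stock Docker and
> > > > Parameter config types (the label and volume values are
> > > > hypothetical, and this is not the exact syntax from our fork):
> > > >
> > > > # Sketch only: what instance 0 of a service would need. Instance 1
> > > > # would need loglabel=serviceA-instance-1 and volume serviceA-vol-1,
> > > > # which a single homogeneous TaskConfig cannot express today.
> > > > container = Docker(
> > > >   image = 'serviceA:latest',
> > > >   parameters = [
> > > >     # journald log driver tags the logs with a per-instance label
> > > >     Parameter(name = 'log-driver', value = 'journald'),
> > > >     Parameter(name = 'label', value = 'loglabel=serviceA-instance-0'),
> > > >     # the ceph rbd volume plugin gives each instance its own volume
> > > >     Parameter(name = 'volume-driver', value = 'rbd'),
> > > >     Parameter(name = 'volume', value = 'serviceA-vol-0:/data'),
> > > >   ]
> > > > )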
> > > > The main reason for using this instead of separate jobs is that
> > > > scheduling constraints don't work across jobs, and we may want to
> > > > have rack anti-affinity between the different instances.
> > > >
> > > > You can check how it works in the README in the repo [
> > > > https://github.com/medallia/aurora/tree/0.13.0-medallia]. Basically,
> > > > the job includes a list of parameters that are interpolated into the
> > > > task config during Mesos task creation, so this happens at a later
> > > > time and the different values to apply to each instance are held in
> > > > the config. We can start discussing whether you think the design is
> > > > sound or the feature could be helpful, and start working to move it
> > > > upstream.
> > > >
> > > > We used StartJobUpdate to achieve the same purpose, but it required
> > > > some gymnastics during deployment that we wanted to avoid. Regarding
> > > > Min Cai's issue about short-lived tasks finishing before the update
> > > > starts, we solved it by initially configuring all the tasks with a
> > > > dummy NOP ("no operation") process that just sits there waiting to
> > > > be updated.
> > > >
> > > > Mauricio
> > > >
> > > > On Fri, Aug 12, 2016 at 3:17 PM, Min Cai <min...@gmail.com> wrote:
> > > >
> > > > > Thanks Maxim. Please see my previous email replying to David's
> > > > > comments for a more detailed response.
> > > > >
> > > > > On Fri, Aug 12, 2016 at 9:24 AM, Maxim Khutornenko <
> > > > > ma...@apache.org> wrote:
> > > > >
> > > > > > I am cautious about merging createJob and startJobUpdate as we
> > > > > > don't support updates of adhoc jobs. It's logically unclear what
> > > > > > an adhoc job update would mean, as adhoc job instances are not
> > > > > > intended to survive a terminal state.
> > > > >
> > > > > +1. Our adhoc job instances could be short-lived and finish way
> > > > > before StartJobUpdate calls are made to Aurora.
> > > > >
> > > > > > Even if we decided to do so, I am afraid it would not help with
> > > > > > the scenario of creating a new heterogeneous job, as the updater
> > > > > > only supports a single TaskConfig target.
> > > > >
> > > > > We would have to make N StartJobUpdate calls to update N distinct
> > > > > task configs, so it will be expensive if N is large, like > 10K.
> > > > >
> > > > > > Speaking broadly, Aurora is built around the idea of homogeneous
> > > > > > jobs. It's possible to have different task configs to support
> > > > > > canaries and update rolls, but we treat that state as
> > > > > > *temporary* until config reconciliation completes.
> > > > >
> > > > > Agreed that homogeneous jobs are an important design consideration
> > > > > for *long-running* jobs like Services. However, most adhoc jobs
> > > > > are heterogeneous by nature. For example, they might need to
> > > > > process different input files and write to different output files,
> > > > > or they might take different parameters, etc. It would be nice to
> > > > > extend Aurora to support heterogeneous tasks so that it can be
> > > > > used for broader use cases as a meta-scheduler.
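> > > > >
> > > > > To sketch the gap (an illustrative .aurora snippet, not from an
> > > > > actual job of ours): the only per-instance variation a single
> > > > > homogeneous job offers today is the instance id that Thermos
> > > > > exposes, which can parameterize the command line but nothing else:
> > > > >
> > > > > # The instance id can vary the inputs and outputs per instance...
> > > > > crunch = Process(
> > > > >   name = 'crunch',
> > > > >   cmdline = './crunch --input=input-{{mesos.instance}}.dat '
> > > > >             '--output=output-{{mesos.instance}}.dat'
> > > > > )
> > > > >
> > > > > # ...but all instances share one TaskConfig, so instances with
> > > > > # very different input sizes cannot get different cpu/ram/disk.
> > > > > crunch_task = Task(
> > > > >   processes = [crunch],
> > > > >   resources = Resources(cpu = 1.0, ram = 2*GB, disk = 4*GB)
> > > > > )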
> > > > >
> > > > > Thanks, - Min
> > > > >
> > > > > > On Fri, Aug 12, 2016 at 8:03 AM, David McLaughlin <
> > > > > > dmclaugh...@apache.org> wrote:
> > > > > >
> > > > > > > Hi Min,
> > > > > > >
> > > > > > > I'd prefer to add support for ad-hoc jobs to startJobUpdate
> > > > > > > and completely remove the notion of job create.
> > > > > > >
> > > > > > > > "Also, even the StartJobUpdate API is not scalable to a job
> > > > > > > > with 10K ~ 100K task instances where each instance has a
> > > > > > > > different task config, since we will have to invoke
> > > > > > > > StartJobUpdate for each distinct task config."
> > > > > > >
> > > > > > > What is the use case for that? Aurora was designed to have
> > > > > > > those as separate jobs.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > > On Thu, Aug 11, 2016 at 2:56 PM, Min Cai <min...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey fellow Aurora team:
> > > > > > > >
> > > > > > > > We would like to propose a simple and backwards-compatible
> > > > > > > > feature in the CreateJob API so that we can support
> > > > > > > > instance-specific TaskConfigs. The use case here is an
> > > > > > > > adhoc job which has different resource settings as well as
> > > > > > > > different command line arguments for each task instance.
> > > > > > > > Aurora today already supports heterogeneous tasks for the
> > > > > > > > same job via the StartJobUpdate API, i.e. we can update the
> > > > > > > > job instances to use different task configs. This works
> > > > > > > > reasonably well for long-running tasks like Services.
> > > > > > > > However, it is not feasible for adhoc jobs where each task
> > > > > > > > will finish right away, before we even have a chance to
> > > > > > > > invoke StartJobUpdate. Also, even the StartJobUpdate API is
> > > > > > > > not scalable to a job with 10K ~ 100K task instances where
> > > > > > > > each instance has a different task config, since we will
> > > > > > > > have to invoke StartJobUpdate for each distinct task config.
> > > > > > > >
> > > > > > > > The proposal we have is to add an optional field in
> > > > > > > > JobConfiguration for instance-specific task configs. It
> > > > > > > > will override the default task config for the given
> > > > > > > > instance ID ranges if specified. Otherwise, everything
> > > > > > > > stays backwards compatible with the current API. The
> > > > > > > > implementation of this change also seems to be very simple:
> > > > > > > > we only need to plumb the instance-specific task configs
> > > > > > > > through when we call stateManager.insertPendingTasks in the
> > > > > > > > SchedulerThriftInterface.createJob function.
> > > > > > > >
> > > > > > > > /**
> > > > > > > >  * Description of an Aurora job. One task will be scheduled
> > > > > > > >  * for each instance within the job.
> > > > > > > >  */
> > > > > > > > @@ -328,13 +343,17 @@ struct JobConfiguration {
> > > > > > > >    4: string cronSchedule
> > > > > > > >    /** Collision policy to use when handling overlapping
> > > > > > > >        cron runs. Default is KILL_EXISTING. */
> > > > > > > >    5: CronCollisionPolicy cronCollisionPolicy
> > > > > > > > -  /** Task configuration for this job. */
> > > > > > > > +  /** Default task configuration for all instances of this
> > > > > > > > +      job. */
> > > > > > > >    6: TaskConfig taskConfig
> > > > > > > >    /**
> > > > > > > >     * The number of instances in the job. Generated instance
> > > > > > > >     * IDs for tasks will be in the range [0, instances).
> > > > > > > >     */
> > > > > > > >    8: i32 instanceCount
> > > > > > > > +  /**
> > > > > > > > +   * The instance-specific task configs that override the
> > > > > > > > +   * default task config for given instanceId ranges.
> > > > > > > > +   */
> > > > > > > > +  10: optional set<InstanceTaskConfig> instanceTaskConfigs
> > > > > > > >  }
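> > > > > > > >
> > > > > > > > For reference, a sketch of what InstanceTaskConfig could
> > > > > > > > look like, assuming we reuse the shape of the struct the
> > > > > > > > updater already uses in api.thrift (the exact definition is
> > > > > > > > an assumption here, not part of the diff above):
> > > > > > > >
> > > > > > > > struct InstanceTaskConfig {
> > > > > > > >   /** A TaskConfig associated with instances. */
> > > > > > > >   1: TaskConfig task
> > > > > > > >   /** Instance ID ranges associated with the TaskConfig. */
> > > > > > > >   2: set<Range> instances
> > > > > > > > }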
> > > > > > > >
> > > > > > > > Please let us know your comments and suggestions.
> > > > > > > >
> > > > > > > > Thanks, - Min