Semi off-the-cuff thought, but one option is to re-define the instance ID -> TaskConfig association in JobConfiguration:
struct JobConfiguration {
  ...
  6: TaskConfig taskConfig

  /**
   * The number of instances in the job. Generated instance IDs for tasks
   * will be in the range [0, instances).
   */
  8: i32 instanceCount
}

Some prior art you could draw from is JobUpdateInstructions, which models a
heterogeneous set of tasks (while supporting normalization):

struct JobUpdateInstructions {
  /** Actual InstanceId -> TaskConfig mapping when the update was requested. */
  1: set<InstanceTaskConfig> initialState

  /** Desired configuration when the update completes. */
  2: InstanceTaskConfig desiredState

  ...
}

struct InstanceTaskConfig {
  /** A TaskConfig associated with instances. */
  1: TaskConfig task

  /** Instances associated with the TaskConfig. */
  2: set<Range> instances
}

So you could imagine JobConfiguration containing a set<InstanceTaskConfig> as
the eventual replacement of the taskConfig and instanceCount fields. If we
proceed this way, it suggests that we should change
JobUpdateInstructions.desiredState to also be a set<InstanceTaskConfig> for
parity.

On Tue, Jan 12, 2016 at 8:10 PM, Mauricio Garavaglia
<mauriciogaravag...@gmail.com> wrote:

> Thanks for the input, guys! I was wondering if you have any thoughts about
> what the API should look like.
>
> On Tue, Jan 12, 2016 at 1:00 PM, John Sirois <john.sir...@gmail.com> wrote:
>
> > On Mon, Jan 11, 2016 at 11:02 PM, John Sirois <john.sir...@gmail.com> wrote:
> >
> > > On Mon, Jan 11, 2016 at 11:00 PM, Bill Farner <wfar...@apache.org> wrote:
> > >
> > > > In the log, tasks are denormalized anyhow:
> > > > https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/storage.thrift#L43-L45
> > >
> > > Right - but now we'd be making that denormalization systemically
> > > ineffective. IIUC it's values-equals based denorm; I'd think we'd need
> > > diffing in a cluster using, for example, ceph + docker ~exclusively.
> >
> > I was being generally confusing here.
> > To be more precise, the issue I'm concerned about is the newish log
> > snapshot deduping feature [1] being foiled by all TaskConfigs for a
> > job's tasks now being unique via `ExecutorConfig.data` [2].
> > This is an optimization concern only, and IIUC it only becomes a concern
> > in very large clusters, as evidenced by the fact that the log dedup
> > feature came late in the use of Aurora by Twitter.
> >
> > This could definitely be worked out.
> >
> > [1] https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/storage.thrift#L196-L208
> > [2] https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L167
> >
> > > > On Mon, Jan 11, 2016 at 9:54 PM, John Sirois <j...@conductant.com> wrote:
> > > >
> > > > > On Mon, Jan 11, 2016 at 10:40 PM, Bill Farner <wfar...@apache.org> wrote:
> > > > >
> > > > > > Funny, that's actually how the scheduler API originally worked. I
> > > > > > think this is worth exploring, and would indeed completely
> > > > > > sidestep the paradigm shift I mentioned above.
> > > > >
> > > > > I think the crux might be handling a structural diff of the thrift
> > > > > for the Tasks to keep the log dedupe optimizations in play for the
> > > > > most part; i.e. store Task0 in full, and Task1-N as thrift struct
> > > > > diffs against 0. Maybe something simpler like a binary diff would
> > > > > be enough too.
> > > > >
> > > > > > On Mon, Jan 11, 2016 at 9:20 PM, John Sirois <j...@conductant.com> wrote:
> > > > > >
> > > > > > > On Mon, Jan 11, 2016 at 10:10 PM, Bill Farner <wfar...@apache.org> wrote:
> > > > > > >
> > > > > > > > There's a chicken and egg problem though. That variable will
> > > > > > > > only be filled in on the executor, when we're already running
> > > > > > > > in the docker environment.
> > > > > > > > In this case, the parameter is used to *define* the docker
> > > > > > > > environment.
> > > > > > >
> > > > > > > So, from a naive standpoint, the fact that Job is exploded into
> > > > > > > Tasks by the scheduler, but that explosion is not exposed to
> > > > > > > the client, seems to be the impedance mismatch here.
> > > > > > > I have not thought through this much at all, but say that
> > > > > > > fundamentally the scheduler took a Job that was a list of
> > > > > > > Tasks - possibly heterogeneous. The current expansion of a Job
> > > > > > > into homogeneous Tasks could be just a standard convenience.
> > > > > > >
> > > > > > > In that sort of world, the customized params could be injected
> > > > > > > client side to form a list of heterogeneous tasks, and the
> > > > > > > Scheduler could stay dumb - at least wrt Task parameterization.
> > > > > > >
> > > > > > > > On Mon, Jan 11, 2016 at 9:07 PM, ben...@gmail.com <ben...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > As a starting point, you might be able to cook up something
> > > > > > > > > involving {{mesos.instance}} as a lookup key to a pystachio
> > > > > > > > > list. You do have a unique integer task number per instance
> > > > > > > > > to work with.
> > > > > > > > >
> > > > > > > > > cf.
> > > > > > > > > http://aurora.apache.org/documentation/latest/configuration-reference/#template-namespaces
> > > > > > > > >
> > > > > > > > > On Mon, Jan 11, 2016 at 8:05 PM Bill Farner <wfar...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > I agree that this appears necessary when parameters are
> > > > > > > > > > needed to define the runtime environment of the task (in
> > > > > > > > > > this case, setting up the docker container).
> > > > > > > > > >
> > > > > > > > > > What's particularly interesting here is that this would
> > > > > > > > > > call for the scheduler to fill in the parameter values
> > > > > > > > > > prior to launching each task. Using pystachio variables
> > > > > > > > > > for this is certainly the most natural in the DSL, but it
> > > > > > > > > > becomes a paradigm shift since the scheduler is currently
> > > > > > > > > > ignorant of pystachio.
> > > > > > > > > >
> > > > > > > > > > Possibly only worth mentioning for shock value, but in
> > > > > > > > > > the DSL this starts to look like lambdas pretty quickly.
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 11, 2016 at 7:46 PM, Mauricio Garavaglia <mauriciogaravag...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi guys,
> > > > > > > > > > >
> > > > > > > > > > > We are using the docker rbd volume plugin
> > > > > > > > > > > <https://ceph.com/planet/getting-started-with-the-docker-rbd-volume-plugin>
> > > > > > > > > > > to provide persistent storage to the aurora jobs that
> > > > > > > > > > > run in the containers.
> > > > > > > > > > > Something like:
> > > > > > > > > > >
> > > > > > > > > > > p = [Parameter(name='volume', value='my-ceph-volume:/foo'), ...]
> > > > > > > > > > > jobs = [Service(..., container = Container(docker = Docker(..., parameters = p)))]
> > > > > > > > > > >
> > > > > > > > > > > But in the case of jobs with multiple instances it's
> > > > > > > > > > > required to start each container using different
> > > > > > > > > > > volumes, in our case different ceph images. This could
> > > > > > > > > > > be achieved by deploying, for example, 10 instances and
> > > > > > > > > > > then updating each one independently to use the
> > > > > > > > > > > appropriate volume. Of course this is quite
> > > > > > > > > > > inconvenient, error-prone, and adds a lot of logic and
> > > > > > > > > > > state outside aurora.
> > > > > > > > > > >
> > > > > > > > > > > We were wondering if it would make sense to have a way
> > > > > > > > > > > to parameterize the task instances, in a similar way to
> > > > > > > > > > > what is done with portmapping, for example. In the job
> > > > > > > > > > > definition, have something like:
> > > > > > > > > > >
> > > > > > > > > > > params = [
> > > > > > > > > > >   Parameter(name='volume',
> > > > > > > > > > >             value='service-{{instanceParameters.volume}}:/foo')
> > > > > > > > > > > ]
> > > > > > > > > > > ...
> > > > > > > > > > > jobs = [
> > > > > > > > > > >   Service(
> > > > > > > > > > >     name = 'logstash',
> > > > > > > > > > >     ...
> > > > > > > > > > >     instanceParameters = {"volume": ["foo", "bar", "zaa"]},
> > > > > > > > > > >     instances = 3,
> > > > > > > > > > >     container = Container(
> > > > > > > > > > >       docker = Docker(
> > > > > > > > > > >         image = 'image',
> > > > > > > > > > >         parameters = params
> > > > > > > > > > >       )
> > > > > > > > > > >     )
> > > > > > > > > > >   )
> > > > > > > > > > > ]
> > > > > > > > > > >
> > > > > > > > > > > Something like that would create 3 instances of the
> > > > > > > > > > > task, each one running in a container that uses the
> > > > > > > > > > > volumes foo, bar, and zaa.
> > > > > > > > > > >
> > > > > > > > > > > Does it make sense? I'd be glad to work on it, but I
> > > > > > > > > > > want to validate the idea with you first and hear
> > > > > > > > > > > comments about the api/implementation.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Mauricio
> > > > > > >
> > > > > > > --
> > > > > > > John Sirois
> > > > > > > 303-512-3301
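For concreteness, the JobConfiguration end state sketched at the top of this
mail could look roughly like the following. This is purely a sketch: the
field id 9 and the name taskConfigs are invented for illustration, and only
taskConfig and instanceCount exist in api.thrift today.

```thrift
struct JobConfiguration {
  ...
  /** Existing fields, eventually replaced by taskConfigs below. */
  6: TaskConfig taskConfig
  8: i32 instanceCount

  /**
   * Hypothetical: InstanceId -> TaskConfig association allowing
   * heterogeneous instances, mirroring JobUpdateInstructions.initialState.
   */
  9: set<InstanceTaskConfig> taskConfigs
}
```

During a migration, the scheduler could presumably normalize the legacy
(taskConfig, instanceCount) pair into a single-element taskConfigs set.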
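And for the client-side expansion idea (inject the per-instance parameters
on the client, keeping the scheduler dumb), a rough plain-Python sketch of
the mechanics follows. Everything here - expand_instances, the dicts
standing in for the DSL types, the template syntax - is invented for
illustration and is not the actual Aurora client code.

```python
# Sketch: expand a job-level parameter template plus per-instance values
# into one concrete parameter dict per instance, entirely client side.
# The scheduler would then receive N already-materialized task configs.

def expand_instances(param_template, instance_params, instances):
    """Produce one parameter dict per instance id in [0, instances)."""
    if any(len(v) != instances for v in instance_params.values()):
        raise ValueError(
            "each instanceParameters list must have one entry per instance")
    tasks = []
    for i in range(instances):
        params = {}
        for name, template in param_template.items():
            value = template
            # Substitute each {{instanceParameters.<key>}} occurrence with
            # this instance's value for <key>.
            for key, values in instance_params.items():
                value = value.replace(
                    "{{instanceParameters.%s}}" % key, values[i])
            params[name] = value
        tasks.append(params)
    return tasks

# Mirroring the example from the thread:
tasks = expand_instances(
    param_template={"volume": "service-{{instanceParameters.volume}}:/foo"},
    instance_params={"volume": ["foo", "bar", "zaa"]},
    instances=3,
)
# tasks[1] == {"volume": "service-bar:/foo"}
```

The interesting design question this sidesteps is exactly the one raised
above: with client-side expansion the scheduler never needs to learn
pystachio, at the cost of N denormalized task configs in the log.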