Re: Parameterize each Job Instance.

Bill Farner Mon, 11 Jan 2016 22:00:55 -0800

In the log, tasks are denormalized anyhow:
https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/storage.thrift#L43-L45
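
For comparison, the Task0-plus-diffs encoding suggested in the quoted message
below could be sketched roughly as follows. This is only an illustration
using flat Python dicts as a stand-in for the thrift structs; struct_diff and
apply_diff are hypothetical names, not anything in the Aurora codebase.

```python
_MISSING = object()  # sentinel for "key not present in the base struct"


def struct_diff(base, other):
    """Describe `other` as a field-level diff against `base` (flat dicts only)."""
    changed = {k: v for k, v in other.items() if base.get(k, _MISSING) != v}
    removed = [k for k in base if k not in other]
    return {"changed": changed, "removed": removed}


def apply_diff(base, diff):
    """Rebuild the full struct from `base` plus a diff made by struct_diff."""
    result = dict(base)
    result.update(diff["changed"])
    for key in diff["removed"]:
        result.pop(key, None)
    return result


# Hypothetical task records: Task0 is stored in full, Task1 only as a diff.
task0 = {"image": "image", "volume": "service-foo:/foo", "instance": 0}
task1 = {"image": "image", "volume": "service-bar:/foo", "instance": 1}

diff1 = struct_diff(task0, task1)
restored = apply_diff(task0, diff1)
```

Only the fields that differ between instances end up in the diff, which is
where the space saving would come from if tasks were not stored denormalized.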




On Mon, Jan 11, 2016 at 9:54 PM, John Sirois <j...@conductant.com> wrote:

> On Mon, Jan 11, 2016 at 10:40 PM, Bill Farner <wfar...@apache.org> wrote:
>
> > Funny, that's actually how the scheduler API originally worked.  I think
> > this is worth exploring, and would indeed completely sidestep the paradigm
> > shift I mentioned above.
> >
>
> I think the crux might be handling a structural diff of the thrift for the
> Tasks to keep the log dedupe optimizations in play for the most part; i.e.,
> store Task0 in full, and Task1-N as thrift struct diffs against Task0.  Maybe
> something simpler like a binary diff would be enough too.
>
>
> >
> > On Mon, Jan 11, 2016 at 9:20 PM, John Sirois <j...@conductant.com> wrote:
> >
> > > On Mon, Jan 11, 2016 at 10:10 PM, Bill Farner <wfar...@apache.org> wrote:
> > >
> > > > There's a chicken and egg problem though. That variable will only be
> > > > filled in on the executor, when we're already running in the docker
> > > > environment. In this case, the parameter is used to *define* the docker
> > > > environment.
> > > >
> > >
> > > So, from a naive standpoint, the fact that Job is exploded into Tasks by
> > > the scheduler but that explosion is not exposed to the client seems to be
> > > the impedance mismatch here.
> > > I have not thought through this much at all, but say that fundamentally the
> > > scheduler took a Job that was a list of Tasks - possibly heterogeneous.
> > > The current expansion of a Job into homogeneous Tasks could then be just a
> > > standard convenience.
> > >
> > > In that sort of world, the customized params could be injected client side
> > > to form a list of heterogeneous tasks and the Scheduler could stay dumb -
> > > at least wrt Task parameterization.
> > >
> > >
> > > > On Mon, Jan 11, 2016 at 9:07 PM, ben...@gmail.com <ben...@gmail.com> wrote:
> > > >
> > > > > As a starting point, you might be able to cook up something involving
> > > > > {{mesos.instance}} as a lookup key to a pystachio list.  You do have a
> > > > > unique integer task number per instance to work with.
> > > > >
> > > > > cf.
> > > > > http://aurora.apache.org/documentation/latest/configuration-reference/#template-namespaces
> > > > >
> > > > > On Mon, Jan 11, 2016 at 8:05 PM Bill Farner <wfar...@apache.org> wrote:
> > > > >
> > > > > > I agree that this appears necessary when parameters are needed to define
> > > > > > the runtime environment of the task (in this case, setting up the docker
> > > > > > container).
> > > > > >
> > > > > > What's particularly interesting here is that this would call for the
> > > > > > scheduler to fill in the parameter values prior to launching each task.
> > > > > > Using pystachio variables for this is certainly the most natural in the
> > > > > > DSL, but becomes a paradigm shift since the scheduler is currently ignorant
> > > > > > of pystachio.
> > > > > >
> > > > > > Possibly only worth mentioning for shock value, but in the DSL this starts
> > > > > > to look like lambdas pretty quickly.
> > > > > >
> > > > > > On Mon, Jan 11, 2016 at 7:46 PM, Mauricio Garavaglia <
> > > > > > mauriciogaravag...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi guys,
> > > > > > >
> > > > > > > We are using the docker rbd volume plugin
> > > > > > > <https://ceph.com/planet/getting-started-with-the-docker-rbd-volume-plugin>
> > > > > > > to provide persistent storage to the aurora jobs that run in the
> > > > > > > containers. Something like:
> > > > > > >
> > > > > > > p = [Parameter(name='volume', value='my-ceph-volume:/foo'), ...]
> > > > > > > jobs = [ Service(..., container = Container(docker = Docker(...,
> > > > > > > parameters = p)))]
> > > > > > >
> > > > > > > But in the case of jobs with multiple instances, each container has to
> > > > > > > be started with a different volume, in our case different ceph images.
> > > > > > > This could be achieved by deploying, for example, 10 instances and then
> > > > > > > updating each one independently to use the appropriate volume. Of course
> > > > > > > this is quite inconvenient, error prone, and adds a lot of logic and
> > > > > > > state outside aurora.
> > > > > > >
> > > > > > > We were wondering whether it would make sense to have a way to
> > > > > > > parameterize the task instances, similar to what is done with port
> > > > > > > mapping, for example. In the job definition, have something like
> > > > > > >
> > > > > > > params = [
> > > > > > >   Parameter( name='volume',
> > > > > > > value='service-{{instanceParameters.volume}}:/foo' )
> > > > > > > ]
> > > > > > > ...
> > > > > > > jobs = [
> > > > > > >   Service(
> > > > > > >     name = 'logstash',
> > > > > > >     ...
> > > > > > >     instanceParameters = { "volume" : ["foo", "bar", "zaa"]},
> > > > > > >     instances = 3,
> > > > > > >     container = Container(
> > > > > > >       docker = Docker(
> > > > > > >         image = 'image',
> > > > > > >         parameters = params
> > > > > > >       )
> > > > > > >     )
> > > > > > >   )
> > > > > > > ]
> > > > > > >
> > > > > > >
> > > > > > > Something like that would create 3 instances of the task, each one
> > > > > > > running in a container that uses the volumes foo, bar, and zaa
> > > > > > > respectively.
> > > > > > >
> > > > > > > Does it make sense? I'd be glad to work on it but I want to validate the
> > > > > > > idea with you first and hear comments about the api/implementation.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > Mauricio
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > John Sirois
> > > 303-512-3301
> > >
> >
>
>
>
> --
> John Sirois
> 303-512-3301
>
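
A rough sketch of the client-side expansion discussed in the thread above,
written in plain Python rather than the Aurora DSL or pystachio. The helper
name expand_instance_parameters and the dict-based stand-in for Parameter are
hypothetical and only for illustration; instanceParameters is the field
proposed in the original message, not an existing Aurora feature.

```python
def expand_instance_parameters(params, instance_parameters, instances):
    """Return one concrete parameter list per instance.

    Hypothetical helper: `params` is a list of {"name", "value"} dicts standing
    in for Docker Parameter objects, and `instance_parameters` maps a variable
    name to a list holding exactly one value per instance.
    """
    for name, values in instance_parameters.items():
        if len(values) != instances:
            raise ValueError(
                "%r has %d values, expected %d" % (name, len(values), instances))

    expanded = []
    for index in range(instances):
        concrete = []
        for param in params:
            value = param["value"]
            # Substitute each {{instanceParameters.<name>}} placeholder with
            # the value that belongs to this instance index.
            for name, values in instance_parameters.items():
                placeholder = "{{instanceParameters.%s}}" % name
                value = value.replace(placeholder, values[index])
            concrete.append({"name": param["name"], "value": value})
        expanded.append(concrete)
    return expanded


params = [{"name": "volume",
           "value": "service-{{instanceParameters.volume}}:/foo"}]
per_instance = expand_instance_parameters(
    params, {"volume": ["foo", "bar", "zaa"]}, instances=3)

for index, concrete in enumerate(per_instance):
    print(index, concrete[0]["value"])
```

With the expansion done on the client, the scheduler would receive three
already-concrete (heterogeneous) task definitions and would not need to know
anything about the template syntax.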
