Re: Airflow kubernetes executor

Daniel Imberman Wed, 05 Jul 2017 13:27:49 -0700

Thanks Chris, will do!

On Wed, Jul 5, 2017 at 1:26 PM Chris Riccomini <criccom...@apache.org>
wrote:


> @Daniel, done! Should have access. Please create the wiki as a subpage
> under:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap
>
> On Wed, Jul 5, 2017 at 1:20 PM, Daniel Imberman <daniel.imber...@gmail.com
> >
> wrote:
>
> > @chris: Thank you! My wiki name is dimberman.
> > @gerard: I've started writing out my reply but there's a fair amount to
> > respond to so I'll need a few minutes :).
> >
> > On Wed, Jul 5, 2017 at 1:17 PM Chris Riccomini <criccom...@apache.org>
> > wrote:
> >
> > > @daniel, what's your wiki username? I can grant you access.
> > >
> > > On Wed, Jul 5, 2017 at 12:35 PM, Gerard Toonstra <gtoons...@gmail.com>
> > > wrote:
> > >
> > > > Hey Daniel,
> > > >
> > > > > Great work. We're looking at running airflow on AWS ECS inside docker
> > > > > containers and are making great progress on this.
> > > > > We use redis and RDS as managed services to form a comms backbone and
> > > > > then just spawn webserver, scheduler, worker and flower containers
> > > > > as needed on ECS. We deploy dags using an Elastic File System (shared
> > > > > across all instances), which we then map read-only into the docker
> > > > > container.
> > > > > We're now evaluating this setup more seriously going forward.
> > > >
> > > > > Good idea to use queues to separate dependencies or some other
> > > > > concerns (high-mem pods?); that opens up many ways to customize
> > > > > where and on which hardware a DAG is going to run. We're looking at
> > > > > cycle scaling to temporarily increase resources for a morning run,
> > > > > and at creating larger worker containers for data science tasks and
> > > > > perhaps some other tasks.
> > > >
> > > >
> > > > > - In terms of tooling: The current airflow config is somewhat
> > > > >   static, in the sense that it does not reconfigure itself to the
> > > > >   (now) dynamic environment. Ideally airflow would query the
> > > > >   environment to figure out parallelism instead of having it
> > > > >   specified statically.
> > > >
> > > > > - Sometimes DAGs import hooks or operators that import dependencies
> > > > >   at the top. The only reason (I think) that a scheduler needs to
> > > > >   physically import and parse a DAG is that there may be
> > > > >   dynamically built elements within the DAG. If there were no
> > > > >   dynamic elements, it would theoretically be possible to optimize
> > > > >   this. Your PDF hints at a system in which the worker where a DAG
> > > > >   will eventually run parses the DAG and reports back a meta
> > > > >   description of it, which could simplify the scheduler and improve
> > > > >   its performance at the cost of network round trips.
> > > >
> > > > > - About redeploying instances: We see this as a potential issue for
> > > > >   our setup. My take is that jobs simply shouldn't take that much
> > > > >   time to begin with, which avoids having to worry about this. If
> > > > >   that's unrealistic, shouldn't it be a concern of the environment
> > > > >   airflow runs in rather than of airflow itself? I.e., further
> > > > >   build out the kubernetes CLIs / operators to query the environment
> > > > >   and plan/deny/schedule this kind of work automatically. Because
> > > > >   k8s was probably built from the perspective of handling
> > > > >   short-running queries, running anything long-term on it is
> > > > >   naturally going to compete with the architecture.
> > > >
> > > > > - About failures and instances disappearing on failure: it's not
> > > > >   desirable to keep those instances around for a long time; we
> > > > >   really do need to depend on client logging and other services to
> > > > >   tell us what happened. The difference in thinking is that a
> > > > >   pod/container is just a temporary thing that runs a job, and we
> > > > >   should be interested in how the job did rather than how the
> > > > >   container/pod ran it. From my limited experience with k8s,
> > > > >   though, I do see that it tends to get rid of everything a little
> > > > >   too quickly on failure. One thing you could look into is logging
> > > > >   to a commonly shared volume under a specific 'key' for that
> > > > >   container, so you can always refer back to the important log file
> > > > >   and fish it out, with measures to clean up the shared filesystem
> > > > >   on a regular basis.
> > > >
> > > > > - About rescaling and starting jobs: it doesn't come for free, as
> > > > >   you mention. I think it's a great idea to be able to scale out
> > > > >   during busy intervals (we intend to just use cycle scaling here),
> > > > >   but a hint about what policy or scaling strategy you intend to
> > > > >   use on k8s would be welcome.
> > > >
> > > >
> > > > Gerard
> > > >
> > > >
> > > > On Wed, Jul 5, 2017 at 8:43 PM, Daniel Imberman <
> > > daniel.imber...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > @amit
> > > > >
> > > > > > I've added the proposal to the PR for now. That should make it
> > > > > > easier for people to get to it. I will delete it once I add it to
> > > > > > the wiki.
> > > > >
> > > > > > https://github.com/bloomberg/airflow/blob/29694ae9903c4dad3f18fb8eb767c4922dbef2e8/dimberman-KubernetesExecutorProposal-050717-1423-36.pdf
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Wed, Jul 5, 2017 at 11:36 AM Daniel Imberman <
> > > > daniel.imber...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Amit,
> > > > > >
> > > > > > > For now the design doc is included as an attachment to the
> > > > > > > original email. Once I get permission to edit the wiki I would
> > > > > > > like to add it there, but for now I figured that this would get
> > > > > > > the ball rolling.
> > > > > >
> > > > > >
> > > > > > Daniel
> > > > > >
> > > > > >
> > > > > > On Wed, Jul 5, 2017 at 11:33 AM Amit Kulkarni <am...@wepay.com>
> > > wrote:
> > > > > >
> > > > > >> Hi Daniel,
> > > > > >>
> > > > > > >> I don't see a link to the design PDF.
> > > > > >>
> > > > > >>
> > > > > >> Amit Kulkarni
> > > > > >> Site Reliability Engineer
> > > > > > >> Mobile:  (716)-352-3270
> > > > > >>
> > > > > >> Payments partner to the platform economy
> > > > > >>
> > > > > >> On Wed, Jul 5, 2017 at 11:25 AM, Daniel Imberman <
> > > > > >> daniel.imber...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hello Airflow community!
> > > > > >> >
> > > > > > >> > My name is Daniel Imberman, and I have been working on behalf
> > > > > > >> > of Bloomberg LP to create an airflow kubernetes
> > > > > > >> > executor/operator. We wanted to allow for maximum
> > > > > > >> > throughput/scalability, while keeping a lot of the kubernetes
> > > > > > >> > details abstracted away from the users. Below I have a link
> > > > > > >> > to the WIP PR and the PDF of the initial proposal. If anyone
> > > > > > >> > has any comments/questions I would be glad to discuss this
> > > > > > >> > feature further.
> > > > > >> >
> > > > > >> > Thank you,
> > > > > >> >
> > > > > >> > Daniel
> > > > > >> >
> > > > > >> > https://github.com/apache/incubator-airflow/pull/2414
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
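
A note on the suggestion quoted above to log to a commonly shared volume
under a per-container 'key': below is a minimal sketch of that idea,
assuming the volume is mounted at a local path. The function names
(log_path, clean_old_logs), the directory layout and the retention period
are hypothetical and chosen for illustration only; nothing here is taken
from the proposal or from airflow itself.

```python
import time
from pathlib import Path
from typing import List, Optional


def log_path(shared_root: Path, container_key: str, task_id: str) -> Path:
    """Return the log file path for one container, creating its directory.

    Assumed layout: <shared_root>/<container_key>/<task_id>.log
    """
    directory = shared_root / container_key
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{task_id}.log"


def clean_old_logs(
    shared_root: Path, max_age_seconds: float, now: Optional[float] = None
) -> List[Path]:
    """Delete log files older than max_age_seconds; return the removed paths."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = []
    for path in sorted(shared_root.glob("*/*.log")):
        # Compare the file's last modification time with the cutoff.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A periodic job (for example a small maintenance DAG) could call
clean_old_logs on the mounted volume, so that logs outlive the failed
pod without accumulating forever.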

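On the point quoted above about airflow querying its environment for
parallelism rather than reading a static value: a rough sketch, assuming
the value is derived from the local CPU count before the scheduler
starts. derive_parallelism, export_parallelism and slots_per_cpu are
hypothetical names; AIRFLOW__CORE__PARALLELISM follows airflow's
AIRFLOW__{SECTION}__{KEY} environment override convention. On a cluster
one would presumably ask the cluster for its capacity instead of the
local machine.

```python
import os
from typing import MutableMapping, Optional


def derive_parallelism(slots_per_cpu: int = 2, cpu_count: Optional[int] = None) -> int:
    """Compute a parallelism value from the available CPUs (at least 1)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus * slots_per_cpu)


def export_parallelism(value: int, environ: MutableMapping[str, str] = os.environ) -> None:
    """Expose the value through the environment override for [core] parallelism.

    This only has an effect if it runs before the airflow process reads
    its configuration.
    """
    environ["AIRFLOW__CORE__PARALLELISM"] = str(value)
```

A container entrypoint could call export_parallelism(derive_parallelism())
and then start the scheduler, so the setting follows the size of the
instance it lands on.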