@daniel, what's your wiki username? I can grant you access.

On Wed, Jul 5, 2017 at 12:35 PM, Gerard Toonstra <[email protected]> wrote:
> Hey Daniel,
>
> Great work. We're looking at running airflow on AWS ECS inside docker
> containers and making good progress on this. We use redis and RDS as
> managed services to form a comms backbone, and then just spawn webserver,
> scheduler, worker and flower containers as needed on ECS. We deploy dags
> on an Elastic File System (shared across all instances), which is then
> mapped read-only into the docker containers. We're now evaluating this
> setup going forward in more earnest. (A rough entrypoint sketch for wiring
> such containers up is at the end of this mail.)
>
> Good idea to use queues to separate dependencies or other concerns
> (high-mem pods?); that gives you a lot of ways to customize where, and on
> what hardware, a DAG is going to run (see the queue sketch below). We're
> looking at cycle scaling to temporarily increase resources during the
> morning run and to create larger worker containers for data science tasks
> and perhaps some other tasks.
>
> - In terms of tooling: the current airflow config is somewhat static, in
> the sense that it does not reconfigure itself to the (now) dynamic
> environment. You'd expect airflow to query the environment to figure out
> parallelism instead of having it specified statically (sketch below).
>
> - Sometimes DAGs import hooks or operators that import dependencies at
> the top. The only reason (I think) that a scheduler needs to physically
> import and parse a DAG is that there may be dynamically built elements
> within the DAG. If a DAG had no dynamically built elements, it would be
> theoretically possible to optimize this away. Your PDF sort of hints
> towards a system where the worker on which a DAG will eventually run
> could parse the DAG and report back a meta description of it, which could
> simplify the scheduler and improve its performance at the cost of network
> roundtrips.
>
> - About redeploying instances: we see this as a potential issue for our
> setup. My take is that jobs simply shouldn't take that much time in the
> first place, which avoids having to worry about this. If that's
> unrealistic, shouldn't it be a concern of the environment airflow runs in
> rather than of airflow itself? I.e. further build out the kubernetes CLIs
> / operators to query the environment and plan/deny/schedule this kind of
> work automatically. Because k8s was probably built from the perspective
> of handling short-running work, running anything long-running on it is
> naturally going to compete with that architecture.
>
> - About failures and instances disappearing on failure: it's not
> desirable to keep those instances around for a long time; we really do
> need to depend on client-side logging and other services to tell us what
> happened. The difference in thinking is that a pod/container is just a
> temporary thing that runs a job, and we should be interested in how the
> job did rather than in how the container/pod that ran it did. From my
> limited experience with k8s, though, I do see that it tends to get rid of
> everything a little too quickly on failure. One thing you could look into
> is logging to a commonly shared volume with a specific 'key' for that
> container, so you can always refer back to the important log file and
> fish it out, with measures to clean up the shared filesystem on a regular
> basis (sketch below).
>
> - About rescaling and starting jobs: it doesn't come for free, as you
> mention. I think it's a great idea to be able to scale out during busy
> intervals (we intend to just use cycle scaling here), but a hint towards
> what policy or scaling strategy you intend to use on k8s would be welcome
> there.
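>
> Purely as illustration, wiring such a container to the Redis/RDS backbone
> could look roughly like the entrypoint sketch below, using airflow's
> AIRFLOW__SECTION__KEY environment overrides. The RDS_CONN_URI and
> REDIS_URL names are made up and would have to be injected by the ECS task
> definition:
>
>     # hypothetical container entrypoint: point airflow at the managed
>     # backbone via environment overrides, then exec the requested component
>     import os
>     import sys
>
>     os.environ.setdefault("AIRFLOW__CORE__EXECUTOR", "CeleryExecutor")
>     os.environ.setdefault("AIRFLOW__CORE__SQL_ALCHEMY_CONN", os.environ["RDS_CONN_URI"])
>     os.environ.setdefault("AIRFLOW__CELERY__BROKER_URL", os.environ["REDIS_URL"])
>     os.environ.setdefault("AIRFLOW__CELERY__CELERY_RESULT_BACKEND", os.environ["RDS_CONN_URI"])
>     os.environ.setdefault("AIRFLOW__CORE__DAGS_FOLDER", "/mnt/efs/dags")  # the shared EFS mount
>
>     # argv[1:] is the component to run: "webserver", "scheduler", "worker" or "flower"
>     os.execvp("airflow", ["airflow"] + sys.argv[1:])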
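>
> To make the queue idea a bit more concrete, a minimal sketch (the DAG and
> queue names are made up, and the Celery executor is assumed) of pinning a
> memory-hungry task to a dedicated queue:
>
>     from datetime import datetime
>
>     from airflow import DAG
>     from airflow.operators.python_operator import PythonOperator
>
>     dag = DAG("highmem_routing_example",
>               start_date=datetime(2017, 7, 1),
>               schedule_interval="@daily")
>
>     train = PythonOperator(
>         task_id="train_model",
>         python_callable=lambda: None,  # stand-in for the real memory-hungry callable
>         queue="highmem",               # only picked up by workers listening on this queue
>         dag=dag,
>     )
>
> A larger data-science worker container then subscribes to just that queue,
> e.g. `airflow worker -q highmem`, while the default workers never see the
> task.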
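>
> On the static-config point, the kind of thing I mean, sketched (the
> multipliers are arbitrary): have the container derive its concurrency from
> whatever resources it actually received, instead of a hard-coded
> airflow.cfg value:
>
>     # sketch: compute concurrency from the container's CPU count at start-up
>     import multiprocessing
>     import os
>
>     cpus = multiprocessing.cpu_count()
>     os.environ["AIRFLOW__CORE__PARALLELISM"] = str(cpus * 4)
>     os.environ["AIRFLOW__CELERY__CELERYD_CONCURRENCY"] = str(cpus * 2)
>     # ...then exec the scheduler/worker as in the entrypoint sketch above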
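>
> And for the shared-volume logging idea, a minimal sketch (the mount point
> and naming scheme are just examples) of writing each container's log under
> a key derived from its hostname, so the file outlives the pod:
>
>     # sketch: per-container log file on a commonly shared volume
>     import logging
>     import os
>     import socket
>
>     log_dir = os.path.join("/mnt/shared/logs", socket.gethostname())
>     if not os.path.isdir(log_dir):
>         os.makedirs(log_dir)
>     logging.basicConfig(
>         filename=os.path.join(log_dir, "task.log"),
>         level=logging.INFO,
>         format="%(asctime)s %(levelname)s %(message)s",
>     )
>     logging.info("container %s starting", socket.gethostname())
>
> A periodic cleanup job on the same volume then keeps the shared filesystem
> from filling up.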
>
> Gerard
>
> On Wed, Jul 5, 2017 at 8:43 PM, Daniel Imberman <[email protected]> wrote:
>
> > @amit
> >
> > I've added the proposal to the PR for now. Should make it easier for
> > people to get to it. Will delete once I add it to the wiki.
> > https://github.com/bloomberg/airflow/blob/29694ae9903c4dad3f18fb8eb767c4922dbef2e8/dimberman-KubernetesExecutorProposal-050717-1423-36.pdf
> >
> > Daniel
> >
> > On Wed, Jul 5, 2017 at 11:36 AM Daniel Imberman <[email protected]> wrote:
> >
> > > Hi Amit,
> > >
> > > For now the design doc is included as an attachment to the original
> > > email. Once I am able to get permission to edit the wiki I would like
> > > to add it there, but for now I figured that this would get the ball
> > > rolling.
> > >
> > > Daniel
> > >
> > > On Wed, Jul 5, 2017 at 11:33 AM Amit Kulkarni <[email protected]> wrote:
> > >
> > > > Hi Daniel,
> > > >
> > > > I don't see a link to the design PDF.
> > > >
> > > > Amit Kulkarni
> > > > Site Reliability Engineer
> > > > Mobile: (716)-352-3270
> > > >
> > > > Payments partner to the platform economy
> > > >
> > > > On Wed, Jul 5, 2017 at 11:25 AM, Daniel Imberman <[email protected]> wrote:
> > > >
> > > > > Hello Airflow community!
> > > > >
> > > > > My name is Daniel Imberman, and I have been working on behalf of
> > > > > Bloomberg LP to create an airflow kubernetes executor/operator. We
> > > > > wanted to allow for maximum throughput/scalability, while keeping a
> > > > > lot of the kubernetes details abstracted away from the users. Below
> > > > > I have a link to the WIP PR and the PDF of the initial proposal. If
> > > > > anyone has any comments/questions I would be glad to discuss this
> > > > > feature further.
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Daniel
> > > > >
> > > > > https://github.com/apache/incubator-airflow/pull/2414
