On Thu, Jul 6, 2017 at 12:35 AM, Daniel Imberman <[email protected]> wrote:
> Hi Gerard,
>
> Thank you for your feedback/details of your current set up. I would
> actually really like to jump on a skype/hangout call with you to see where
> we can collaborate on this effort.

Send me a PM to work something out. I don't mind weekends, this Saturday is ok.

>> We deploy dags using an Elastic File System (shared across all
>> instances), which then map this read-only into the docker container.
>
> EFS appears to be one of the options that you can use to create a
> persistent volume on kubernetes. One piece of feedback I had received from
> the airflow team is that they wanted as much flexibility as possible for
> users to decide where to store their dags/plugins. They might want
> something similar even if you are running airflow in an ECS environment:
> https://ngineered.co.uk/blog/using-amazon-efs-to-persist-and-share-between-conatiners-data-in-kubernetes

I think we had to make the volume rw to get this to work, but today's run
was able to process everything.

>> In terms of tooling: The current airflow config is somewhat static in the
>> sense that it does not reconfigure itself to the (now) dynamic
>> environment. You'd think that airflow should have to query the
>> environment to figure out parallelism instead of statically specifying
>> this.
>
> I'm not super strongly opinionated on whether the parallelism should be
> handled via the environment or the config file. The config file seems to
> make sense since airflow users already expect to fill out these configs in
> the config file.
>
> That said, it definitely would make sense to allow multiple namespaces to
> use the same airflow.cfg and just define the parallelism in its own
> configuration. Would like further feedback on this.

I think airflow was initially provisioned with a static set of workers, and
most setups use a specific set of worker machines that are maintained.

What we intend to do is experiment with file change notifications inside the
python processes to detect changes to the airflow.cfg file specifically.
There are libs in python to detect this; I don't know how this works out on
shared volumes though. The idea is that when airflow.cfg changes, the
workers and schedulers reload the new config, so that processes do not need
to be restarted. The scheduler can then reconfigure itself to use higher
degrees of parallelism. We still need to investigate which attributes
specifically should be changeable.

If that works, we can deploy a simple tool that only manipulates the safe
attributes, so that it's possible to reconfigure the entire cluster on the
fly; in the backend it would then rewrite the airflow.cfg file. It also
means that others can just log into the scheduler box and adjust settings
while things are running, so it's a change that would benefit everyone.
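To make that concrete, here is a rough sketch of the kind of watcher I mean
(an assumption of how it could look, not anything airflow ships today). It
just polls the file's mtime with the standard library, which should also
behave on shared volumes where inotify-style notifications may not fire;
apply_settings() is a hypothetical hook that would push only the "safe"
values into the running scheduler/worker. The python libs I mentioned
(watchdog and friends) could replace the polling loop where inotify is
reliable.

    import os
    import time
    from configparser import ConfigParser  # ConfigParser on python 2

    AIRFLOW_CFG = "/usr/local/airflow/airflow.cfg"

    def apply_settings(cfg):
        # Hypothetical hook: only the attributes we decide are safe to
        # change at runtime would be applied to the running process.
        parallelism = cfg.getint("core", "parallelism")
        print("airflow.cfg changed, core.parallelism is now %d" % parallelism)

    def watch_config(path=AIRFLOW_CFG, interval=30):
        # Re-read the config whenever the file's modification time changes.
        last_mtime = os.path.getmtime(path)
        while True:
            time.sleep(interval)
            mtime = os.path.getmtime(path)
            if mtime != last_mtime:
                last_mtime = mtime
                cfg = ConfigParser()
                cfg.read(path)
                apply_settings(cfg)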
>> About redeploying instances: We see this as a potential issue for our
>> setup. My take is that jobs simply shouldn't take that much time in
>> principle to start with, which avoids having to worry about this. If
>> that's ridiculous, shouldn't it be a concern of the environment airflow
>> runs in rather than airflow itself? I.e.... further tool out kubernetes
>> CLI's / operators to query the environment to plan/deny/schedule this
>> kind of work automatically. Because k8s was probably built from the
>> perspective of handling short-running queries, running anything
>> long-term on that is going to naturally compete with the architecture.
>
> I would disagree that jobs "shouldn't take that much time." Multiple use
> cases that we are developing our airflow system for can take over a week
> to run. This does raise an interesting question of what to do if airflow
> dies but the tasks are still running.
>
> One aspect of how we're running this implementation that would help WRT
> restarting the scheduler is that each pod is its own task with its own
> heartbeat to the SQL source-of-truth. This means that even if the
> scheduler is re-started, as long as we can scan for currently running
> jobs, we can technically continue the DAG execution with no interruption.
> Would want further feedback on whether the community wants this ability.

Hmm... so maybe that was a bit too quick on the trigger. What I mean is that
I wouldn't advise anyone to have airflow workers running for more than an
hour, for reliability reasons. A big data job would typically by itself have
plenty of failover capabilities in the cluster to redo parts of the job when
instances die or become unstable, but here you introduce a single point of
failure on a week's usage of resources.

I'd advise for that reason to redesign the workflow itself to be able to
deal with dying workers: kick off the job, save the job id somewhere, then
frequently poll on that job id and exit early, and build in the necessary
number of retries. Another option is to use LatestOnly and branching to make
the dag succeed until the job id has a known exit status.
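As a rough sketch of that submit-then-poll pattern against the current
operator API (submit_to_cluster() and get_job_status() are hypothetical
stand-ins for whatever client your cluster exposes, and the DAG/task names
are made up):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.exceptions import AirflowException
    from airflow.operators.python_operator import PythonOperator

    from my_cluster_client import submit_to_cluster, get_job_status  # hypothetical

    def submit(**context):
        # Short-lived task: kick off the job and save the job id.
        job_id = submit_to_cluster("weekly_model_build")
        context["ti"].xcom_push(key="job_id", value=job_id)

    def check(**context):
        # Also short-lived: look up the job id and exit early if it isn't done.
        job_id = context["ti"].xcom_pull(task_ids="submit_job", key="job_id")
        status = get_job_status(job_id)
        if status == "FAILED":
            raise AirflowException("job %s failed" % job_id)
        if status != "SUCCEEDED":
            # Not finished yet: fail fast and let the retry schedule poll again.
            raise AirflowException("job %s still running, retry later" % job_id)

    dag = DAG("long_job_poll", start_date=datetime(2017, 7, 1),
              schedule_interval="@weekly")

    submit_job = PythonOperator(task_id="submit_job", python_callable=submit,
                                provide_context=True, dag=dag)

    check_job = PythonOperator(task_id="check_job", python_callable=check,
                               provide_context=True,
                               retries=7 * 24, retry_delay=timedelta(hours=1),
                               dag=dag)

    check_job.set_upstream(submit_job)

No single worker holds on to resources for the lifetime of the job this way;
the cost is a noisier retry history in the UI.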
One last thing I looked at today: using splunk to send all logs to a single
place. I like how it's all available, but it's also all over the place. If
you intend to use splunk as well and have dashboards or examples of how to
'bucket' the data in the right locations (maybe something where the entire
DAG flow becomes available as one long log sequence), I'd be happy to hear
about such cases and how this was done. Probably the currently executing DAG
id and the current task id have to be used somewhere to realign the log
lines. The nice part about splunk is that it can be realtime and you can see
things in larger contexts, like "cluster-wide".
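The kind of keying I have in mind is just tagging every log record with the
DAG run coordinates so a splunk field extraction can reassemble one DAG run
as a single sequence. A small sketch with the standard logging module (this
is an assumption about how one could wire it up, not how airflow configures
its logging today, and the ids are made up):

    import logging

    class TaskContextFilter(logging.Filter):
        """Attach dag_id / task_id / execution_date to every record so
        splunk can group log lines per DAG run."""
        def __init__(self, dag_id, task_id, execution_date):
            super(TaskContextFilter, self).__init__()
            self.dag_id = dag_id
            self.task_id = task_id
            self.execution_date = execution_date

        def filter(self, record):
            record.dag_id = self.dag_id
            record.task_id = self.task_id
            record.execution_date = self.execution_date
            return True

    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s dag_id=%(dag_id)s task_id=%(task_id)s "
        "execution_date=%(execution_date)s %(levelname)s %(message)s"))

    log = logging.getLogger("task_demo")
    log.addFilter(TaskContextFilter("long_job_poll", "check_job", "2017-07-06"))
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("still waiting on the cluster job")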
> On Wed, Jul 5, 2017 at 1:26 PM Daniel Imberman <[email protected]> wrote:
>
>> Thanks Chris, will do!
>>
>> On Wed, Jul 5, 2017 at 1:26 PM Chris Riccomini <[email protected]> wrote:
>>
>>> @Daniel, done! Should have access. Please create the wiki as a subpage
>>> under:
>>>
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap
>>>
>>> On Wed, Jul 5, 2017 at 1:20 PM, Daniel Imberman <[email protected]> wrote:
>>>
>>>> @chris: Thank you! My wiki name is dimberman.
>>>> @gerard: I've started writing out my reply but there's a fair amount to
>>>> respond to so I'll need a few minutes :).
>>>>
>>>> On Wed, Jul 5, 2017 at 1:17 PM Chris Riccomini <[email protected]> wrote:
>>>>
>>>>> @daniel, what's your wiki username? I can grant you access.
>>>>>
>>>>> On Wed, Jul 5, 2017 at 12:35 PM, Gerard Toonstra <[email protected]> wrote:
>>>>>
>>>>>> Hey Daniel,
>>>>>>
>>>>>> Great work. We're looking at running airflow on AWS ECS inside docker
>>>>>> containers and making great progress on this. We use redis and RDS as
>>>>>> managed services to form a comms backbone and then just spawn
>>>>>> webserver, scheduler, worker and flower containers as needed on ECS.
>>>>>> We deploy dags using an Elastic File System (shared across all
>>>>>> instances), which then map this read-only into the docker container.
>>>>>> We're now evaluating this setup going forward in more earnest.
>>>>>>
>>>>>> Good idea to use queues to separate dependencies or some other
>>>>>> concerns (high-mem pods?); this way there are many ways to customize
>>>>>> where and on which hw a DAG is going to run. We're looking at cycle
>>>>>> scaling to temporarily increase resources in a morning run and create
>>>>>> larger worker containers for data science tasks and perhaps some
>>>>>> other tasks.
>>>>>>
>>>>>> - In terms of tooling: The current airflow config is somewhat static
>>>>>> in the sense that it does not reconfigure itself to the (now) dynamic
>>>>>> environment. You'd think that airflow should have to query the
>>>>>> environment to figure out parallelism instead of statically
>>>>>> specifying this.
>>>>>>
>>>>>> - Sometimes DAGs import hooks or operators that import dependencies
>>>>>> at the top. The only reason (I think) that a scheduler needs to
>>>>>> physically import and parse a DAG is because there may be dynamically
>>>>>> built elements within the DAG. If there were only static elements, it
>>>>>> would theoretically be possible to optimize this. Your PDF sort of
>>>>>> hints towards a system where the worker where a DAG will eventually
>>>>>> run could parse the DAG and report back a meta description of it,
>>>>>> which could simplify and optimize performance of the scheduler at the
>>>>>> cost of network roundtrips.
>>>>>>
>>>>>> - About redeploying instances: We see this as a potential issue for
>>>>>> our setup. My take is that jobs simply shouldn't take that much time
>>>>>> in principle to start with, which avoids having to worry about this.
>>>>>> If that's ridiculous, shouldn't it be a concern of the environment
>>>>>> airflow runs in rather than airflow itself? I.e.... further tool out
>>>>>> kubernetes CLI's / operators to query the environment to
>>>>>> plan/deny/schedule this kind of work automatically. Because k8s was
>>>>>> probably built from the perspective of handling short-running
>>>>>> queries, running anything long-term on that is going to naturally
>>>>>> compete with the architecture.
>>>>>>
>>>>>> - About failures and instances disappearing on failure: it's not
>>>>>> desirable to keep those instances around for a long time; we really
>>>>>> do need to depend on client logging and other services available to
>>>>>> tell us what happened. The difference in thinking is that a
>>>>>> pod/container is just a temporary thing that runs a job, and we
>>>>>> should be interested in how the job did vs. how the container/pod ran
>>>>>> it. From my little experience with k8s though, I do see that it tends
>>>>>> to get rid of everything a little bit too quickly on failure. One
>>>>>> thing you could look into is to log onto a commonly shared volume
>>>>>> with a specific 'key' for that container, so you can always refer
>>>>>> back to the important log file and fish it out, with measures to
>>>>>> clean up the shared filesystem on a regular basis.
>>>>>>
>>>>>> - About rescaling and starting jobs: it doesn't come for free as you
>>>>>> mention. I think it's a great idea to be able to scale out on busy
>>>>>> intervals (we intend to just use cycle scaling here), but a hint
>>>>>> towards what policy or scaling strategy you intend to use on k8s is
>>>>>> welcome there.
>>>>>>
>>>>>> Gerard
>>>>>>
>>>>>> On Wed, Jul 5, 2017 at 8:43 PM, Daniel Imberman <[email protected]> wrote:
>>>>>>
>>>>>>> @amit
>>>>>>>
>>>>>>> I've added the proposal to the PR for now. Should make it easier for
>>>>>>> people to get to it. Will delete once I add it to the wiki.
>>>>>>>
>>>>>>> https://github.com/bloomberg/airflow/blob/29694ae9903c4dad3f18fb8eb767c4922dbef2e8/dimberman-KubernetesExecutorProposal-050717-1423-36.pdf
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> On Wed, Jul 5, 2017 at 11:36 AM Daniel Imberman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Amit,
>>>>>>>>
>>>>>>>> For now the design doc is included as an attachment to the original
>>>>>>>> email. Once I am able to get permission to edit the wiki I would
>>>>>>>> like to add it there, but for now I figured that this would get the
>>>>>>>> ball rolling.
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> On Wed, Jul 5, 2017 at 11:33 AM Amit Kulkarni <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> I don't see link to design PDF.
>>>>>>>>>
>>>>>>>>> Amit Kulkarni
>>>>>>>>> Site Reliability Engineer
>>>>>>>>> Mobile: (716)-352-3270
>>>>>>>>>
>>>>>>>>> Payments partner to the platform economy
>>>>>>>>>
>>>>>>>>> On Wed, Jul 5, 2017 at 11:25 AM, Daniel Imberman <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Airflow community!
>>>>>>>>>>
>>>>>>>>>> My name is Daniel Imberman, and I have been working on behalf of
>>>>>>>>>> Bloomberg LP to create an airflow kubernetes executor/operator.
>>>>>>>>>> We wanted to allow for maximum throughput/scalability, while
>>>>>>>>>> keeping a lot of the kubernetes details abstracted away from the
>>>>>>>>>> users. Below I have a link to the WIP PR and the PDF of the
>>>>>>>>>> initial proposal. If anyone has any comments/questions I would be
>>>>>>>>>> glad to discuss this feature further.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>> Daniel
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/incubator-airflow/pull/2414
