Hi guys,

Thank you so much for your thoughtful and well-articulated replies. This has been invaluable in charting out the next steps for our deployment. Michael, it seems we are headed toward a structure similar to the one you have outlined, since our loads are not very heavy as of now. The Kubernetes executor looks promising and we will be monitoring its status. Daniel, I have already signed up for the Meetup and hope to see you there as well!
-Shoumitra

On Mon, Nov 6, 2017 at 1:04 PM, Daniel Imberman <[email protected]> wrote:

> Hi Shoumitra,
>
> One thing worth noting is that with the release of the Kubernetes
> executor, we will be using resource versions + the Kubernetes API to take
> care of some of the current issues with crash handling (basically
> recreating state from what tasks have been run/are pending within the
> cluster). The Kubernetes executor also offloads all tasks to individual
> pods, so you will not need to worry about the resources of any tasks
> affecting the scheduler.
>
> If you're available (and in SF) on Dec. 4th, we will be discussing the PR
> at Airbnb for the Airflow meetup.
>
> Hope to see you there!
>
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/244525050/
>
> On Mon, Nov 6, 2017 at 9:39 AM Michael Erdely <[email protected]> wrote:
>
> > Hi Shoumitra,
> >
> > As others have mentioned, there are a lot of issues when using the
> > local executor in prod. However, at OfferUp, we have had success
> > running Airflow dockerized on EC2.
> >
> > Our current setup is the following:
> >
> > - Airflow 1.8.2, dockerized similar to Matthieu's Celery example at
> >   https://github.com/puckel/docker-airflow
> > - Scheduler, webserver, Flower, and 5 workers running on a c4.8xlarge
> >   EC2 instance
> > - RDS-hosted Postgres
> > - ElastiCache-hosted Redis
> >
> > We are close to the limits of this setup and plan on redoing our
> > configuration with Terraform. Not sure if we'll keep the dockerized
> > setup, but it's been extremely helpful thus far.
> >
> > -Michael
> >
> > On Thu, Nov 2, 2017 at 11:27 AM Marc Bollinger <[email protected]> wrote:
> >
> > > We're actively following the Airflow/Kubernetes integration
> > > <https://issues.apache.org/jira/browse/AIRFLOW-1314>, and are
> > > eventually going to move to both running everything on k8s and using
> > > KubernetesExecutors for many things, but we've deployed Airflow to
> > > ECS from day one. It works mostly fine, and we're using a tool we
> > > open-sourced called Broadside <https://github.com/lumoslabs/broadside>
> > > to simplify configuration and deployment. Our deploy is broken up
> > > into one scheduler, one Flower instance, a few web servers, and a
> > > number of workers, using the CeleryExecutor backed by
> > > Redis/ElastiCache (and RDS Postgres, as you're suggesting), all in
> > > ECS from the same private Docker image.
> > >
> > > Tacking on to what Bolke is saying, in our experience it is also
> > > somewhat tricky to get deploys right in ECS with the CeleryExecutor.
> > > Our first impulse was to bake the DAG directory/repo into the Docker
> > > image and run an ECS deploy every time we added or updated DAGs,
> > > bouncing all of the components and killing the workers. Where we
> > > wound up is that our CI system still bakes the DAG directory into the
> > > images when we merge to master, but for a "short" deploy we only
> > > bounce the web server and scheduler; the worker containers all just
> > > execute `git pull` and pull down the new/updated DAGs. Others may
> > > have different approaches that work, I'm sure, possibly moving the
> > > DAG directory to a shared EFS mount.
> > >
> > > On Thu, Nov 2, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> wrote:
> > >
> > > > Please remember that with the LocalExecutor your tasks run in a
> > > > process (group) with the scheduler.
> > > > If you want to restart the scheduler, it will need to wait until
> > > > all currently running tasks have finished. In addition, if your
> > > > tasks are resource-intensive (CPU, memory), this can also affect
> > > > the scheduler. In 1.9.0 we are a little bit more robust in this
> > > > respect, but guarding against OOM errors is very hard.
> > > >
> > > > Furthermore, the new logging framework in 1.9.0 will allow you to
> > > > have logs centrally, which might be convenient. However, the
> > > > documentation is not up to date, so you will have to tune it
> > > > yourself.
> > > >
> > > > My 2 cents,
> > > >
> > > > Bolke.
> > > >
> > > > > On 2 Nov 2017, at 18:55, Shoumitra Srivastava <[email protected]> wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > So far we have had a lot of success testing out Airflow and we
> > > > > are now going for a full-scale deployment. To that end, we are
> > > > > considering dockerizing Airflow and deploying it on one of our
> > > > > ECS clusters. We are planning on separating out the web server
> > > > > and the scheduler into separate tasks and using the local
> > > > > executor with an RDS Postgres and Redis backend. Does anyone else
> > > > > have any suggestions regarding the setup? Any design patterns or
> > > > > good practices and gotchas would be welcome.
> > > > >
> > > > > -Shoumitra
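P.S. For anyone landing on this thread later: the Celery-based stack Michael and Marc describe maps onto an airflow.cfg roughly like the sketch below. This is only an illustration for Airflow 1.8.x; the hostnames, credentials, and database names are placeholders, not anything from the thread.

```ini
[core]
# CeleryExecutor hands tasks off to separate worker processes instead of
# running them in a process group with the scheduler (the LocalExecutor
# caveat Bolke raises above).
executor = CeleryExecutor
# RDS-hosted Postgres as the metadata database (placeholder endpoint).
sql_alchemy_conn = postgresql+psycopg2://airflow:PASSWORD@my-rds-endpoint:5432/airflow

[celery]
# ElastiCache-hosted Redis as the Celery broker and result backend
# (placeholder endpoints).
broker_url = redis://my-elasticache-endpoint:6379/0
celery_result_backend = redis://my-elasticache-endpoint:6379/1
```

With this in place, the scheduler, webserver, and workers can run as separate containers (or ECS tasks) sharing the same config, which is essentially the split both setups above describe.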

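P.P.S. Marc's "short" deploy can be sketched as follows. This is a hypothetical simulation: a throwaway local bare repository stands in for the real DAG repo (whose URL isn't in the thread), a "ci" clone plays the CI system baking/merging DAG changes, and a "worker" clone plays a worker container that only runs `git pull`.

```shell
# Sketch of the "short deploy": CI pushes updated DAGs to the DAG repo,
# and each worker container just runs `git pull` in its DAG directory
# instead of being redeployed.
set -e
tmp=$(mktemp -d)

# Stand-in for the remote DAG repository.
git init --bare --quiet "$tmp/dags.git"

# CI checkout: commit and push the initial DAG.
git clone --quiet "$tmp/dags.git" "$tmp/ci" 2>/dev/null
echo "print('dag v1')" > "$tmp/ci/my_dag.py"
git -C "$tmp/ci" add -A
git -C "$tmp/ci" -c user.email=ci@example.com -c user.name=ci commit --quiet -m "v1"
git -C "$tmp/ci" push --quiet origin HEAD

# Worker's DAG directory, cloned once when the container starts.
git clone --quiet "$tmp/dags.git" "$tmp/worker"

# A DAG change is merged on the CI side...
echo "print('dag v2')" > "$tmp/ci/my_dag.py"
git -C "$tmp/ci" -c user.email=ci@example.com -c user.name=ci commit --quiet -am "v2"
git -C "$tmp/ci" push --quiet origin HEAD

# ...and the worker-side "short deploy" is just a pull.
git -C "$tmp/worker" pull --quiet
cat "$tmp/worker/my_dag.py"   # the worker now sees the v2 DAG
```

The trade-off Marc notes still applies: the image baked at merge time and the pulled checkout can drift between deploys, which is why a shared EFS mount is mentioned as an alternative.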