Hi guys,

Thank you so much for your thoughtful and well-articulated replies. They
have been invaluable in charting out the next steps for our deployment.
Michael, it seems we are headed toward a structure similar to the one you
outlined, since our loads are not very heavy as of now. The Kubernetes
executor looks promising and we will be monitoring its status. Daniel, I
have already signed up for the Meetup and hope to see you there as well!

-Shoumitra

On Mon, Nov 6, 2017 at 1:04 PM, Daniel Imberman <[email protected]>
wrote:

> Hi Shoumitra,
>
> One thing worth noting is that with the release of the Kubernetes executor,
> we will be using resource versions + the Kubernetes API to take care of
> some of the current issues with crash handling (basically recreating state
> from what tasks have been run/are pending within the cluster). The
> Kubernetes executor also offloads all tasks to individual pods, so you will
> not need to worry about the resources of any task affecting the scheduler.
>
> If you're available (and in SF) on Dec. 4th, we will be discussing the PR
> at the Airflow meetup at Airbnb.
>
> Hope to see you there!
>
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/244525050/
>
> On Mon, Nov 6, 2017 at 9:39 AM Michael Erdely <[email protected]> wrote:
>
> > Hi Shoumitra,
> >
> > As others have mentioned, there are a lot of issues when using the
> > LocalExecutor in prod. However, at OfferUp, we have had success running
> > Airflow dockerized on EC2.
> >
> > Our current setup is the following:
> >
> >    - Airflow 1.8.2 dockerized similar to Matthieu's Celery example at
> >    https://github.com/puckel/docker-airflow
> >    - Running scheduler, webserver, flower, and 5 workers on a c4.8xlarge
> >    EC2 instance
> >    - RDS hosted Postgres
> >    - ElastiCache hosted Redis
> >
> > We are close to the limits of this setup and plan on redoing our
> > configuration with Terraform. Not sure if we'll keep the dockerized
> > setup, but it's been extremely helpful thus far.
> >
> > -Michael
> >
> >
> >
> > On Thu, Nov 2, 2017 at 11:27 AM Marc Bollinger <[email protected]> wrote:
> >
> > > We're actively following the Airflow/Kubernetes integration
> > > <https://issues.apache.org/jira/browse/AIRFLOW-1314>, and are
> > > eventually going to move to both running everything on k8s and using
> > > KubernetesExecutors for many things, but we've deployed Airflow to
> > > ECS from day one. It works mostly fine, and we're using a tool we
> > > open-sourced called Broadside <https://github.com/lumoslabs/broadside>
> > > to simplify configuration and deployment. Our deploy is broken up
> > > into one scheduler, one Flower instance, a few web servers, and a
> > > number of workers, using the CeleryExecutor backed by Redis/ElastiCache
> > > (and RDS Postgres, as you're suggesting), all in ECS from the same
> > > private Docker image.
> > >
> > > Tacking on to what Bolke is saying, in our experience it is also
> > > somewhat tricky to get deploys right in ECS with the CeleryExecutor.
> > > Our first impulse was to bake the DAG directory/repo into the Docker
> > > image and run an ECS deploy every time we added or updated DAGs,
> > > bouncing all of the components and killing the workers. Where we
> > > wound up is that our CI system still bakes the DAG directory into
> > > the images when we merge to master, but for a "short" deploy we only
> > > bounce the web server and scheduler--the worker containers all just
> > > execute `git pull` and pull down the new/updated DAGs. Others may
> > > have different approaches that work, I'm sure, possibly moving the
> > > DAG directory to a shared EFS mount.
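The "short" deploy Marc describes — workers refreshing their DAG checkout with `git pull` instead of being redeployed — might look roughly like the sketch below. The DAG path, remote, and branch are illustrative assumptions, not anything from the thread:

```python
# Hypothetical worker-side DAG refresh: fast-forward the DAG repo in place
# rather than rebuilding and bouncing the worker container.
import subprocess


def pull_command(dag_dir: str, branch: str = "master") -> list:
    # --ff-only keeps worker checkouts as clean fast-forwards; a worker
    # should never create merge commits of its own.
    return ["git", "-C", dag_dir, "pull", "--ff-only", "origin", branch]


def sync_dags(dag_dir: str, branch: str = "master") -> str:
    """Fast-forward the DAG repo checked out at dag_dir; returns git's output."""
    result = subprocess.run(
        pull_command(dag_dir, branch),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Illustrative path only; use wherever your workers mount the DAG repo.
    print(sync_dags("/usr/local/airflow/dags"))
```

A cron entry or a small loop in the worker's entrypoint could call this periodically, which is one way to get the behavior described without a full ECS deploy.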
> > >
> > > On Thu, Nov 2, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> wrote:
> > >
> > > > Please remember that with the LocalExecutor your tasks run in a
> > > > process (group) with the scheduler. If you want to restart the
> > > > scheduler, it will need to wait until all currently running tasks
> > > > have finished. In addition, if your tasks are resource-intensive
> > > > (CPU, memory), this can also affect the scheduler. In 1.9.0 we are
> > > > a little bit more robust in this respect, but guarding against OOM
> > > > errors is very hard.
> > > >
> > > > Furthermore, the new logging framework in 1.9.0 will allow you to
> > > > have centralized logs, which might be convenient. However, the
> > > > documentation is not up to date, so you will have to tune it
> > > > yourself.
> > > >
> > > > My 2 cents,
> > > >
> > > > Bolke.
> > > >
> > > > > On 2 Nov 2017, at 18:55, Shoumitra Srivastava <[email protected]> wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > So far we have had a lot of success testing out Airflow, and we
> > > > > are now going for a full-scale deployment. To that end, we are
> > > > > considering dockerizing Airflow and deploying it on one of our
> > > > > ECS clusters. We are planning on separating the web server and
> > > > > the scheduler into separate tasks and using the LocalExecutor
> > > > > with an RDS Postgres and Redis backend. Does anyone else have
> > > > > any suggestions regarding the setup? Any design patterns, good
> > > > > practices, or gotchas would be welcome.
> > > > >
> > > > > -Shoumitra
> > > >
> > > >
> > >
> >
>
