Hi guys,

Thank you so much for your thoughtful and well-articulated replies. This has been invaluable in charting out the next steps for our deployment. Michael, it seems we are headed toward a structure similar to the one you have outlined, since our loads are not very heavy as of now. The Kubernetes executor looks promising and we will be monitoring its status. Daniel, I have already signed up for the Meetup and hope to see you there as well!
-Shoumitra

On Mon, Nov 6, 2017 at 1:04 PM, Daniel Imberman <[email protected]> wrote:

> Hi Shoumitra,
>
> One thing worth noting is that with the release of the Kubernetes
> executor, we will be using resource versions + the Kubernetes API to take
> care of some of the current issues with crash handling (basically
> recreating state from what tasks have been run/are pending within the
> cluster). The Kubernetes executor also offloads all tasks to individual
> pods, so you will not need to worry about the resources of any tasks
> affecting the scheduler.
>
> If you're available (and in SF) on Dec. 4th, we will be discussing the PR
> at Airbnb for the Airflow meetup.
>
> Hope to see you there!
>
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/244525050/
>
> On Mon, Nov 6, 2017 at 9:39 AM Michael Erdely <[email protected]> wrote:
>
> > Hi Shoumitra,
> >
> > As others have mentioned, there are a lot of issues when using the
> > local executor in prod. However, at OfferUp, we have had success
> > running Airflow dockerized on EC2.
> >
> > Our current setup is the following:
> >
> > - Airflow 1.8.2, dockerized similar to Matthieu's Celery example at
> >   https://github.com/puckel/docker-airflow
> > - Scheduler, webserver, Flower, and 5 workers running on a c4.8xlarge
> >   EC2 instance
> > - RDS-hosted Postgres
> > - ElastiCache-hosted Redis
> >
> > We are close to the limits of this setup and plan on redoing our
> > configuration with Terraform. Not sure if we'll keep the dockerized
> > setup, but it's been extremely helpful thus far.
> >
> > -Michael
> >
> > On Thu, Nov 2, 2017 at 11:27 AM Marc Bollinger <[email protected]> wrote:
> >
> > > We're actively following the Airflow/Kubernetes integration
> > > <https://issues.apache.org/jira/browse/AIRFLOW-1314>, and are
> > > eventually going to move to both running everything on k8s and using
> > > KubernetesExecutors for many things, but we've deployed Airflow to
> > > ECS from day one. It works mostly fine, and we're using a tool we
> > > open-sourced called Broadside <https://github.com/lumoslabs/broadside>
> > > to simplify configuration and deployment. Our deploy is broken up
> > > into one scheduler, one Flower instance, a few web servers, and a
> > > number of workers, using the CeleryExecutor backed by
> > > Redis/ElastiCache (and RDS Postgres, as you're suggesting), all in
> > > ECS from the same private Docker image.
> > >
> > > Tacking on to what Bolke is saying, in our experience it is also
> > > somewhat tricky to get deploys right in ECS with the CeleryExecutor.
> > > Our first impulse was to bake the DAG directory/repo into the Docker
> > > image and run an ECS deploy every time we added or updated DAGs,
> > > bouncing all of the components and killing the workers. Where we
> > > wound up is that our CI system still bakes the DAG directory into the
> > > images when we merge to master, but for a "short" deploy we only
> > > bounce the web server and scheduler; the worker containers all just
> > > execute `git pull` and pull down the new/updated DAGs. Others may
> > > have different approaches that work, I'm sure, possibly moving the
> > > DAG directory to a shared EFS mount.
> > >
> > > On Thu, Nov 2, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> wrote:
> > >
> > > > Please remember that with the LocalExecutor your tasks run in a
> > > > process (group) with the scheduler.
> > > > If you want to restart the scheduler, it will need to wait until
> > > > all currently running tasks have finished. In addition, if your
> > > > tasks are resource-intensive (CPU, memory), this can also affect
> > > > the scheduler. In 1.9.0 we are a little bit more robust in this
> > > > respect, but guarding against OOM errors is very hard.
> > > >
> > > > Furthermore, the new logging framework in 1.9.0 will allow you to
> > > > have logs centrally, which might be convenient. However, the
> > > > documentation is not up to date, so you will have to tune it
> > > > yourself.
> > > >
> > > > My 2 cents,
> > > >
> > > > Bolke.
> > > >
> > > > > On 2 Nov 2017, at 18:55, Shoumitra Srivastava <[email protected]> wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > So far we have had a lot of success testing out Airflow and we
> > > > > are now going for a full-scale deployment. To that end, we are
> > > > > considering dockerizing Airflow and deploying it on one of our
> > > > > ECS clusters. We are planning on separating out the web server
> > > > > and the scheduler into separate tasks and using the local
> > > > > executor with an RDS Postgres and Redis backend. Does anyone else
> > > > > have any suggestions regarding the setup? Any design patterns or
> > > > > good practices and gotchas would be welcome.
> > > > >
> > > > > -Shoumitra
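P.S. For anyone landing on this thread later: the Celery-based stack Michael and Marc describe maps onto an airflow.cfg roughly like the sketch below. This is only an illustration for Airflow 1.8.x; the hostnames, credentials, and database names are placeholders, not anything from the thread.

```ini
[core]
# CeleryExecutor hands tasks off to separate worker processes instead of
# running them in a process group with the scheduler (the LocalExecutor
# caveat Bolke raises above).
executor = CeleryExecutor
# RDS-hosted Postgres as the metadata database (placeholder endpoint).
sql_alchemy_conn = postgresql+psycopg2://airflow:PASSWORD@my-rds-endpoint:5432/airflow

[celery]
# ElastiCache-hosted Redis as the Celery broker and result backend
# (placeholder endpoints).
broker_url = redis://my-elasticache-endpoint:6379/0
celery_result_backend = redis://my-elasticache-endpoint:6379/1
```

With this in place, the scheduler, webserver, and workers can run as separate containers (or ECS tasks) sharing the same config, which is essentially the split both setups above describe.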

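P.P.S. Marc's "short" deploy can be sketched as follows. This is a hypothetical simulation: a throwaway local bare repository stands in for the real DAG repo (whose URL isn't in the thread), a "ci" clone plays the CI system baking/merging DAG changes, and a "worker" clone plays a worker container that only runs `git pull`.

```shell
# Sketch of the "short deploy": CI pushes updated DAGs to the DAG repo,
# and each worker container just runs `git pull` in its DAG directory
# instead of being redeployed.
set -e
tmp=$(mktemp -d)

# Stand-in for the remote DAG repository.
git init --bare --quiet "$tmp/dags.git"

# CI checkout: commit and push the initial DAG.
git clone --quiet "$tmp/dags.git" "$tmp/ci" 2>/dev/null
echo "print('dag v1')" > "$tmp/ci/my_dag.py"
git -C "$tmp/ci" add -A
git -C "$tmp/ci" -c user.email=ci@example.com -c user.name=ci commit --quiet -m "v1"
git -C "$tmp/ci" push --quiet origin HEAD

# Worker's DAG directory, cloned once when the container starts.
git clone --quiet "$tmp/dags.git" "$tmp/worker"

# A DAG change is merged on the CI side...
echo "print('dag v2')" > "$tmp/ci/my_dag.py"
git -C "$tmp/ci" -c user.email=ci@example.com -c user.name=ci commit --quiet -am "v2"
git -C "$tmp/ci" push --quiet origin HEAD

# ...and the worker-side "short deploy" is just a pull.
git -C "$tmp/worker" pull --quiet
cat "$tmp/worker/my_dag.py"   # the worker now sees the v2 DAG
```

The trade-off Marc notes still applies: the image baked at merge time and the pulled checkout can drift between deploys, which is why a shared EFS mount is mentioned as an alternative.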