Yep, the cleanup_pods script is now set up as an optional Kubernetes CronJob (https://github.com/astronomer/airflow-chart/blob/master/templates/cleanup/cleanup-cronjob.yaml) that runs periodically to clean up failed pods, so it could stay separate.
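For anyone who hasn't opened the template, the shape is roughly the following. This is a hand-written sketch rather than the rendered chart output; the image tag, schedule, and "cleanup" entrypoint argument are assumptions, not values copied from the chart:

# Sketch of the optional cleanup CronJob (illustrative values, not the
# chart's rendered template). It periodically runs the cleanup_pods
# script, which deletes failed task pods left behind in the namespace.
apiVersion: batch/v1beta1       # CronJob API version current on 2020-era clusters
kind: CronJob
metadata:
  name: airflow-cleanup
spec:
  schedule: "*/15 * * * *"      # assumed interval; the chart makes this configurable
  concurrencyPolicy: Forbid     # never start a new run while one is still going
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: airflow-cleanup     # needs RBAC to list/delete pods
          restartPolicy: Never
          containers:
            - name: cleanup
              image: astronomerinc/ap-airflow:latest   # assumed tag
              args: ["cleanup"]                        # assumed hook into cleanup_pods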
The wait_for_migrations script could definitely be pulled into Airflow. For context, we deploy an initContainer on the scheduler (https://github.com/astronomer/airflow-chart/blob/master/templates/scheduler/scheduler-deployment.yaml#L77-L84) that runs the upgradedb command before booting the scheduler. The new wait_for_migrations script runs in an initContainer on the webserver and workers (https://github.com/astronomer/airflow-chart/blob/master/templates/webserver/webserver-deployment.yaml#L58-L65) so that they don't boot up ahead of a potentially long-running migration and attempt to operate on new or missing columns/tables before the migrations have run. This prevents those pods from entering a CrashLoop.
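The gating pattern in the webserver/worker pod spec looks roughly like this. A minimal sketch, with the entrypoint argument and image tag as illustrative assumptions rather than values copied from the chart:

# Sketch of the migration-gating pattern (illustrative names, not the
# chart's rendered output). The scheduler runs the upgradedb command in
# its initContainer; webserver and worker pods block in an initContainer
# until the metadata database reaches the expected schema version,
# instead of crash-looping against a half-migrated database.
apiVersion: v1
kind: Pod
metadata:
  name: airflow-webserver-example
spec:
  initContainers:
    - name: wait-for-airflow-migrations
      image: astronomerinc/ap-airflow:latest   # assumed tag
      # assumed hook into the wait_for_migrations script; a built-in
      # "airflow db wait_for_migration" command could slot in here instead
      args: ["wait-for-migrations"]
  containers:
    - name: webserver
      image: astronomerinc/ap-airflow:latest
      args: ["webserver"]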
On Tue, Mar 24, 2020 at 11:48 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> > @Tomasz great question. Our images are currently generated from
> > Dockerfiles in this repo https://github.com/astronomer/ap-airflow and
> > get published to DockerHub
> > https://hub.docker.com/repository/docker/astronomerinc/ap-airflow.
> >
> > For the most part those are typical Airflow images. There's an
> > entrypoint script that we include in the image that handles waiting for
> > the database and redis (if used) to come up, which is pretty generic.
>
> I already added waiting for the database (both metadata and celery URL)
> in the PR:
> https://github.com/apache/airflow/pull/7832/files#diff-3759f40d4e8ba0c0e82e82b66d376741.
> It's functionally the same but more generic.
>
> > The only other thing that I think the Helm Chart uses would be the
> > scripts in this repo
> > https://github.com/astronomer/astronomer-airflow-scripts. Our
> > Dockerfiles pull this package in. These scripts are used to coordinate
> > running migrations and cleaning up failed pods.
>
> I see two scripts:
>
> * cleanup_pods -> this is (I believe) not needed to run in airflow - this
> could be run as a separate pod/container?
> * waiting for migrations -> I think this is a good candidate for adding
> an *airflow db wait_for_migration* command and making it part of airflow
> itself.
>
> I think we also have to agree on the Airflow version supported by the
> official helm chart. I'd suggest we support 1.10.10+, incorporate all the
> changes needed in airflow (like "db wait_for_migration") into 2.0 and
> 1.10, and support both the image and the helm chart for those versions
> only. That would help people migrate to the latest version.
>
> WDYT?
>
> > On Tue, Mar 24, 2020 at 10:49 AM Daniel Imberman
> > <daniel.imber...@gmail.com> wrote:
> >
> > > @jarek I agree completely. I think that pairing an official helm
> > > chart with the official image would make for a REALLY powerful "up
> > > and running with airflow" story :). Tomek and I have also been
> > > looking into operator-sdk, which has the ability to create custom
> > > controllers from helm charts. We might even be able to get a 1-2
> > > punch from the same code base :).
> > >
> > > @kaxil @jarek @aizhamal @ash if there are no issues, can we please
> > > start the process of donation?
> > >
> > > +1 on my part, of course :)
> > >
> > > Daniel
> > >
> > > On Mar 24, 2020, 7:40 AM -0700, Jarek Potiuk
> > > <jarek.pot...@polidea.com>, wrote:
> > > > +1. And it should be paired with the official image we have work
> > > > in progress on. I looked a lot at Astronomer's image while
> > > > preparing my draft, and we can make any adjustments needed to make
> > > > it work with the helm chart - I am super happy to collaborate on
> > > > that.
> > > >
> > > > PR here: https://github.com/apache/airflow/pull/7832
> > > >
> > > > J.
> > > >
> > > > On Tue, Mar 24, 2020 at 3:15 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > >
> > > > > @Tomasz Urbaszek <tomasz.urbas...@polidea.com>:
> > > > > Helm Chart Link: https://github.com/astronomer/airflow-chart
> > > > >
> > > > > On Tue, Mar 24, 2020 at 2:13 PM Tomasz Urbaszek
> > > > > <turbas...@apache.org> wrote:
> > > > >
> > > > > > An official helm chart is something our community needs! Using
> > > > > > your chart as the official one makes a lot of sense to me
> > > > > > because, as you mentioned, it's battle-tested.
> > > > > >
> > > > > > One question: what Airflow image do you use? Also, would you
> > > > > > mind sharing a link to the chart?
> > > > > >
> > > > > > Tomek
> > > > > >
> > > > > > On Tue, Mar 24, 2020 at 2:07 PM Greg Neiheisel
> > > > > > <g...@astronomer.io.invalid> wrote:
> > > > > > >
> > > > > > > Hey everyone,
> > > > > > >
> > > > > > > Over the past few years at Astronomer, we've created,
> > > > > > > managed, and hardened a production-ready Helm Chart for
> > > > > > > Airflow (https://github.com/astronomer/airflow-chart) that is
> > > > > > > being used by both our SaaS and Enterprise customers. This
> > > > > > > chart is battle-tested and is running hundreds of Airflow
> > > > > > > deployments of varying sizes and runtime environments. It's
> > > > > > > been built up to encapsulate the issues that Airflow users
> > > > > > > run into in the real world.
> > > > > > >
> > > > > > > While this chart was originally developed internally for our
> > > > > > > Astronomer Platform, we've recently decoupled the chart from
> > > > > > > the rest of our platform to make it usable by the greater
> > > > > > > Airflow community. With these changes in mind, we want to
> > > > > > > start a conversation about donating this chart to the Airflow
> > > > > > > community.
> > > > > > >
> > > > > > > Some of the main features of the chart are:
> > > > > > >
> > > > > > > - It works out of the box. With zero configuration, a user
> > > > > > > will get a postgres database, a default user and the
> > > > > > > KubernetesExecutor, ready to run DAGs.
> > > > > > > - Support for Local, Celery (w/ optional KEDA autoscaling)
> > > > > > > and Kubernetes executors.
> > > > > > > - Support for an optional pgbouncer. We use this to share a
> > > > > > > configurable connection pool size per deployment. Useful for
> > > > > > > limiting connections to the metadata database.
> > > > > > > - Airflow migration support. A user can push a newer version
> > > > > > > of Airflow into an existing release and migrations will
> > > > > > > automatically run cleanly.
> > > > > > > - Prometheus support. Optionally install and configure a
> > > > > > > statsd-exporter to ingest Airflow metrics and expose them to
> > > > > > > Prometheus automatically.
> > > > > > > - Resource control. Optionally control the ResourceQuotas and
> > > > > > > LimitRanges for each deployment so that no deployment can
> > > > > > > overload a cluster.
> > > > > > > - Simple optional Elasticsearch support.
> > > > > > > - Optional namespace cleanup. Sometimes KubernetesExecutor
> > > > > > > and KubernetesPodOperator pods fail for reasons other than
> > > > > > > the actual task. This feature helps keep things clean in
> > > > > > > Kubernetes.
> > > > > > > - Support for running locally in KIND (Kubernetes in Docker).
> > > > > > > - Automatically tested across many Kubernetes versions, with
> > > > > > > Helm 2 and 3 support.
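To make those optional features concrete, a values file for a chart shaped like this might look roughly as follows. This is a hand-written sketch; the key names are illustrative guesses, not the chart's actual values schema:

# Hypothetical values.yaml sketch for toggling the optional features
# listed above. Key names are illustrative, not the chart's actual
# schema; check the chart's values.yaml for the real options.
executor: CeleryExecutor        # or LocalExecutor / KubernetesExecutor

pgbouncer:
  enabled: true                 # shared, bounded connection pool per deployment

statsd:
  enabled: true                 # statsd-exporter exposing metrics to Prometheus

elasticsearch:
  enabled: false                # optional task-log integration

cleanup:
  enabled: true                 # the failed-pod cleanup CronJob
  schedule: "*/15 * * * *"      # assumed default interval

quotas: {}                      # optional ResourceQuota settings
limits: []                      # optional LimitRange settings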
> > > > > > > We've found that the cleanest and most reliable way to
> > > > > > > deploy DAGs to Kubernetes and manage them at scale is to
> > > > > > > package them into the actual docker image, so we have geared
> > > > > > > this chart towards that method of operation, though adding
> > > > > > > other methods should be straightforward.
> > > > > > >
> > > > > > > We would love thoughts from the community and would love to
> > > > > > > see this chart help others get up and running on Kubernetes!
> > > > > > >
> > > > > > > --
> > > > > > > *Greg Neiheisel* / Chief Architect Astronomer.io
>
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer

--
*Greg Neiheisel* / Chief Architect Astronomer.io