Infrastructure-wise, we use Docker containers hosted on Kubernetes on
Google Container Engine and deployed with Helm. We bake our DAGs and
custom code into the images, so in the end the images are rebuilt with the
newest code, the deployer runs a `helm upgrade` locally, and all the
containers are recreated from the new image. Our webserver, worker,
flower, and scheduler containers are derived from
https://github.com/puckel/docker-airflow, and we use the official
RabbitMQ image from Docker Hub. Our metadata database lives in Cloud SQL
for our QA and production clusters on GCE, but for local dev we use the
official MySQL image from Docker Hub. This style of deployment interrupts
any running tasks, since the worker containers are also killed and
recreated from the new image.
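
In case it's useful, the deploy step boils down to something like the
following (a rough sketch; the image name, tag, and chart values here are
made up for illustration):

    # Rebuild and push the image with the DAGs and custom code baked in
    docker build -t gcr.io/our-project/airflow:v42 .
    docker push gcr.io/our-project/airflow:v42

    # Roll the release onto the cluster; Kubernetes recreates every
    # container (including workers) from the new image
    helm upgrade airflow ./charts/airflow --set image.tag=v42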

On Wed, Nov 15, 2017 at 7:42 AM, Zsolt Tóth <[email protected]>
wrote:

> We are also using Ansible for:
> - Installing/upgrading/configuring Airflow (there are several Airflow
> roles on GitHub)
> - Deploying the pipelines
> - Restarting the Airflow webserver/scheduler
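>
> For the restart step, an ad-hoc equivalent of what our playbook does
> would look roughly like this (the inventory group and service names are
> illustrative, not our real ones):
>
>     ansible airflow_hosts -b -m service \
>       -a "name=airflow-webserver state=restarted"
>     ansible airflow_hosts -b -m service \
>       -a "name=airflow-scheduler state=restarted"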
>
> It would be great to have Airflow manageable from Hadoop cluster managers
> (Cloudera Manager, Ambari). For that, a parcel (for Cloudera) would need
> to be created and installed. If anyone has done this before, please share
> your experience!
>
> Zsolt
>
>
> 2017-11-15 13:30 GMT+01:00 Andrew Maguire <[email protected]>:
>
> > Are there any options at all out there for an Airflow-as-a-service type
> > approach?
> >
> > I'd love to just be able to define my DAGs, load them into some cloud
> > UI, and not have to worry about anything else.
> >
> > This looks kinda interesting -
> > http://docs.qubole.com/en/latest/user-guide/airflow/introduction-airflow.html
> >
> > Cheers,
> > Andy
> >
> > On Wed, Nov 15, 2017 at 10:28 AM Driesprong, Fokko <[email protected]>
> > wrote:
> >
> > > I'm using Ansible to deploy Airflow; the steps are:
> > > - First install Airflow using pip (or an rc using curl)
> > > - Do an `airflow version` to trigger the creation of the default config
> > > - Set the config variables correctly in the config using Ansible
> > > - Deploy the supervisord files
> > > - Start everything
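> > >
> > > Done by hand, those steps look roughly like this (the supervisor
> > > program names and paths are just defaults I'm assuming here):
> > >
> > >     pip install apache-airflow      # or curl the rc tarball and install it
> > >     airflow version                 # creates ~/airflow/airflow.cfg
> > >     # (Ansible then templates the variables into airflow.cfg)
> > >     supervisord -c /etc/supervisord.conf
> > >     supervisorctl start airflow-webserver airflow-scheduler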
> > >
> > > A separate role deploys Postgres. But if you are working in a cloud
> > > environment, you can also get Postgres/MySQL as a service. Hope this
> > > helps.
> > >
> > > Cheers, Fokko
> > >
> > > 2017-11-15 3:19 GMT+01:00 Marc Bollinger <[email protected]>:
> > >
> > > > A Samson <https://github.com/zendesk/samson> deploy runs a script
> > > > that runs a Broadside <https://github.com/lumoslabs/broadside>
> > > > deploy for ECS, which bounces the web and scheduler workers and
> > > > updates the DAG directory on the workers. Docker images come from a
> > > > GitHub -> Travis -> Quay <https://quay.io/> CI setup.
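> > > >
> > > > The script itself is tiny; conceptually it is just the following
> > > > (the target name is made up, and the flag spelling should be
> > > > checked against Broadside's README):
> > > >
> > > >     # Deploy the image CI pushed to Quay, bouncing the ECS services
> > > >     bundle exec broadside deploy full --target airflow --tag "$GIT_SHA"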
> > > >
> > > > On Tue, Nov 14, 2017 at 10:18 AM, Alek Storm <[email protected]>
> > > > wrote:
> > > >
> > > > > Our TeamCity server detects that the master branch has changed,
> > > > > then packages up the repo containing our DAGs as an artifact. We
> > > > > then use SaltStack to trigger a bash script on the targeted
> > > > > servers that downloads the artifact, moves the files to the right
> > > > > place, and restarts the scheduler (on the master).
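> > > > >
> > > > > The triggered script is essentially the following (the artifact
> > > > > URL, paths, and service name are placeholders):
> > > > >
> > > > >     #!/bin/bash
> > > > >     set -euo pipefail
> > > > >     # Pull the DAG artifact that TeamCity built and unpack it in place
> > > > >     curl -fsSL "$ARTIFACT_URL" -o /tmp/dags.tar.gz
> > > > >     tar -xzf /tmp/dags.tar.gz -C /srv/airflow/dags
> > > > >     # Restart the scheduler so it picks up the new DAG definitions
> > > > >     systemctl restart airflow-scheduler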
> > > > >
> > > > > This allows us to easily revert changes by redeploying a particular
> > > > > TeamCity artifact, without touching the git history.
> > > > >
> > > > > Alek
> > > > >
> > > > > On Nov 14, 2017 11:02 AM, "Andy Hadjigeorgiou" <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hey,
> > > > > >
> > > > > > I was just wondering what tools & services everyone uses to
> > > > > > deploy new versions of their data pipelines (understandably this
> > > > > > varies greatly based on tech stack), but I'd love to hear what
> > > > > > the community has been using.
> > > > > >
> > > > > > - Andy
> > > > > >
> > > > >
> > > >
> > >
> >
>
