Our Airflow situation:
• Development happens in two different repos (one that holds a lot of
cross-company Python tools, for the core app and any plugins we develop on
top of it, and our reporting tools/infra repo, for DAGs and related utility
files).
• The core app & plugins get packaged together with their imports into a
.pex (using the Pants build tool) with some related code and glue, and can
be manually deployed to the relevant box in staging and/or prod at any
time with Puppet (or automatically during the weekly deploy) once they're
in master.
• Updates to DAGs and utility files only get automatically deployed to prod
from master during the weekly deploy, but we're a little loosey-goosey
about editing DAGs in prod when necessary, since Airflow primarily handles
internal stuff and we're still in the early stages of switching our legacy
junk over to it.
• Right now, everything Airflow-related in prod runs on a single instance
(AWS EC2), backed by an AWS RDS MySQL database. We haven't yet figured out
what our best options for scaling are going to be (opinions welcome, please).
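The weekly DAG/utility-file deploy described above could be sketched as a
small Python helper; the repo layout, paths, and function name here are
hypothetical, purely for illustration:

```python
# Hypothetical sketch of a weekly DAG deploy: copy DAG files from a
# checked-out master into the Airflow DAGs folder. Paths are made up.
import shutil
from pathlib import Path


def deploy_dags(repo_checkout: str, dags_folder: str) -> list:
    """Copy every .py file under the repo's dags/ dir into dags_folder,
    returning the sorted list of deployed file names."""
    src = Path(repo_checkout) / "dags"
    dst = Path(dags_folder)
    dst.mkdir(parents=True, exist_ok=True)
    deployed = []
    for f in sorted(src.glob("*.py")):
        shutil.copy2(f, dst / f.name)  # copy2 preserves mtimes
        deployed.append(f.name)
    return deployed
```

A real version would also want to remove DAG files deleted from master, which
a plain copy loop doesn't handle.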

On Wed, Nov 15, 2017 at 9:26 AM, Laura Lorenz <[email protected]>
wrote:

> Infrastructure-wise we use Docker containers, hosted via Kubernetes on
> Google Container Engine and deployed with Helm. We bake our DAGs and
> custom code into the images, so in the end the deployer runs a `helm
> upgrade` command locally, the images are rebuilt with the newest code, and
> then all the containers are recreated from that new image. Our webserver,
> worker, flower, and scheduler containers are derived from
> https://github.com/puckel/docker-airflow, and we use the official rabbitmq
> image from Docker Hub. Our metadata database is in Cloud SQL for our QA and
> production clusters on GCE, but for local dev we use the official mysql
> image from Docker Hub. This style of deployment interrupts any running
> tasks, since the worker container is also killed to be recreated from the
> new image.
>
> On Wed, Nov 15, 2017 at 7:42 AM, Zsolt Tóth <[email protected]>
> wrote:
>
> > We are also using Ansible for:
> > - Installing/upgrading/configuring Airflow (there are several Airflow
> > roles on git)
> > - Deploying the pipelines
> > - Restarting Airflow webserver/scheduler
> >
> > It would be great to have Airflow manageable from Hadoop cluster managers
> > (Cloudera Manager, Ambari). For this, a parcel (for Cloudera) would need
> > to be created and installed. If anyone has done this before, please share
> > your experience!
> >
> > Zsolt
> >
> >
> > 2017-11-15 13:30 GMT+01:00 Andrew Maguire <[email protected]>:
> >
> > > Are there any options at all out there for an Airflow-as-a-service type
> > > approach?
> > >
> > > I'd love to just be able to define my DAGs and load them to some cloud
> > > UI and not have to worry about anything else.
> > >
> > > This looks kinda interesting -
> > > http://docs.qubole.com/en/latest/user-guide/airflow/introduction-airflow.html
> > >
> > > Cheers,
> > > Andy
> > >
> > > On Wed, Nov 15, 2017 at 10:28 AM Driesprong, Fokko <[email protected]>
> > > wrote:
> > >
> > > > I'm using Ansible to deploy Airflow; the steps are:
> > > > - First install Airflow using pip (or an rc using curl)
> > > > - Do an `airflow version` to trigger the creation of the default config
> > > > - Set the config variables correctly using Ansible
> > > > - Deploy the supervisord files
> > > > - Start everything
> > > >
> > > > A separate role is there to deploy Postgres. But if you are working
> > > > on a cloud environment, you can also get Postgres/MySQL as a service.
> > > > Hope this helps.
> > > >
> > > > Cheers, Fokko
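The "set the config variables" step above can be sketched in Python:
airflow.cfg is INI-style, so the stdlib configparser can patch it in place
(with interpolation disabled, since the file may contain raw `%` characters).
The section/key names below are examples only, not Fokko's actual setup:

```python
# Sketch of patching an INI-style config file (e.g. airflow.cfg) in place.
# Interpolation is disabled because airflow.cfg can contain raw '%' signs.
import configparser


def set_airflow_config(cfg_path: str, section: str, key: str, value: str) -> None:
    """Set a single key in an INI-style config, creating the section if needed."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(cfg_path)
    if not cfg.has_section(section):
        cfg.add_section(section)
    cfg.set(section, key, value)
    with open(cfg_path, "w") as f:
        cfg.write(f)
```

In practice Ansible's ini_file module does the same job declaratively; this is
just the underlying idea.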
> > > >
> > > > 2017-11-15 3:19 GMT+01:00 Marc Bollinger <[email protected]>:
> > > >
> > > > > We use a Samson <https://github.com/zendesk/samson> deploy that
> > > > > runs a script running a Broadside
> > > > > <https://github.com/lumoslabs/broadside> deploy for ECS, which
> > > > > bounces the web and scheduler workers, and updates the DAG
> > > > > directory on the workers. Docker images come from a GitHub ->
> > > > > Travis -> Quay <https://quay.io/> CI setup.
> > > > >
> > > > > On Tue, Nov 14, 2017 at 10:18 AM, Alek Storm <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Our TeamCity server detects the master branch has changed, then
> > > > > > packages up the repo containing our DAGs as an artifact. We then
> > > > > > use SaltStack to trigger a bash script on the targeted servers
> > > > > > that downloads the artifact, moves the files to the right place,
> > > > > > and restarts the scheduler (on the master).
> > > > > >
> > > > > > This allows us to easily revert changes by redeploying a
> > > > > > particular TeamCity artifact, without touching the git history.
> > > > > >
> > > > > > Alek
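The bash-script step Alek describes (download the artifact, move the files
into place, restart the scheduler) could be sketched in Python; the artifact
format (a tarball), paths, and the systemd unit name here are assumptions,
not details from his setup:

```python
# Hypothetical sketch of an artifact-based DAG deploy: unpack a CI
# artifact (assumed to be a tarball of the DAG repo) into the DAGs
# folder, then optionally restart the scheduler. Rollback is just
# redeploying an older artifact.
import subprocess
import tarfile
from pathlib import Path


def deploy_artifact(artifact_tar: str, dags_dir: str, restart: bool = False) -> None:
    """Extract the artifact into dags_dir; optionally restart the scheduler."""
    Path(dags_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(artifact_tar) as tar:
        tar.extractall(dags_dir)
    if restart:
        # The unit name is an assumption; any init system would do here.
        subprocess.run(["systemctl", "restart", "airflow-scheduler"], check=True)
```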
> > > > > >
> > > > > > On Nov 14, 2017 11:02 AM, "Andy Hadjigeorgiou" <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey,
> > > > > > >
> > > > > > > Was just wondering what tools & services everyone uses to
> > > > > > > deploy new versions of their data pipelines (understandably
> > > > > > > this would vary greatly based on tech stack), but I'd love to
> > > > > > > hear what the community has been using.
> > > > > > >
> > > > > > > - Andy
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 

*Kate-Laurel Agnew*
Data Engineer, Signal (signal.co)
m: 503-741-9207
e: [email protected]
