Thanks for your responses, guys!

Chris, I went to WePay's last meetup, lots of fun :-). Your blog post is
*really* useful for understanding how other companies are using Airflow.
Thanks for sharing!

Jeremiah, thanks for pointing out your git-sync. I will take a closer look
later today.

I think for the moment I will stick to a cron-based git pull, to keep
things simple for now.
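
For anyone finding this thread later, here is a minimal sketch of what such a
cron-driven pull could look like as a small Python script. It only wraps the
`git pull origin master` command discussed in this thread; the function names
and the injectable `run` parameter are my own illustrative choices, not
anything Airflow provides.

```python
import subprocess


def build_pull_command(remote="origin", branch="master"):
    """Return the git command from this thread as an argument list."""
    return ["git", "pull", remote, branch]


def pull_latest(repo_dir, remote="origin", branch="master", run=subprocess.run):
    """Run `git pull` inside repo_dir and report whether it succeeded.

    `run` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a real git checkout.
    """
    result = run(build_pull_command(remote, branch), cwd=repo_dir)
    # git exits with 0 on success and non-zero on failure.
    return result.returncode == 0
```

With a final line such as `pull_latest("/path/to/repo")` added (the path is a
placeholder), a crontab entry like `*/5 * * * * python3 /path/to/sync_repo.py`
on each box would run it every five minutes. The script name is hypothetical
as well.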

Thank you guys!
-Fernando

On Wed, Jul 13, 2016 at 11:42 AM, Jeremiah Lowin <[email protected]> wrote:

> I have a little module for this that was designed to facilitate syncing a
> git repo in Kubernetes: https://github.com/jlowin/git-sync. The idea is to
> sync a volume that is then shared to all containers (webserver, scheduler,
> workers, etc). It also works locally.
>
> However I want to stress that this is absolutely 100% unsupported (by me)!
> It's an experiment that works well enough for my use case. Maybe it's a
> useful jumping-off point?
>
> Best,
> J
> On Wed, Jul 13, 2016 at 2:35 PM Chris Riccomini <[email protected]>
> wrote:
>
> > Most folks follow a push-based approach (puppet, chef, etc).
> >
> > Our approach is a cron-based pull, described here:
> >
> > https://wecode.wepay.com/posts/airflow-wepay
> >
> > On Wed, Jul 13, 2016 at 11:10 AM, Fernando San Martin <[email protected]>
> > wrote:
> > > At Turo we have our data pipeline organized as a set of Python & SQL
> > > jobs orchestrated by Jenkins. We are evaluating Airflow as an
> > > alternative and have managed to get quite far, but we have some
> > > questions that we were hoping to get help with from the community.
> > >
> > > We have a setup with a master node and two workers. Our code was
> > > deployed on all three boxes by pulling from a git repo. The code in the
> > > repo changes on a regular basis, and we need to keep the boxes on the
> > > latest version of the code.
> > >
> > > We first thought of adding a BashOperator task to the top of our DAGs
> > > that simply runs `git pull origin master`, but since this code is only
> > > executed on the workers, the code on the master node will eventually
> > > differ from the code on the workers.
> > >
> > > Another option is to run a cron job that executes
> > > `git pull origin master` on each box every 5 minutes or so.
> > >
> > > Are there recommendations or best practices on how to handle this
> > > situation?
> > >
> > > Thank you!
> >
>
