Thanks for your responses, guys! Chris, I went to WePay's last meetup, lots of fun :-). Your blog post is *really* useful for understanding how other companies are using Airflow, thanks for sharing!
Jeremiah, thanks for pointing out your git-sync, I will take a closer look later today. I think for the moment I will stick to a cron-based git pull, trying to keep things simple for now.

Thank you guys!
-Fernando

On Wed, Jul 13, 2016 at 11:42 AM, Jeremiah Lowin <[email protected]> wrote:

> I have a little module for this that was designed to facilitate syncing a
> git repo in Kubernetes: https://github.com/jlowin/git-sync. The idea is to
> sync a volume that is then shared to all containers (webserver, scheduler,
> workers, etc). It also works locally.
>
> However I want to stress that this is absolutely 100% unsupported (by me)!
> It's an experiment that works well enough for my use case. Maybe it's a
> useful jumping off point?
>
> Best,
> J
>
> On Wed, Jul 13, 2016 at 2:35 PM Chris Riccomini <[email protected]>
> wrote:
>
> > Most folks follow a push-based approach (puppet, chef, etc).
> >
> > Our approach is CRON-based pull, described here:
> >
> > https://wecode.wepay.com/posts/airflow-wepay
> >
> > On Wed, Jul 13, 2016 at 11:10 AM, Fernando San Martin <[email protected]>
> > wrote:
> >
> > > At Turo we have our data pipeline organized as a set of Python & SQL
> > > jobs orchestrated by Jenkins. We are evaluating Airflow as an
> > > alternative and we have managed to get quite far but we have some
> > > questions that we were hoping to get help with from the community.
> > >
> > > We have a set-up with a master node and two workers, our code was
> > > deployed in all three boxes by retrieving from a git repo. The code in
> > > the repo changes on a regular basis and we need to keep the boxes with
> > > the latest version of the code.
> > >
> > > We first thought of adding to the top of our DAGs a BashOperator task
> > > that simply runs `git pull origin master`, but since this code gets
> > > only executed in the workers, the master node will eventually differ
> > > from the code that is on the workers.
> > >
> > > Another option is to run a cron job that executes `git pull origin
> > > master` in each box every 5-mins or so.
> > >
> > > Are there recommendations or best practices on how to handle this
> > > situation?
> > >
> > > Thank you!
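
For reference, the cron-based pull discussed in this thread could look roughly like the sketch below. It is only an illustration, not anyone's actual setup: the checkout path, the function names, and the script name are all assumptions, and it uses a fast-forward-only pull so a box with local changes fails loudly instead of creating a merge.

```python
import subprocess
from pathlib import Path

# Hypothetical location of the repo checkout on each box (master and workers).
REPO_DIR = Path("/home/airflow/dags")


def build_pull_command(repo_dir, remote="origin", branch="master"):
    """Return the argv list for a fast-forward-only `git pull` in repo_dir."""
    # `git -C <dir>` runs the command as if git had been started in <dir>.
    return ["git", "-C", str(repo_dir), "pull", "--ff-only", remote, branch]


def sync_repo(repo_dir=REPO_DIR, runner=subprocess.run):
    """Run the pull and return git's exit code (0 means success)."""
    cmd = build_pull_command(repo_dir)
    # `runner` is injectable so the function can be exercised without git.
    result = runner(cmd, capture_output=True, text=True)
    return result.returncode
```

Saved as, say, sync_dags.py (hypothetical name) with a final line that calls sync_repo(), it could be scheduled on every box with a crontab entry such as `*/5 * * * * /usr/bin/python3 /path/to/sync_dags.py`, which matches the "every 5-mins or so" interval from the original question. Running the same job on the master and both workers is what keeps the three boxes from drifting apart.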
