Most folks follow a push-based approach (Puppet, Chef, etc.). Our approach is a cron-based pull, described here:
https://wecode.wepay.com/posts/airflow-wepay

On Wed, Jul 13, 2016 at 11:10 AM, Fernando San Martin <[email protected]> wrote:

> At Turo we have our data pipeline organized as a set of Python & SQL jobs
> orchestrated by Jenkins. We are evaluating Airflow as an alternative, and we
> have managed to get quite far, but we have some questions that we were
> hoping to get help with from the community.
>
> We have a setup with a master node and two workers; our code is deployed
> to all three boxes by pulling from a git repo. The code in the repo
> changes regularly, and we need to keep the boxes on the latest
> version of the code.
>
> We first thought of adding a BashOperator task at the top of our DAGs that
> simply runs `git pull origin master`, but since this task only executes
> on the workers, the master node would eventually diverge from the
> code on the workers.
>
> Another option is to run a cron job that executes `git pull origin master`
> on each box every 5 minutes or so.
>
> Are there recommendations or best practices for handling this situation?
>
> Thank you!
