Most folks follow a push-based approach (Puppet, Chef, etc.).

Our approach is a cron-based pull, described here:

https://wecode.wepay.com/posts/airflow-wepay
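A minimal sketch of the cron-based pull idea: each box (master and workers alike) runs the same sync function on a schedule, so no node can drift. The repo path, function name, and 5-minute schedule below are illustrative assumptions, not details from the post.

```shell
# sync_dags: bring a checkout in line with the remote branch.
# Using fetch + reset --hard (rather than pull) so stray local edits
# or a force-push can't leave a box wedged on a merge conflict.
sync_dags() {
  repo_dir="$1"
  branch="${2:-master}"
  cd "$repo_dir" || return 1
  git fetch -q origin "$branch"
  git reset -q --hard "origin/$branch"
}

# Example crontab entry (assumed paths) installed on every box:
#   */5 * * * * /usr/local/bin/sync_dags.sh /opt/airflow/dags master >> /var/log/dag-sync.log 2>&1
```

Because the same entry runs on the scheduler and on each worker, all three boxes converge on the latest commit within one cron interval.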

On Wed, Jul 13, 2016 at 11:10 AM, Fernando San Martin <[email protected]> wrote:
> At Turo we have our data pipeline organized as a set of Python & SQL jobs
> orchestrated by Jenkins. We are evaluating Airflow as an alternative and we
> have managed to get quite far but we have some questions that we were
> hoping to get help with from the community.
>
> We have a setup with a master node and two workers; our code was deployed
> to all three boxes by pulling from a git repo. The code in the repo
> changes on a regular basis, and we need to keep the boxes on the latest
> version of the code.
>
> We first thought of adding a BashOperator task at the top of our DAGs that
> simply runs `git pull origin master`, but since this task only gets
> executed on the workers, the master node's code will eventually drift from
> the code that is on the workers.
>
> Another option is to run a cron job that executes `git pull origin master`
> on each box every 5 minutes or so.
>
> Are there recommendations or best practices on how to handle this situation?
>
> Thank you!
