At Turo, our data pipeline is organized as a set of Python & SQL jobs
orchestrated by Jenkins. We are evaluating Airflow as an alternative and
have gotten quite far, but we have some questions that we were hoping
the community could help with.

Our set-up has a master node and two workers, and the code is deployed
to all three boxes by pulling from a git repo. The code in the repo
changes regularly, so we need to keep all three boxes on the latest
version of the code.

We first thought of adding a BashOperator task at the top of our DAGs
that simply runs `git pull origin master`, but since that task only
executes on the workers, the code on the master node would eventually
drift from the code on the workers.
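To make the mechanics of that sync step concrete, here is a self-contained sketch, assuming the BashOperator's `bash_command` is just the pull shown at the end. It simulates the remote and a worker checkout with throwaway temp repos (all paths and the `dag.py` file are hypothetical stand-ins for our real repo layout):

```python
import subprocess
import tempfile
from pathlib import Path

def git(args, cwd):
    """Run a git command, raising if it fails."""
    subprocess.run(
        ["git", "-c", "user.email=ci@example.com", "-c", "user.name=ci"] + args,
        cwd=cwd, check=True,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )

base = Path(tempfile.mkdtemp())

# A bare repo standing in for the shared remote ("origin").
origin = base / "origin.git"
origin.mkdir()
git(["init", "--bare"], cwd=origin)
git(["symbolic-ref", "HEAD", "refs/heads/master"], cwd=origin)

# Seed origin with a first version of the code.
seed = base / "seed"
git(["clone", str(origin), str(seed)], cwd=base)
git(["symbolic-ref", "HEAD", "refs/heads/master"], cwd=seed)
(seed / "dag.py").write_text("VERSION = 1\n")
git(["add", "dag.py"], cwd=seed)
git(["commit", "-m", "v1"], cwd=seed)
git(["push", "origin", "master"], cwd=seed)

# A worker's checkout, cloned at v1.
worker = base / "worker"
git(["clone", str(origin), str(worker)], cwd=base)

# Upstream moves on to v2.
(seed / "dag.py").write_text("VERSION = 2\n")
git(["commit", "-am", "v2"], cwd=seed)
git(["push", "origin", "master"], cwd=seed)

# The sync step the BashOperator would run on each box.
git(["pull", "origin", "master"], cwd=worker)

latest = (worker / "dag.py").read_text()
```

After the pull the worker checkout is at v2, which is exactly what the task buys us on the workers; the gap is that nothing equivalent ever runs on the master node.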

Another option is to run a cron job on each box that executes
`git pull origin master` every 5 minutes or so.
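The crontab entry on each box might look something like this (the checkout path and log file are hypothetical):

```
*/5 * * * * cd /opt/airflow/dags && git pull origin master >> /var/log/dag-sync.log 2>&1
```

The trade-off is that the boxes can be up to 5 minutes stale, and a DAG run could start mid-pull.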

Are there recommendations or best practices on how to handle this situation?

Thank you!
