At Turo, our data pipeline is organized as a set of Python and SQL jobs orchestrated by Jenkins. We are evaluating Airflow as an alternative and have gotten quite far, but we have some questions that we were hoping to get help with from the community.
We have a setup with a master node and two workers; our code is deployed to all three boxes by pulling from a git repo. The code in the repo changes regularly, and we need to keep the boxes on the latest version. We first thought of adding a BashOperator task at the top of our DAGs that simply runs `git pull origin master`, but since that task only executes on the workers, the code on the master node would eventually drift from the code on the workers. Another option is to run a cron job that executes `git pull origin master` on each box every 5 minutes or so. Are there recommendations or best practices on how to handle this situation? Thank you!
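For context, the cron-based approach we're considering would look roughly like this on each box (the checkout path and log file below are hypothetical, just to illustrate):

```shell
# Hypothetical crontab entry, installed on the master and both workers.
# Every 5 minutes, fast-forward the DAGs checkout to the latest master;
# --ff-only avoids surprise merge commits if a box's checkout was touched.
*/5 * * * * cd /opt/airflow/dags && git pull --ff-only origin master >> /var/log/dags-sync.log 2>&1
```

The main concern with this sketch is that the three boxes can still be briefly out of sync within each 5-minute window, which is part of why we're asking whether there's a better-established pattern.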
