For our company's production ETL, we have set up a pipeline of [Git Merge => Jenkins Trigger => Deploy to Airflow deployment machine]. That way, the only code that moves from Git to the Airflow DAGs folder is the added/modified DAG.
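For illustration, here is a rough sketch of what the Jenkins-triggered deploy step could look like in Python. The repo layout, paths, and the deploy() entry point are made up for the example, not our exact setup:

    import shutil
    import subprocess
    from pathlib import Path

    # Hypothetical locations; adjust to your deployment.
    REPO_DIR = Path("/opt/etl-repo")
    DAGS_DIR = Path("/opt/airflow/dags")

    def changed_dag_files(old_rev, new_rev):
        """Ask git which DAG files the merge touched."""
        out = subprocess.check_output(
            ["git", "diff", "--name-only", f"{old_rev}..{new_rev}", "--", "dags/"],
            cwd=REPO_DIR,
            text=True,
        )
        return [
            REPO_DIR / name
            for name in out.splitlines()
            # Skip non-Python files and files the merge deleted.
            if name.endswith(".py") and (REPO_DIR / name).exists()
        ]

    def deploy(old_rev, new_rev):
        """Copy only the added/modified DAG files into the live DAGs folder."""
        for src in changed_dag_files(old_rev, new_rev):
            dest = DAGS_DIR / src.relative_to(REPO_DIR / "dags")
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)

In a Jenkins job the two revisions would typically come from the build environment (for example the Git plugin's GIT_PREVIOUS_SUCCESSFUL_COMMIT and GIT_COMMIT variables). Note this sketch only handles additions and modifications; removing a deleted DAG from the live folder would need its own step.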
We have been running this in production for the last 3 months and have not run into any issues "yet". This has also helped us keep a true copy of production on Git.

~Manish

On Mon, Nov 6, 2017 at 1:07 PM, Daniel Imberman <[email protected]> wrote:

> +1 for this conversation.
>
> I know that most production Airflow instances basically just have a
> policy of "don't update the DAG files while a job is running."
>
> One thing that is difficult with this, however, is that for CeleryExecutor
> and KubernetesExecutor we don't really have any power over the DAG
> refreshes. If you're storing your DAGs in S3 or NFS, we can't stop or
> trigger a refresh of the DAGs. I'd be interested to see what others have
> done about this and whether there's anything we can do to standardize it.
>
> On Mon, Nov 6, 2017 at 12:34 PM Gaetan Semet <[email protected]> wrote:
>
> > Hello
> >
> > I am working with Airflow to see how we can use it at my company, and I
> > volunteer to help if you need help on some parts. I used to work a lot
> > with Python and Twisted, but real, distributed scheduling is a new
> > sport for me.
> >
> > I see that deploying DAGs regularly is not as easy as one might imagine.
> > I started playing with git-sync, and apparently it is not recommended in
> > production since it can lead to an incoherent state if the scheduler is
> > refreshed in the middle of an execution. But DAGs live on and can be
> > updated by users, so I think Airflow needs a way to allow automatic
> > refresh of the DAGs without having to stop the scheduler.
> >
> > Does anyone already work on this, or do you have a set of JIRA tickets
> > covering this issue so I can start working on it?
> >
> > Best Regards,
> > Gaetan Semet
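One pattern that addresses the incoherent-state problem Gaetan describes is an atomic swap: deploy the new DAG tree into a fresh versioned directory and repoint a symlink that the scheduler's dags_folder resolves to, so a refresh never observes a half-written tree. A minimal sketch, with the directory layout and names assumed for the example:

    import os
    import shutil
    import time
    from pathlib import Path

    # Hypothetical layout: the scheduler's dags_folder is a symlink,
    # e.g. /opt/airflow/dags -> /opt/airflow/releases/dags-1509999999
    RELEASES_DIR = Path("/opt/airflow/releases")
    DAGS_LINK = Path("/opt/airflow/dags")

    def deploy_atomically(source_tree):
        """Copy the new DAG tree into a fresh release dir, then swap the symlink."""
        release = RELEASES_DIR / f"dags-{int(time.time())}"
        shutil.copytree(source_tree, release)

        # rename(2) is atomic on POSIX: build the new symlink under a
        # temporary name, then rename it over the existing one.
        tmp_link = DAGS_LINK.with_name(DAGS_LINK.name + ".tmp")
        if tmp_link.is_symlink():
            tmp_link.unlink()
        os.symlink(release, tmp_link)
        os.replace(tmp_link, DAGS_LINK)

Because the old release directory stays on disk, a worker that is mid-task keeps reading a consistent tree even while the link is swapped; old releases can be garbage-collected once no tasks reference them.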
