+1 for this conversation. I know that most production Airflow instances basically just have a policy of "don't update the DAG files while a job is running."
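
One pattern that makes that policy easier to live with is deploying DAGs atomically: copy the new files into a fresh directory and flip a symlink, so the scheduler never parses a half-updated tree. A rough sketch, not anything Airflow provides out of the box; the paths and release layout below are just placeholders:

    import os
    import shutil
    import time

    # Placeholders: point DAGS_LINK at whatever dags_folder is set to in
    # airflow.cfg, and keep RELEASES on the same filesystem so the final
    # rename stays atomic.
    DAGS_LINK = "/opt/airflow/dags"
    RELEASES = "/opt/airflow/dag_releases"

    def deploy_dags(source_dir: str) -> None:
        """Copy a new DAG tree into a fresh release dir and swap a symlink.

        The scheduler only ever sees the old tree or the new one, never a
        half-copied mixture.
        """
        release = os.path.join(RELEASES, time.strftime("%Y%m%d-%H%M%S"))
        shutil.copytree(source_dir, release)

        # Create the new link under a temporary name, then rename it over
        # the old one; rename(2) is atomic on POSIX filesystems.
        tmp_link = DAGS_LINK + ".new"
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(release, tmp_link)
        os.replace(tmp_link, DAGS_LINK)  # fails if DAGS_LINK is a real dir

Note this only makes the file swap atomic: tasks that are already running keep whatever code they imported, so the "don't change DAGs mid-run" rule still applies to in-flight work.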
One thing that is difficult with that policy, however, is that for CeleryExecutor and KubernetesExecutor we don't really have any power over DAG refreshes. If you're storing your DAGs in S3 or NFS, we can't stop or trigger a refresh of the DAGs. I'd be interested to see what others have done here and whether there's anything we can do to standardize it (rough sketch of one option at the bottom of this mail).

On Mon, Nov 6, 2017 at 12:34 PM Gaetan Semet <[email protected]> wrote:

> Hello
>
> I am working with Airflow to see how we can use it at my company, and I
> volunteer to help if you need help on some parts. I used to work a lot
> with Python and Twisted, but real, distributed scheduling is a new sport
> for me.
>
> I see that deploying DAGs regularly is not as easy as one might imagine.
> I started playing with git-sync, and apparently it is not recommended in
> production, since it can leave things in an incoherent state if the
> scheduler refreshes in the middle of an execution. But DAGs live: they
> get updated by users, and I think Airflow needs a way to allow automatic
> refresh of the DAGs without having to stop the scheduler.
>
> Is anyone already working on this, or is there a set of JIRA tickets
> covering this issue, so I can start working on it?
>
> Best Regards,
> Gaetan Semet
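
To make the S3 case concrete, the kind of thing I have in mind is a small agent on each scheduler/worker host that polls the bucket and re-runs the atomic swap sketched above whenever anything changed. The bucket name, prefix, and interval here are made up, and deploy_dags is the helper from my earlier sketch:

    import os
    import tempfile
    import time

    import boto3  # assumed dependency; any S3 client would do

    BUCKET = "my-company-dags"  # placeholder
    PREFIX = "dags/"            # placeholder
    POLL_SECONDS = 60           # placeholder

    def snapshot(s3):
        """Return a dict of key -> ETag so we can detect changes cheaply."""
        pages = s3.get_paginator("list_objects_v2").paginate(
            Bucket=BUCKET, Prefix=PREFIX)
        return {o["Key"]: o["ETag"]
                for page in pages for o in page.get("Contents", [])}

    def download(s3, keys, dest):
        for key in keys:
            if key.endswith("/"):  # skip directory markers
                continue
            target = os.path.join(dest, os.path.relpath(key, PREFIX))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(BUCKET, key, target)

    def main():
        s3 = boto3.client("s3")
        seen = {}
        while True:
            current = snapshot(s3)
            if current != seen:
                with tempfile.TemporaryDirectory() as staging:
                    download(s3, current, staging)
                    deploy_dags(staging)  # atomic swap from the earlier sketch
                seen = current
            time.sleep(POLL_SECONDS)

Polling ETags keeps it cheap, and because the swap at the end is atomic, it doesn't matter when in the scheduler's parse loop the refresh lands. Standardizing something like this (or a hook for it) is what I'd like to see discussed.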
