For our company's production ETL, we have set up a

[Git merge => Jenkins trigger => deploy to the Airflow deployment
machine] pipeline. That way, the only code that moves from Git to the
Airflow DAG folder is the added/modified DAGs.
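
For illustration, here is a minimal sketch of what the deploy step could
look like, assuming Jenkins has already checked out the merge commit;
the paths and the dags/ repo layout below are made up for the example,
not our actual setup:

import shutil
import subprocess
from pathlib import Path

# Assumed locations -- adjust for your environment.
REPO_ROOT = Path("/var/lib/jenkins/workspace/etl-dags")  # Jenkins checkout
AIRFLOW_DAGS = Path("/opt/airflow/dags")                 # Airflow DAG folder

def changed_dag_files():
    """DAG files added or modified by the merge commit (HEAD vs. its parent)."""
    out = subprocess.check_output(
        ["git", "diff", "--name-only", "--diff-filter=AM", "HEAD~1", "HEAD"],
        cwd=REPO_ROOT,
        text=True,
    )
    return [
        Path(line) for line in out.splitlines()
        if line.startswith("dags/") and line.endswith(".py")
    ]

def deploy():
    # Copy only the touched DAG files; everything else stays as-is.
    for rel in changed_dag_files():
        src = REPO_ROOT / rel
        dst = AIRFLOW_DAGS / src.name
        shutil.copy2(src, dst)
        print("deployed %s -> %s" % (src, dst))

if __name__ == "__main__":
    deploy()

An rsync from the checkout would work just as well; the point is that
nothing lands in the DAG folder except what went through a Git merge.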

We have been running this in production for the last 3 months and have not
run into any issues "yet".

This has also helped us keep a true copy of production in Git.

~Manish



On Mon, Nov 6, 2017 at 1:07 PM, Daniel Imberman <[email protected]>
wrote:

> +1 for this conversation.
>
> I know that most production Airflow instances basically just have a
> policy of "don't update the DAG files while a job is running."
>
> One thing that is difficult with this, however, is that for CeleryExecutor
> and KubernetesExecutor we don't really have any power over the DAG
> refreshes. If you're storing your DAGs in S3 or NFS, we can't stop or
> trigger a refresh of the DAGs. I'd be interested to see what others have
> done for this and whether there's anything we can do to standardize it.
>
> On Mon, Nov 6, 2017 at 12:34 PM Gaetan Semet <[email protected]> wrote:
>
> > Hello
> >
> > I am working with Airflow to see how we can use it at my company, and I
> > volunteer to help if you need help on some parts. I used to work a lot
> > with Python and Twisted, but real, distributed scheduling is kind of a
> > new sport for me.
> >
> > I see that deploying DAGs regularly is not as easy as one might imagine.
> > I started playing with git-sync, and apparently it is not recommended in
> > prod since it can lead to an incoherent state if the scheduler is
> > refreshed in the middle of an execution. But DAGs live and can be updated
> > by users, and I think Airflow needs a way to allow automatic refresh of
> > the DAGs without having to stop the scheduler.
> >
> > Is anyone already working on this, or do you have a set of JIRA tickets
> > covering this issue so I can start working on them?
> >
> > Best Regards,
> > Gaetan Semet
> >
>
