This was a question I put in a survey I once conducted. The survey is available here (including the individual results at the bottom):
https://cwiki.apache.org/confluence/display/AIRFLOW/Apache+Airflow+survey+2017-06-24

1. I recommend not keeping different versions in separate directories, but
using branching for that purpose instead. This reduces maintenance on your
side and helps you avoid mindless copy/pastes between your files. For
example, on GitHub you can use directory-like branch names to indicate the
purpose of a branch:

    feature/JIRA-3523
    hotfix/JIRA-2311
    develop

Then all you need is a flow describing how code propagates through these
branches. A (usually) successful git workflow is described here:

http://nvie.com/posts/a-successful-git-branching-model/

What I have seen done, instead of creating a separate physical branch for
the "acceptance" environment, is a process on your CI server that checks out
master, merges all "feature/XXX" branches into it, and then deploys the
result (a rough sketch of such a CI step is included after the quoted
message at the bottom). The downside of that approach is that you do not
always know exactly which code is running on acceptance.

2. If you have an environment that deploys straight from GitHub whenever a
change lands, then you can simply revert the PR. This means you need to make
decisions on how to structure your git flow. As in point 1, the more
isolated you keep each unit of work throughout the flow up to prod, the
easier it is to revert the changes belonging to that unit. When you merge
into a physical acceptance branch, for example, your hotfixes may have to be
reverted there and propagated upwards, since the work usually gets mixed
with other units of work (unless you revert manually in a hotfix branch).

Alternatively, if you deploy your DAGs as a package, then simply reverting
to an earlier version of the package fixes the problem. That means relying
on yum/apt/pip or some other package manager instead. These deployment
mechanisms usually add a slight delay to the process, which you may or may
not want to deal with. In other environments people use Docker/Kubernetes to
push a new version of a container containing all the DAGs; Kubernetes, for
example, can perform a rolling update of your new code and easily roll back
(a second sketch of both rollback routes also follows at the bottom).

Because ETL is usually batch oriented, rollbacks aren't always required and
are not as time-sensitive as for a website or a service. I.e. you have more
flexibility in choosing how to resolve an issue, because the time window for
resolution is usually larger. You can roll forward to a fixed version or go
with a code revert instead, which means you may not have to implement a
package rollback mechanism per se.

Rgds,

Gerard

On Mon, Jan 22, 2018 at 10:49 AM, Sreenath <[email protected]> wrote:

> Are there any best practices that are followed for deploying new dags to
> airflow?
>
> I saw a couple of comments on the google forum stating that the dags are
> saved inside a GIT repository and the same is synced periodically to the
> local location in the airflow cluster. Regarding this approach, I had a
> couple of questions
> 1. Do we maintain separate dag files for separate environments? (testing,
> production)
> 2. How to handle rollback of an ETL to an older version in case the new
> version has a bug?
>
> Any help here is highly appreciated. Let me know in case you need any
> further details.
>
> --
> Sreenath S Kamath
> Bangalore
> Ph No:+91-9590989106
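PS: to make point 1 concrete, here is a minimal sketch of what such a CI
"build acceptance" step could look like. It is only an illustration: the
branch names, the remote, the acceptance host and the /opt/airflow/dags path
are assumptions on my side, not anything Airflow or GitHub prescribe.

    # Rebuild an "acceptance" tree from master plus all open feature branches.
    git fetch origin
    git checkout -B acceptance-build origin/master

    # Merge every feature branch; fail the pipeline on conflicts instead of
    # deploying a half-merged tree.
    for branch in $(git branch -r --list 'origin/feature/*'); do
        git merge --no-ff -m "CI merge of $branch" "$branch" \
            || { echo "merge conflict in $branch"; exit 1; }
    done

    # Push the merged DAGs folder to the acceptance Airflow cluster
    # (an rsync of the dags/ folder is just one of the sync options
    # mentioned in the thread).
    rsync -av --delete dags/ airflow@acceptance-host:/opt/airflow/dags/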

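And a sketch of the two rollback routes from point 2, assuming either a
git-synced dags folder or a containerised deployment; <merge-sha> is a
placeholder for the merge commit of the offending PR, and the Deployment
name airflow-dags is made up for the example.

    # Option A: the cluster syncs DAGs straight from git -- revert the merge
    # commit of the offending PR and let the next sync pick it up.
    git revert -m 1 --no-edit <merge-sha>
    git push origin master

    # Option B: the DAGs ship inside a container -- let Kubernetes roll the
    # Deployment back to the previously deployed image.
    kubectl rollout undo deployment/airflow-dags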