This was a question I put in a survey I once conducted. The survey is available here (including the individual results at the bottom):
https://cwiki.apache.org/confluence/display/AIRFLOW/Apache+Airflow+survey+2017-06-24

1. I recommend not keeping different versions in separate directories, but
using branching for that purpose instead. This reduces maintenance on your
side and helps you avoid mindless copy/pastes between your files. For
example, on GitHub you can use directory-like branch names to indicate the
purpose of a branch:

    feature/JIRA-3523
    hotfix/JIRA-2311
    develop

Then all you need is a flow describing how code propagates through these
branches. A (usually) successful git workflow is described here:

http://nvie.com/posts/a-successful-git-branching-model/

What I have seen done, instead of creating a separate physical branch for
the "acceptance" environment, is a process on your CI server that checks out
master, merges all "feature/XXX" branches into it, and then deploys the
result (a rough sketch of such a CI step is included after the quoted
message at the bottom). The downside of that approach is that you do not
always know exactly which code is running on acceptance.

2. If you have an environment that deploys straight from GitHub whenever a
change lands, then you can simply revert the PR. This means you need to make
decisions on how to structure your git flow. As in point 1, the more
isolated you keep each unit of work throughout the flow up to prod, the
easier it is to revert the changes belonging to that unit. When you merge
into a physical acceptance branch, for example, your hotfixes may have to be
reverted there and propagated upwards, since the work usually gets mixed
with other units of work (unless you revert manually in a hotfix branch).

Alternatively, if you deploy your DAGs as a package, then simply reverting
to an earlier version of the package fixes the problem. That means relying
on yum/apt/pip or some other package manager instead. These deployment
mechanisms usually add a slight delay to the process, which you may or may
not want to deal with. In other environments people use Docker/Kubernetes to
push a new version of a container containing all the DAGs; Kubernetes, for
example, can perform a rolling update of your new code and easily roll back
(a second sketch of both rollback routes also follows at the bottom).

Because ETL is usually batch oriented, rollbacks aren't always required and
are not as time-sensitive as for a website or a service. I.e. you have more
flexibility in choosing how to resolve an issue, because the time window for
resolution is usually larger. You can roll forward to a fixed version or go
with a code revert instead, which means you may not have to implement a
package rollback mechanism per se.

Rgds,

Gerard

On Mon, Jan 22, 2018 at 10:49 AM, Sreenath <[email protected]> wrote:

> Are there any best practices that are followed for deploying new dags to
> airflow?
>
> I saw a couple of comments on the google forum stating that the dags are
> saved inside a GIT repository and the same is synced periodically to the
> local location in the airflow cluster. Regarding this approach, I had a
> couple of questions
> 1. Do we maintain separate dag files for separate environments? (testing,
> production)
> 2. How to handle rollback of an ETL to an older version in case the new
> version has a bug?
>
> Any help here is highly appreciated. Let me know in case you need any
> further details.
>
> --
> Sreenath S Kamath
> Bangalore
> Ph No:+91-9590989106
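PS: to make point 1 concrete, here is a minimal sketch of what such a CI
"build acceptance" step could look like. It is only an illustration: the
branch names, the remote, the acceptance host and the /opt/airflow/dags path
are assumptions on my side, not anything Airflow or GitHub prescribe.

    # Rebuild an "acceptance" tree from master plus all open feature branches.
    git fetch origin
    git checkout -B acceptance-build origin/master

    # Merge every feature branch; fail the pipeline on conflicts instead of
    # deploying a half-merged tree.
    for branch in $(git branch -r --list 'origin/feature/*'); do
        git merge --no-ff -m "CI merge of $branch" "$branch" \
            || { echo "merge conflict in $branch"; exit 1; }
    done

    # Push the merged DAGs folder to the acceptance Airflow cluster
    # (an rsync of the dags/ folder is just one of the sync options
    # mentioned in the thread).
    rsync -av --delete dags/ airflow@acceptance-host:/opt/airflow/dags/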

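And a sketch of the two rollback routes from point 2, assuming either a
git-synced dags folder or a containerised deployment; <merge-sha> is a
placeholder for the merge commit of the offending PR, and the Deployment
name airflow-dags is made up for the example.

    # Option A: the cluster syncs DAGs straight from git -- revert the merge
    # commit of the offending PR and let the next sync pick it up.
    git revert -m 1 --no-edit <merge-sha>
    git push origin master

    # Option B: the DAGs ship inside a container -- let Kubernetes roll the
    # Deployment back to the previously deployed image.
    kubectl rollout undo deployment/airflow-dags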