There was some discussion around this, as Vince mentioned, when jlowin proposed a fix: https://github.com/apache/incubator-airflow/pull/1344
It's really funny (sad) that deleting a dag is so counter-intuitive. We (Agari) deploy by essentially pointing Airflow's dags_folder (specified in airflow.cfg) at a git repo. To delete a dag, we:

1. *git rm* the dag file
2. update the git repo on all airflow machines (the machines running the airflow webservers and schedulers)
3. restart the webserver and scheduler

That does the trick. We run the local scheduler; if you run Celery, you probably have to add the Celery worker processes to the list in steps 2 and 3.

Dags are identified by name (and, I believe, also by a path on the file system), so if a dag with the same name were ever checked back into git, Airflow would resurrect the dag based on what's in the db. Unfortunately, it keeps the original start date and much of the history, so the scheduler will start filling in for the times since the dag was "deleted", doing a potentially expensive backfill.

Also, parts of the code, like the scheduler, periodically reparse the dags in the dag folder (every few minutes), so simply deleting the dag's record from the DB will not suffice: the next reparse will resurrect it. Still, cleaning up the DB is important as well, to avoid a conflict when you do want to reload a dag that was previously active.

We (Agari), Airbnb, and a lot of other users depend on git to distribute dags to airflow machines, hence the deletion of a dag also depends on git. This is an unspecified design pattern/dependency of running Airflow; more plainly, Airflow depends on some distributed file system for distributing dags. One way to decouple the deletion of dags from their distribution is to write a "tombstone" in a new tombstone table. The tombstone could act as an "ignore this dag" filter and could be applied during dag parsing.
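A minimal Python sketch of what such a tombstone filter might look like during dag parsing. All names here (the in-memory `tombstones` set, `load_dag`) are hypothetical, not part of Airflow; this only illustrates the idea of matching a dag file against tombstone rows before loading it.

```python
# Hypothetical sketch of the proposed "ignore this dag" tombstone filter.
# Nothing here is real Airflow API.
import hashlib


def file_sha256(path):
    """Hash the dag file's contents, so a re-used dag_id with different
    code would not match an old tombstone (match on dag_id + hash)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def is_tombstoned(dag_id, path, tombstones):
    """tombstones: a set of (dag_id, content_hash) pairs, loaded from a
    hypothetical new tombstone table in the Airflow DB."""
    return (dag_id, file_sha256(path)) in tombstones


def parse_dag_folder(dag_files, tombstones):
    """During a periodic reparse, skip any tombstoned dag file."""
    for dag_id, path in dag_files:
        if is_tombstoned(dag_id, path, tombstones):
            continue  # the "ignore this dag" filter
        load_dag(path)  # hypothetical: normal dag-loading would happen here
```

The hash check is what would keep a tombstone from shadowing a genuinely new dag that happens to reuse an old dag_id.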
We could also generate a hash of the contents of the file, so that if someone were to pick the same dag_id as a previously deleted one, and the hashes were different, the tombstone would not match; we would match tombstones on dag_id + hash.

There is a challenge around tombstone expiration. Airflow does not know the details of its dag_folder's file system (git, cvs, svn, nfs, etc.), and hence which command should be used to move or delete the dag file permanently. Until the file can be removed from the file system, the tombstone cannot be expired. My proposal here is to keep tombstones around until the user has done the necessary cleanup himself/herself; Airflow could periodically check for the removal of the dag file, and remove the tombstone and any related table rows at that point.

I feel there is enough solutionizing in this email and in the PR conversation preceding it to welcome an implementation of this fix from the community. If you have some time, please implement this and send a PR.

-s

On Fri, Aug 26, 2016 at 4:34 PM, Lance Norskog <[email protected]> wrote:

> This is for the data model as of March 2016. I haven't tried it lately.
> Wrap it in a transaction.
>
> For MySQL:
>
> set @dag_id = 'BAD_DAG';
> delete from airflow.xcom where dag_id = @dag_id;
> delete from airflow.task_instance where dag_id = @dag_id;
> delete from airflow.sla_miss where dag_id = @dag_id;
> delete from airflow.log where dag_id = @dag_id;
> delete from airflow.job where dag_id = @dag_id;
> delete from airflow.dag_run where dag_id = @dag_id;
> delete from airflow.dag where dag_id = @dag_id;
>
> On Thu, Aug 25, 2016 at 8:57 PM, Vince Reuter <[email protected]> wrote:
>
> > Hey Jason, I think it's an open PR:
> > https://github.com/apache/incubator-airflow/pull/1344
> >
> > -Vince
> >
> > > On Aug 25, 2016, at 8:35 PM, Jason Chen <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > How to delete a dag in airflow (instead of turning it off)?
> > > Thanks.
> > >
> > > Jason
> >
>
> --
> Lance Norskog
> [email protected]
> Redwood City, CA
