turbaszek commented on issue #11302: URL: https://github.com/apache/airflow/issues/11302#issuecomment-705062847
Additionally, this may help us make backfill runnable remotely.

## The problem

Run:

```
airflow dags backfill -v -s 2020-10-06 example_bash_operator
# once there's a process running a single task, do the following:
pkill -9 -f backfil
```

This results in a "zombie" DagRun and related task instances that will not be cleaned up by the scheduler (at least that's my understanding). Example:

<img width="2188" alt="Screenshot 2020-10-07 at 18 34 47" src="https://user-images.githubusercontent.com/9528307/95360543-ce73a200-08cb-11eb-8116-552da95fb105.png">

However, querying the job table we see:

<img width="1687" alt="Screenshot 2020-10-07 at 18 39 07" src="https://user-images.githubusercontent.com/9528307/95360957-683b4f00-08cc-11eb-854b-d50eb3fe965e.png">

So, according to Airflow's state the backfill job is still running, but that's not true because we killed it 👎

## Possible solution

Link a specific job to the DagRun it triggered (using the `job_id`) and then run a process that kills the zombies. This can be done either by:

- killing a DR (and related TIs) that is in an unfinished state (running, none, scheduled, queued) while the job that was running it is in an error state, or
- killing a DR (and related TIs) that is in an unfinished state while the job that was running it hasn't heartbeated for the last few minutes (configurable).

Cleaning up such zombies could easily be triggered by the scheduler. I think this may bring us closer to triggering backfill via the API / UI.

WDYT? @ashb @kaxil @potiuk @mik-laj @dimberman
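To make the zombie-detection idea concrete, here is a minimal sketch of the query from the second option above. The schema is purely illustrative (an assumed `job_id` column on `dag_run` linking it to the job that created it; table and column names are hypothetical, not Airflow's actual models), using an in-memory SQLite database instead of the metadata DB:

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: dag_run carries a job_id pointing at the job
# that is running it. Names are illustrative, not Airflow's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT, latest_heartbeat TEXT);
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, state TEXT, job_id INTEGER);
""")

now = datetime(2020, 10, 7, 18, 0, 0)
stale = (now - timedelta(minutes=10)).isoformat()   # job killed with SIGKILL
fresh = (now - timedelta(seconds=5)).isoformat()    # healthy job

conn.executemany("INSERT INTO job VALUES (?, ?, ?)",
                 [(1, "running", stale), (2, "running", fresh)])
conn.executemany("INSERT INTO dag_run VALUES (?, ?, ?)",
                 [(10, "running", 1), (11, "running", 2)])

# A DagRun is a zombie if it is unfinished but its job either errored
# or stopped heartbeating longer ago than the (configurable) threshold.
threshold = (now - timedelta(minutes=5)).isoformat()
zombies = conn.execute("""
    SELECT dr.id
    FROM dag_run dr JOIN job j ON dr.job_id = j.id
    WHERE dr.state IN ('running', 'none', 'scheduled', 'queued')
      AND (j.state = 'failed' OR j.latest_heartbeat < ?)
""", (threshold,)).fetchall()

print([row[0] for row in zombies])  # DagRun 10 is the zombie
```

A scheduler loop could run this query periodically and set the matching DagRuns (and their TIs) to failed.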
