seelmann opened a new pull request #5511: [AIRFLOW-4797] Fix zombie detection
URL: https://github.com/apache/airflow/pull/5511

### Jira

- [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title.
  - https://issues.apache.org/jira/browse/AIRFLOW-4797

### Description

- [X] Here are some details about my PR, including screenshots of any UI changes:

Moved the query that fetches zombies from `DagFileProcessorManager` to the `DagBag` class. Changed the query to only look for DAGs of the current DAG bag. The query now uses the index `ti_dag_state` instead of `ti_state`. Removed the no longer required `zombies` parameters from many function signatures.

The query is now executed on every call to `DagBag.kill_zombies`, which is called whenever the DAG file is processed; how often that happens depends on `scheduler_heartbeat_sec` and `processor_poll_interval` (AFAIU). The query is faster than the previous one (see also the stats below). Its cost is also negligible IMHO, because many other queries are executed during DAG file processing (DAG runs and task instances are created, task instance dependencies are checked).

Tested on our staging environment (patch applied to Airflow 1.10.3): zombie detection works fine and database load is unchanged.

Here are some stats from `pg_stat_statements`; the branch ran there for 4 hours. The new query (1st line) is faster but is likely called more frequently. The 2nd line shows stats of the old query.

```
select calls,mean_time,max_time,rows from pg_stat_statements where query like '%task_instance JOIN job%' and query like '%latest_heartbeat%';
  calls   |     mean_time      |  max_time   | rows
----------+--------------------+-------------+------
    55416 | 0.0260821553522449 |    5.509762 |   29
 71969011 |  0.575755060854888 | 1078.895322 | 2377
```

Closed https://github.com/apache/airflow/pull/5420 in favour of this.

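For readers who want a feel for the shape of the query described above, here is a minimal, self-contained sketch using Python's built-in `sqlite3` module. It is not the code from this PR: the table layouts are heavily simplified, the helper names `make_demo_db` and `find_zombies` are hypothetical, and the exact zombie condition (job no longer running, or heartbeat older than a limit) as well as the `(dag_id, state)` column order of `ti_dag_state` are assumptions. It only illustrates restricting the lookup to the DAG ids of the current DAG bag.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE job (
            id INTEGER PRIMARY KEY,
            state TEXT,
            latest_heartbeat INTEGER  -- epoch seconds, for simplicity
        );
        CREATE TABLE task_instance (
            task_id TEXT,
            dag_id TEXT,
            state TEXT,
            job_id INTEGER
        );
        -- Assumed column order for the index named in the description.
        CREATE INDEX ti_dag_state ON task_instance (dag_id, state);
        """
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?)",
        [
            (1, "running", 150),  # healthy: fresh heartbeat
            (2, "running", 50),   # stale heartbeat
            (3, "failed", 150),   # job is no longer running
            (4, "running", 50),   # stale, but belongs to another DAG
        ],
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("t_ok", "dag_a", "running", 1),
            ("t_stale", "dag_a", "running", 2),
            ("t_dead_job", "dag_a", "running", 3),
            ("t_other", "dag_b", "running", 4),
            ("t_done", "dag_a", "success", 2),
        ],
    )
    return conn


def find_zombies(conn, dag_ids, limit_ts):
    """Return (dag_id, task_id) pairs that look like zombies.

    Only task instances whose dag_id is in `dag_ids` are considered,
    mirroring the "DAGs of the current DAG bag" restriction.
    """
    dag_ids = list(dag_ids)
    if not dag_ids:
        return []  # nothing to look up for an empty DAG bag
    placeholders = ", ".join("?" for _ in dag_ids)
    sql = f"""
        SELECT ti.dag_id, ti.task_id
        FROM task_instance AS ti
        JOIN job AS j ON ti.job_id = j.id
        WHERE ti.state = 'running'
          AND ti.dag_id IN ({placeholders})
          AND (j.state != 'running' OR j.latest_heartbeat < ?)
        ORDER BY ti.dag_id, ti.task_id
    """
    return conn.execute(sql, [*dag_ids, limit_ts]).fetchall()
```

Calling `find_zombies(make_demo_db(), ["dag_a"], 100)` returns only the two stale `dag_a` task instances and ignores `dag_b`. Filtering on `dag_id` and `state` together is what would let a database use a composite index such as `ti_dag_state` rather than a state-only index.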
### Tests

- [X] My PR adds the following unit tests

### Commits

- [X] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what they do
  - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release

### Code Quality

- [X] Passes `flake8`
