seelmann opened a new pull request #5511: [AIRFLOW-4797] Fix zombie detection
URL: https://github.com/apache/airflow/pull/5511
 
 
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title.
     - https://issues.apache.org/jira/browse/AIRFLOW-4797
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Moved the query that fetches zombies from `DagFileProcessorManager` to the 
`DagBag` class, and changed it to look only for DAGs of the current DAG bag. 
The query now uses the index `ti_dag_state` instead of `ti_state`. Removed the 
no-longer-required `zombies` parameter from many function signatures.
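To illustrate restricting the zombie query to the DAGs of one DAG bag, here is a minimal sketch using Python's built-in `sqlite3`. It is not the code in this PR: the schema is a heavily simplified, hypothetical stand-in for the `task_instance` and `job` tables, `find_zombies` is a hypothetical helper, and the index is only assumed to mirror `ti_dag_state` on `(dag_id, state)`. A task instance counts as a zombie here if it is `running` while its job is either no longer running or has a `latest_heartbeat` older than a cutoff.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified schema: only the columns the zombie query touches.
SCHEMA = """
CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT, latest_heartbeat TEXT);
CREATE TABLE task_instance (
    task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT, job_id INTEGER
);
-- Assumed to mirror ti_dag_state: lookups by (dag_id, state).
CREATE INDEX ti_dag_state ON task_instance (dag_id, state);
"""

ZOMBIE_QUERY = """
SELECT ti.dag_id, ti.task_id
FROM task_instance ti JOIN job j ON ti.job_id = j.id
WHERE ti.state = 'running'
  AND ti.dag_id IN ({placeholders})
  AND (j.state != 'running' OR j.latest_heartbeat < ?)
ORDER BY ti.dag_id, ti.task_id
"""


def find_zombies(conn, dag_ids, limit_dttm):
    """Return (dag_id, task_id) pairs of running task instances whose job is
    dead or has a stale heartbeat, restricted to the given DAG IDs."""
    dag_ids = list(dag_ids)
    if not dag_ids:  # an empty DAG bag has no zombies to look for
        return []
    sql = ZOMBIE_QUERY.format(placeholders=", ".join(["?"] * len(dag_ids)))
    # ISO-8601 strings of the same format compare correctly as text.
    return conn.execute(sql, [*dag_ids, limit_dttm.isoformat()]).fetchall()


now = datetime(2019, 7, 1, 12, 0, 0)
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO job VALUES (?, ?, ?)", [
    (1, "running", (now - timedelta(seconds=10)).isoformat()),  # healthy
    (2, "running", (now - timedelta(minutes=10)).isoformat()),  # stale heartbeat
    (3, "failed", (now - timedelta(seconds=10)).isoformat()),   # job ended
])
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", [
    ("t1", "dag_a", "2019-07-01", "running", 1),
    ("t2", "dag_a", "2019-07-01", "running", 2),
    ("t3", "dag_b", "2019-07-01", "running", 3),
    ("t4", "dag_c", "2019-07-01", "running", 2),  # not in this DAG bag
])
limit = now - timedelta(minutes=5)
print(find_zombies(conn, ["dag_a", "dag_b"], limit))
# [('dag_a', 't2'), ('dag_b', 't3')]
```

`t4` is also a zombie, but it belongs to `dag_c`, which is outside the DAG bag passed in, so this call does not return it. Filtering on `dag_id` first is what lets an index keyed on `(dag_id, state)` narrow the scan.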
    
   The query is now executed on every call to `DagBag.kill_zombies`, which is 
called whenever the DAG file is processed; how often that happens depends on 
`scheduler_heartbeat_sec` and `processor_poll_interval` (AFAIU). The query is 
faster than the previous one (see the stats below), and its cost is negligible 
IMHO, because many other queries are executed during DAG file processing anyway 
(DAG runs and task instances are created, task instance dependencies are checked).
   
   Tested on our staging environment (patch applied to Airflow 1.10.3): zombie 
detection works fine and the database load is unchanged. Here are some stats from 
`pg_stat_statements` after the branch ran there for 4 hours (`mean_time` and 
`max_time` are in milliseconds). The 1st line shows the new query, which is faster 
but likely called more frequently; the 2nd line shows the old query.
   ```
   select calls,mean_time,max_time,rows from pg_stat_statements where query 
like '%task_instance JOIN job%' and query like '%latest_heartbeat%';
     calls   |     mean_time      |  max_time   | rows 
   ----------+--------------------+-------------+------
       55416 | 0.0260821553522449 |    5.509762 |   29
    71969011 |  0.575755060854888 | 1078.895322 | 2377
   ```
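A quick sanity check of the numbers above, in plain Python with the figures copied from the table. It assumes the new query's counters cover exactly the 4-hour run; the old query's counters were presumably accumulated over a different, longer window, so only the per-call figures of the two rows are directly comparable.

```python
# Figures copied from the pg_stat_statements output; times are in milliseconds.
new = {"calls": 55416, "mean_ms": 0.0260821553522449}
old = {"calls": 71969011, "mean_ms": 0.575755060854888}

# Per-call comparison is valid even though the rows cover different windows.
speedup = old["mean_ms"] / new["mean_ms"]

# Total time spent in the new query, assuming its counters span the 4-hour run.
new_total_s = new["calls"] * new["mean_ms"] / 1000
share_of_run = new_total_s / (4 * 3600)

print(f"per-call speedup: {speedup:.1f}x")              # per-call speedup: 22.1x
print(f"new query total time: {new_total_s:.2f} s")      # new query total time: 1.45 s
print(f"share of the 4-hour run: {share_of_run:.3%}")    # share of the 4-hour run: 0.010%
```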
   
   Closed https://github.com/apache/airflow/pull/5420 in favour of this.
   
   ### Tests
   
   - [X] My PR adds the following unit tests
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
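Most of the guidelines above can be checked mechanically. The following is a minimal sketch, not an Airflow tool: `check_commit_message` and the `[AIRFLOW-NNNN]` prefix pattern are hypothetical, modelled on this PR's title. Imperative mood and a body that explains "what" and "why" still need a human reviewer.

```python
import re
from typing import List

# Hypothetical pattern for the Jira reference, e.g. "[AIRFLOW-4797] ".
JIRA_PREFIX = re.compile(r"^\[AIRFLOW-\d+\]\s*")


def check_commit_message(message: str) -> List[str]:
    """Return the guideline violations found (an empty list if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    subject = lines[0]
    if not JIRA_PREFIX.match(subject):
        problems.append("subject does not reference a Jira issue")
    # The 50-character limit excludes the Jira reference.
    bare_subject = JIRA_PREFIX.sub("", subject)
    if len(bare_subject) > 50:
        problems.append(f"subject is {len(bare_subject)} characters (max 50)")
    if subject.rstrip().endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("subject is not followed by a blank line")
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"body line {number} exceeds 72 characters")
    return problems


good = "[AIRFLOW-4797] Fix zombie detection\n\nMove the zombie query into DagBag."
bad = "[AIRFLOW-4797] Fixing zombie detection.\nNo blank line here"
print(check_commit_message(good))  # []
print(check_commit_message(bad))
# ['subject ends with a period', 'subject is not followed by a blank line']
```

Note that "Fixing" in the second message breaks the imperative-mood rule, which this sketch deliberately does not try to detect.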
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
     - All the public functions and the classes in the PR contain docstrings 
that explain what they do
     - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [X] Passes `flake8`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 