michaelmicheal opened a new pull request, #25725:
URL: https://github.com/apache/airflow/pull/25725

   ---
   Similar to [this issue](https://github.com/apache/airflow/issues/25627) and 
the [associated PR](https://github.com/apache/airflow/pull/25673), we are 
seeing MySQL is not using the `ti_state` index and completing a full-table scan 
on the `task_instance` table on the `find_zombies` query when the `ti_state` 
index should be used to find tasks in `running` state.
   ```python
   zombies = (
               session.query(TaskInstance, DagModel.fileloc)
               .join(LocalTaskJob, TaskInstance.job_id == LocalTaskJob.id)
               .join(DagModel, TaskInstance.dag_id == DagModel.dag_id)
               .filter(TaskInstance.state == TaskInstanceState.RUNNING)
               .filter(
                   or_(
                       LocalTaskJob.state != State.RUNNING,
                       LocalTaskJob.latest_heartbeat < limit_dttm,
                   )
               )
               .filter(TaskInstance.queued_by_job_id == self.id)
               .all()
           )
   ```
   
   Adding a MySQL index hint resolved this issue for us, so this PR adds an 
index hint on the above query.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to