kckoh commented on issue #62025:
URL: https://github.com/apache/airflow/issues/62025#issuecomment-3925235289

   I also reproduced the 200k DagRuns on a single dag but i didn't get the slow 
query as pierrejeambrun got on the screen recording. The response was in a 
range between 300ms -1s. However, i looked into the query and it looks like :
   `
   SELECT
       dr.*,
       d.*,
       drn.*
   FROM dag_run dr
   LEFT JOIN dag d
       ON dr.dag_id = d.dag_id
   LEFT JOIN dag_run_note drn
       ON dr.id = drn.dag_run_id
   WHERE dr.dag_id IN (<list_of_example_dag_ids>)
   ORDER BY dr.run_after DESC, dr.id DESC
   LIMIT 50 OFFSET 0;
   `
   
   The DagRun data model has several indexes
   `
   Index("dag_id_state", dag_id, _state),
   Index("idx_dag_run_dag_id", dag_id),
   Index("idx_dag_run_run_after", run_after),
   Index(
               "idx_dag_run_running_dags",
               "state",
               "dag_id",
               postgresql_where=text("state='running'"),
               sqlite_where=text("state='running'"),
           ),
           # since mysql lacks filtered/partial indices, this creates a
           # duplicate index on mysql. Not the end of the world
           Index(
               "idx_dag_run_queued_dags",
               "state",
               "dag_id",
               postgresql_where=text("state='queued'"),
               sqlite_where=text("state='queued'"),
           ),
   `
   In the frontend run_after is being passed as the query param for oder_by and 
the id gets appended on the backend. 
   
   I propose that we could remove Index("idx_dag_run_run_after", run_after) and 
add  Index("idx_dag_run_run_after_id", run_after, id) to cover the order by for 
run_after and the id. 
   Or, we could just add Index("idx_dag_run_run_after", run_after) and leave 
the existing index. 
   
   What do you propose @pierrejeambrun ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to