SavingFrame commented on issue #57344:
URL: https://github.com/apache/airflow/issues/57344#issuecomment-3481291519

   Well, I tested null_furst/null_last indexes in the postgres
   We want to have null first when we order by `dag_run.start_date DESC`
   1. 
   Index creation: `CREATE INDEX test_date_start ON dag_run (start_date NULLS 
FIRST);`
   Query with WHEN CASE - slow
   Query without WHEN CASE - slow as well
   This is because we create the ASC index with null from the beginning, and 
Postgres simply does not use the index because we want to get DESC start_date 
with null first. So postgres can't reverse index, postgres can't use index 
directly.
   2.  
   Index creation: `CREATE INDEX test_date_start ON dag_run (start_date DESC 
NULLS FIRST);`
   Which is actually: ` CREATE INDEX test_date_start ON dag_run (start_date 
DESC);` because NULL first in the desc by default in the postgres.
   So we can just create index and do not specify ordering it will be the same.
   `CREATE INDEX test_date_start ON dag_run (start_date);` 
   Query with WHEN CASE - slow, doesn't use index
   Query without WHEN CASE - fast, does use index
   
   In MySQL, NULLs are always treated as the lowest values, so it won't work 
there. Seems there's no solution how to use index for sorting and make NULL 
first DESC in mysql. (I didn't test mysql)
   
   So I'd suggest to:
   1. Remove expression null_first/null_last from the ordering there
   
https://github.com/apache/airflow/blob/6b3435e9f3fdb67d963217e40ca30a92ba74d619/airflow-core/src/airflow/api_fastapi/common/parameters.py#L251-L253
   2. Add default index on start_date in the dag_run table.
   `CREATE INDEX test_date_start ON dag_run (start_date);` 
   
   <details><summary>Tests results:</summary>
   
   ```
   DROP INDEX test_date_start;
   CREATE INDEX test_date_start ON dag_run (start_date);
   
   EXPLAIN ANALYSE
   (SELECT dag_run.state,
           dag_run.id,
           dag_run.dag_id,
           dag_run.queued_at,
           dag_run.logical_date,
           dag_run.start_date,
           dag_run.end_date,
           dag_run.run_id,
           dag_run.creating_job_id,
           dag_run.run_type,
           dag_run.triggered_by,
           dag_run.triggering_user_name,
           dag_run.conf,
           dag_run.data_interval_start,
           dag_run.data_interval_end,
           dag_run.run_after,
           dag_run.last_scheduling_decision,
           dag_run.log_template_id,
           dag_run.updated_at,
           dag_run.clear_number,
           dag_run.backfill_id,
           dag_run.bundle_version,
           dag_run.scheduled_by_job_id,
           dag_run.context_carrier,
           dag_run.span_status,
           dag_run.created_dag_version_id
    FROM dag_run
    ORDER BY dag_run.start_date DESC
    LIMIT 50 OFFSET 0);
   
   -- Limit  (cost=0.43..4.29 rows=50 width=1352) (actual time=0.010..0.109 
rows=50 loops=1)
   --   ->  Index Scan Backward using test_date_start on dag_run  
(cost=0.43..529828.65 rows=6872051 width=1352) (actual time=0.010..0.104 
rows=50 loops=1)
   -- Planning Time: 0.100 ms
   -- Execution Time: 0.130 ms
   
   
   SELECT dag_run.state,
           dag_run.start_date FROM dag_run
    ORDER BY dag_run.start_date DESC
    LIMIT 1 OFFSET 0
   -- results:
   -- queued, null
   --- null start_date is firsts
   ```
   
   </details> 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to