SavingFrame commented on issue #57344: URL: https://github.com/apache/airflow/issues/57344#issuecomment-3481291519
Well, I tested null_furst/null_last indexes in the postgres We want to have null first when we order by `dag_run.start_date DESC` 1. Index creation: `CREATE INDEX test_date_start ON dag_run (start_date NULLS FIRST);` Query with WHEN CASE - slow Query without WHEN CASE - slow as well This is because we create the ASC index with null from the beginning, and Postgres simply does not use the index because we want to get DESC start_date with null first. So postgres can't reverse index, postgres can't use index directly. 2. Index creation: `CREATE INDEX test_date_start ON dag_run (start_date DESC NULLS FIRST);` Which is actually: ` CREATE INDEX test_date_start ON dag_run (start_date DESC);` because NULL first in the desc by default in the postgres. So we can just create index and do not specify ordering it will be the same. `CREATE INDEX test_date_start ON dag_run (start_date);` Query with WHEN CASE - slow, doesn't use index Query without WHEN CASE - fast, does use index In MySQL, NULLs are always treated as the lowest values, so it won't work there. Seems there's no solution how to use index for sorting and make NULL first DESC in mysql. (I didn't test mysql) So I'd suggest to: 1. Remove expression null_first/null_last from the ordering there https://github.com/apache/airflow/blob/6b3435e9f3fdb67d963217e40ca30a92ba74d619/airflow-core/src/airflow/api_fastapi/common/parameters.py#L251-L253 2. Add default index on start_date in the dag_run table. `CREATE INDEX test_date_start ON dag_run (start_date);` <details><summary>Tests results:</summary> ``` DROP INDEX test_date_start; CREATE INDEX test_date_start ON dag_run (start_date); EXPLAIN ANALYSE (SELECT dag_run.state, dag_run.id, dag_run.dag_id, dag_run.queued_at, dag_run.logical_date, dag_run.start_date, dag_run.end_date, dag_run.run_id, dag_run.creating_job_id, dag_run.run_type, dag_run.triggered_by, dag_run.triggering_user_name, dag_run.conf, dag_run.data_interval_start, dag_run.data_interval_end, dag_run.run_after, dag_run.last_scheduling_decision, dag_run.log_template_id, dag_run.updated_at, dag_run.clear_number, dag_run.backfill_id, dag_run.bundle_version, dag_run.scheduled_by_job_id, dag_run.context_carrier, dag_run.span_status, dag_run.created_dag_version_id FROM dag_run ORDER BY dag_run.start_date DESC LIMIT 50 OFFSET 0); -- Limit (cost=0.43..4.29 rows=50 width=1352) (actual time=0.010..0.109 rows=50 loops=1) -- -> Index Scan Backward using test_date_start on dag_run (cost=0.43..529828.65 rows=6872051 width=1352) (actual time=0.010..0.104 rows=50 loops=1) -- Planning Time: 0.100 ms -- Execution Time: 0.130 ms SELECT dag_run.state, dag_run.start_date FROM dag_run ORDER BY dag_run.start_date DESC LIMIT 1 OFFSET 0 -- results: -- queued, null --- null start_date is firsts ``` </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
