SavingFrame commented on issue #57344:
URL: https://github.com/apache/airflow/issues/57344#issuecomment-3450742830
I checked, and simply creating an index does not help, because we use
CASE/WHEN in the ordering.
A few results:
1. Index for the start_date without CASE/WHEN
Query
```sql
EXPLAIN ANALYSE (
SELECT dag_run.state,
dag_run.id,
dag_run.dag_id,
dag_run.queued_at,
dag_run.logical_date,
dag_run.start_date,
dag_run.end_date,
dag_run.run_id,
dag_run.creating_job_id,
dag_run.run_type,
dag_run.triggered_by,
dag_run.triggering_user_name,
dag_run.conf,
dag_run.data_interval_start,
dag_run.data_interval_end,
dag_run.run_after,
dag_run.last_scheduling_decision,
dag_run.log_template_id,
dag_run.updated_at,
dag_run.clear_number,
dag_run.backfill_id,
dag_run.bundle_version,
dag_run.scheduled_by_job_id,
dag_run.context_carrier,
dag_run.span_status,
dag_run.created_dag_version_id
FROM dag_run
ORDER BY dag_run.start_date DESC, dag_run.id DESC
LIMIT 50 OFFSET 0
)
```
Result:
```
Limit (cost=0.52..6.60 rows=50 width=1956) (actual time=67.118..67.128
rows=50 loops=1)
-> Incremental Sort (cost=0.52..835313.39 rows=6872051 width=1956)
(actual time=67.117..67.122 rows=50 loops=1)
" Sort Key: start_date DESC, id DESC"
Presorted Key: start_date
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 43kB
Peak Memory: 43kB
Pre-sorted Groups: 1 Sort Method: top-N heapsort Average Memory:
47kB Peak Memory: 47kB
-> Index Scan Backward using test_dag_run_start_date_index on
dag_run (cost=0.43..526159.29 rows=6872051 width=1956) (actual
time=14.672..66.020 rows=2672 loops=1)
Planning Time: 0.121 ms
Execution Time: 67.161 ms
```
2. Index for the start_date with CASE/WHEN in the ordering
Query
```sql
EXPLAIN ANALYSE (
SELECT dag_run.state,
dag_run.id,
dag_run.dag_id,
dag_run.queued_at,
dag_run.logical_date,
dag_run.start_date,
dag_run.end_date,
dag_run.run_id,
dag_run.creating_job_id,
dag_run.run_type,
dag_run.triggered_by,
dag_run.triggering_user_name,
dag_run.conf,
dag_run.data_interval_start,
dag_run.data_interval_end,
dag_run.run_after,
dag_run.last_scheduling_decision,
dag_run.log_template_id,
dag_run.updated_at,
dag_run.clear_number,
dag_run.backfill_id,
dag_run.bundle_version,
dag_run.scheduled_by_job_id,
dag_run.context_carrier,
dag_run.span_status,
dag_run.created_dag_version_id
FROM dag_run
ORDER BY CASE WHEN (dag_run.start_date IS NOT NULL) THEN 0 ELSE 1 END,
dag_run.start_date DESC, dag_run.id DESC
LIMIT 50 OFFSET 0
)
```
Result:
```
Limit (cost=524994.16..525000.00 rows=50 width=1960) (actual
time=3607.433..3607.541 rows=50 loops=1)
-> Gather Merge (cost=524994.16..1193156.98 rows=5726710 width=1960)
(actual time=3607.431..3607.535 rows=50 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=523994.14..531152.53 rows=2863355 width=1960)
(actual time=3575.291..3575.294 rows=34 loops=3)
" Sort Key: (CASE WHEN (start_date IS NOT NULL) THEN 0 ELSE 1
END), start_date DESC, id DESC"
Sort Method: top-N heapsort Memory: 48kB
Worker 0: Sort Method: top-N heapsort Memory: 58kB
Worker 1: Sort Method: top-N heapsort Memory: 60kB
-> Parallel Seq Scan on dag_run (cost=0.00..428875.55
rows=2863355 width=1960) (actual time=0.006..1787.576 rows=2290684 loops=3)
Planning Time: 0.112 ms
Execution Time: 3607.587 ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]