SavingFrame commented on issue #57344:
URL: https://github.com/apache/airflow/issues/57344#issuecomment-3450742830

   I checked, and simply creating an index does not help, because we use 
CASE/WHEN in the ordering.
   
   A few results:
   1. Index for the start_date without CASE/WHEN
   Query
   ```sql
   EXPLAIN ANALYSE (
   SELECT dag_run.state,
          dag_run.id,
          dag_run.dag_id,
          dag_run.queued_at,
          dag_run.logical_date,
          dag_run.start_date,
          dag_run.end_date,
          dag_run.run_id,
          dag_run.creating_job_id,
          dag_run.run_type,
          dag_run.triggered_by,
          dag_run.triggering_user_name,
          dag_run.conf,
          dag_run.data_interval_start,
          dag_run.data_interval_end,
          dag_run.run_after,
          dag_run.last_scheduling_decision,
          dag_run.log_template_id,
          dag_run.updated_at,
          dag_run.clear_number,
          dag_run.backfill_id,
          dag_run.bundle_version,
          dag_run.scheduled_by_job_id,
          dag_run.context_carrier,
          dag_run.span_status,
          dag_run.created_dag_version_id
   FROM dag_run
   ORDER BY dag_run.start_date DESC, dag_run.id DESC
   LIMIT 50 OFFSET 0
   )
   ```
   Result: 
   ```
   Limit  (cost=0.52..6.60 rows=50 width=1956) (actual time=67.118..67.128 
rows=50 loops=1)
     ->  Incremental Sort  (cost=0.52..835313.39 rows=6872051 width=1956) 
(actual time=67.117..67.122 rows=50 loops=1)
   "        Sort Key: start_date DESC, id DESC"
           Presorted Key: start_date
           Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 43kB  
Peak Memory: 43kB
           Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average Memory: 
47kB  Peak Memory: 47kB
           ->  Index Scan Backward using test_dag_run_start_date_index on 
dag_run  (cost=0.43..526159.29 rows=6872051 width=1956) (actual 
time=14.672..66.020 rows=2672 loops=1)
   Planning Time: 0.121 ms
   Execution Time: 67.161 ms
   ```
   2. Index for the start_date with CASE/WHEN in the ordering
   Query
   ```sql
   EXPLAIN ANALYSE (
   SELECT dag_run.state,
          dag_run.id,
          dag_run.dag_id,
          dag_run.queued_at,
          dag_run.logical_date,
          dag_run.start_date,
          dag_run.end_date,
          dag_run.run_id,
          dag_run.creating_job_id,
          dag_run.run_type,
          dag_run.triggered_by,
          dag_run.triggering_user_name,
          dag_run.conf,
          dag_run.data_interval_start,
          dag_run.data_interval_end,
          dag_run.run_after,
          dag_run.last_scheduling_decision,
          dag_run.log_template_id,
          dag_run.updated_at,
          dag_run.clear_number,
          dag_run.backfill_id,
          dag_run.bundle_version,
          dag_run.scheduled_by_job_id,
          dag_run.context_carrier,
          dag_run.span_status,
          dag_run.created_dag_version_id
   FROM dag_run
   ORDER BY CASE WHEN (dag_run.start_date IS NOT NULL) THEN 0 ELSE 1 END, 
dag_run.start_date DESC, dag_run.id DESC
   LIMIT 50 OFFSET 0
   )
   ```
   Result:
   ```
   Limit  (cost=524994.16..525000.00 rows=50 width=1960) (actual 
time=3607.433..3607.541 rows=50 loops=1)
     ->  Gather Merge  (cost=524994.16..1193156.98 rows=5726710 width=1960) 
(actual time=3607.431..3607.535 rows=50 loops=1)
           Workers Planned: 2
           Workers Launched: 2
           ->  Sort  (cost=523994.14..531152.53 rows=2863355 width=1960) 
(actual time=3575.291..3575.294 rows=34 loops=3)
   "              Sort Key: (CASE WHEN (start_date IS NOT NULL) THEN 0 ELSE 1 
END), start_date DESC, id DESC"
                 Sort Method: top-N heapsort  Memory: 48kB
                 Worker 0:  Sort Method: top-N heapsort  Memory: 58kB
                 Worker 1:  Sort Method: top-N heapsort  Memory: 60kB
                 ->  Parallel Seq Scan on dag_run  (cost=0.00..428875.55 
rows=2863355 width=1960) (actual time=0.006..1787.576 rows=2290684 loops=3)
   Planning Time: 0.112 ms
   Execution Time: 3607.587 ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to