The GitHub Actions job "Tests" on airflow.git has succeeded.
Run started by GitHub user ashb (triggered by ashb).

Head commit for run:
3115a4f6b8d48adac0285f33c84474c5612a7753 / Ash Berlin-Taylor <[email protected]>
Change approach to finding bad rows to LEFT OUTER JOIN.

Rather than two sub-selects,

For a _large_ database (27m TaskInstances, 2m DagRuns) this takes the
time from 10minutes to around 3 minutes per table (we have 3) down to 3
minutes per table.

Before:

```sql
SELECT
    count(*) AS count_1
FROM (
    SELECT
        rendered_task_instance_fields.dag_id AS dag_id,
        rendered_task_instance_fields.task_id AS task_id,
        rendered_task_instance_fields.execution_date AS execution_date,
        rendered_task_instance_fields.rendered_fields AS rendered_fields,
        rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml
    FROM rendered_task_instance_fields
    WHERE
        NOT (exists(SELECT
                        1
                    FROM task_instance
                    JOIN dag_run
                        ON dag_run.run_id = task_instance.run_id
                                AND dag_run.dag_id = task_instance.dag_id
                    WHERE
                        rendered_task_instance_fields.dag_id = 
task_instance.dag_id
                        AND rendered_task_instance_fields.task_id = 
task_instance.task_id
                        AND rendered_task_instance_fields.execution_date = 
dag_run.execution_date
            ))
     ) AS anon_1
;
```

After:

```sql
CREATE TABLE _airflow_moved__2_3__dangling__rendered_task_instance_fields AS
SELECT
  rendered_task_instance_fields.dag_id AS dag_id,
  rendered_task_instance_fields.task_id AS task_id,
  rendered_task_instance_fields.execution_date AS execution_date,
  rendered_task_instance_fields.rendered_fields AS rendered_fields,
  rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml +
FROM
  rendered_task_instance_fields
  LEFT OUTER JOIN dag_run ON rendered_task_instance_fields.dag_id = 
dag_run.dag_id
  AND rendered_task_instance_fields.execution_date = dag_run.execution_date
  LEFT OUTER JOIN task_instance ON dag_run.dag_id = task_instance.dag_id
  AND dag_run.run_id = task_instance.run_id
  AND rendered_task_instance_fields.task_id = task_instance.task_id
WHERE
  task_instance.dag_id IS NULL
  OR dag_run.dag_id IS NULL
;
```

Report URL: https://github.com/apache/airflow/actions/runs/2282027476

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to