The GitHub Actions job "Tests" on airflow.git has failed.
Run started by GitHub user ashb (triggered by ashb).

Head commit for run:
22a9293ff8f48411d39074d9bc88af35abe9850f / Ash Berlin-Taylor <[email protected]>
Change approach to finding bad rows to LEFT OUTER JOIN. (#23528)

Rather than sub-selects (two for count, or one for the CREATE TABLE).

For a _large_ database (27m TaskInstances, 2m DagRuns) this takes the
time from 10minutes to around 3 minutes per table (we have 3) down to 3
minutes per table. (All times on Postgres.)

Before:

```sql
CREATE TABLE _airflow_moved__2_3__dangling__rendered_task_instance_fields AS
SELECT
  rendered_task_instance_fields.dag_id AS dag_id,
  rendered_task_instance_fields.task_id AS task_id,
  rendered_task_instance_fields.execution_date AS execution_date,
  rendered_task_instance_fields.rendered_fields AS rendered_fields,
  rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml +
FROM
  rendered_task_instance_fields
WHERE
  NOT (
    EXISTS (
      SELECT
        1
      FROM
        task_instance
        JOIN dag_run ON dag_run.dag_id = task_instance.dag_id
        AND dag_run.run_id = task_instance.run_id
      WHERE
        rendered_task_instance_fields.dag_id = task_instance.dag_id
        AND rendered_task_instance_fields.task_id = task_instance.task_id
        AND rendered_task_instance_fields.execution_date = 
dag_run.execution_date
    )
  )
```

After:

```sql
CREATE TABLE _airflow_moved__2_3__dangling__rendered_task_instance_fields AS
SELECT
  rendered_task_instance_fields.dag_id AS dag_id,
  rendered_task_instance_fields.task_id AS task_id,
  rendered_task_instance_fields.execution_date AS execution_date,
  rendered_task_instance_fields.rendered_fields AS rendered_fields,
  rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml +
FROM
  rendered_task_instance_fields
  LEFT OUTER JOIN dag_run ON rendered_task_instance_fields.dag_id = 
dag_run.dag_id
  AND rendered_task_instance_fields.execution_date = dag_run.execution_date
  LEFT OUTER JOIN task_instance ON dag_run.dag_id = task_instance.dag_id
  AND dag_run.run_id = task_instance.run_id
  AND rendered_task_instance_fields.task_id = task_instance.task_id
WHERE
  task_instance.dag_id IS NULL
  OR dag_run.dag_id IS NULL
;
```

Report URL: https://github.com/apache/airflow/actions/runs/2282777651

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to