The GitHub Actions job "Tests" on airflow.git has succeeded. Run started by GitHub user ashb (triggered by ashb).
Head commit for run: 3115a4f6b8d48adac0285f33c84474c5612a7753 / Ash Berlin-Taylor <[email protected]> Change approach to finding bad rows to LEFT OUTER JOIN. Rather than two sub-selects, For a _large_ database (27m TaskInstances, 2m DagRuns) this takes the time from 10minutes to around 3 minutes per table (we have 3) down to 3 minutes per table. Before: ```sql SELECT count(*) AS count_1 FROM ( SELECT rendered_task_instance_fields.dag_id AS dag_id, rendered_task_instance_fields.task_id AS task_id, rendered_task_instance_fields.execution_date AS execution_date, rendered_task_instance_fields.rendered_fields AS rendered_fields, rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml FROM rendered_task_instance_fields WHERE NOT (exists(SELECT 1 FROM task_instance JOIN dag_run ON dag_run.run_id = task_instance.run_id AND dag_run.dag_id = task_instance.dag_id WHERE rendered_task_instance_fields.dag_id = task_instance.dag_id AND rendered_task_instance_fields.task_id = task_instance.task_id AND rendered_task_instance_fields.execution_date = dag_run.execution_date )) ) AS anon_1 ; ``` After: ```sql CREATE TABLE _airflow_moved__2_3__dangling__rendered_task_instance_fields AS SELECT rendered_task_instance_fields.dag_id AS dag_id, rendered_task_instance_fields.task_id AS task_id, rendered_task_instance_fields.execution_date AS execution_date, rendered_task_instance_fields.rendered_fields AS rendered_fields, rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml + FROM rendered_task_instance_fields LEFT OUTER JOIN dag_run ON rendered_task_instance_fields.dag_id = dag_run.dag_id AND rendered_task_instance_fields.execution_date = dag_run.execution_date LEFT OUTER JOIN task_instance ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id AND rendered_task_instance_fields.task_id = task_instance.task_id WHERE task_instance.dag_id IS NULL OR dag_run.dag_id IS NULL ; ``` Report URL: https://github.com/apache/airflow/actions/runs/2282027476 With regards, GitHub Actions via GitBox --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
