The GitHub Actions job "Tests" on airflow.git/fix_datasets_for_2_11_2_pr has failed. Run started by GitHub user joseotaviorf (triggered by potiuk).
Head commit for run: 9a838bf560ef81cb56cbd5c75b00f7f12253399a / José Otávio R. Ferreira <[email protected]>

Fix dataset-triggered DAGs firing without all upstream events (backport of #62501)

Backports three targeted fixes to 2.10.4 to address the race condition reported in GH#56541 and GH#41101, where a dataset-dependent DAG fires with one or more upstream dataset events missing from consumed_dataset_events.

Root cause: stale `created_at` timestamps in DatasetDagRunQueue caused the scheduler's event-window query to exclude valid events, yet the presence-based readiness check still considered the DAG ready to run.

Changes:

- manager.py: replace ON CONFLICT DO NOTHING with ON CONFLICT DO UPDATE (Postgres) and add a WHERE guard to prevent backwards timestamp drift; pass an explicit created_at=utcnow() on the non-Postgres merge path so existing rows are always refreshed.
- scheduler_job_runner.py: wrap create_dagrun in `if dataset_events:` to prevent phantom runs when the event window is empty; scope the DDRQ DELETE to `created_at <= exec_date` instead of deleting all rows, so events that arrived during processing are preserved for the next cycle.
- tests/datasets/test_manager.py: add a WHERE-guard regression test for the Postgres upsert path.
- dev/BUG_REPORT_DATASET_SCHEDULING.md: detailed bug report with timeline diagrams, related issues (#56541, #41101, #35870), root cause analysis, and an explanation of each fix.
- reproduce_bug_56541.py: end-to-end reproduction and verification script.

Report URL: https://github.com/apache/airflow/actions/runs/23305911454

With regards,
GitHub Actions via GitBox
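For readers unfamiliar with the upsert-with-guard pattern the commit message describes, here is a minimal sketch in Python using the standard-library sqlite3 module, which accepts the same `ON CONFLICT ... DO UPDATE ... WHERE` clause shape as PostgreSQL. The table `ddrq`, its integer timestamps, and the helpers `queue_event` and `process_dag` are hypothetical simplifications, not Airflow's DatasetDagRunQueue model or scheduler code. The sketch also assumes the queue rows are deleted whether or not a run is created; that ordering is an assumption, not taken from the patch.

```python
import sqlite3

# Hypothetical, simplified stand-in for the DatasetDagRunQueue table:
# one row per (dataset, consuming DAG); timestamps are plain integers.
SCHEMA = """
CREATE TABLE ddrq (
    dataset_id    INTEGER NOT NULL,
    target_dag_id TEXT    NOT NULL,
    created_at    INTEGER NOT NULL,
    PRIMARY KEY (dataset_id, target_dag_id)
)
"""

# ON CONFLICT DO UPDATE refreshes created_at for an existing row, and the
# WHERE guard skips the update when the incoming timestamp is not newer,
# so a late writer cannot move created_at backwards.
UPSERT = """
INSERT INTO ddrq (dataset_id, target_dag_id, created_at)
VALUES (?, ?, ?)
ON CONFLICT (dataset_id, target_dag_id) DO UPDATE
SET created_at = excluded.created_at
WHERE excluded.created_at > ddrq.created_at
"""


def queue_event(conn: sqlite3.Connection, dataset_id: int, dag_id: str, ts: int) -> None:
    """Record that `dataset_id` has a pending event for the consumer `dag_id`."""
    conn.execute(UPSERT, (dataset_id, dag_id, ts))


def process_dag(conn, dag_id, events, prev_exec_date, exec_date):
    """Hypothetical scheduler step; `events` is a list of (dataset_id, timestamp)."""
    # Only events inside the window (prev_exec_date, exec_date] are consumed.
    dataset_events = [e for e in events if prev_exec_date < e[1] <= exec_date]
    run = None
    if dataset_events:  # empty window: no run is created (no "phantom" run)
        run = {"exec_date": exec_date, "consumed": dataset_events}
    # Scoped delete: queue rows newer than exec_date survive for the next cycle.
    conn.execute(
        "DELETE FROM ddrq WHERE target_dag_id = ? AND created_at <= ?",
        (dag_id, exec_date),
    )
    return run
```

In this sketch, queuing an older timestamp for an existing row leaves `created_at` unchanged, and a row queued after `exec_date` is still present once `process_dag` returns.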
