The GitHub Actions job "Tests" on airflow.git/fix_datasets_for_2_11_2_pr has failed.
Run started by GitHub user joseotaviorf (triggered by potiuk).

Head commit for run:
9a838bf560ef81cb56cbd5c75b00f7f12253399a / José Otávio R. Ferreira 
<[email protected]>
Fix dataset-triggered DAGs firing without all upstream events (backport of 
#62501)

Backports three targeted fixes to 2.10.4 to address the race condition reported
in GH#56541 and GH#41101, where a dataset-dependent DAG fires with one or more
upstream dataset events missing from consumed_dataset_events.

Root cause: stale `created_at` timestamps in DatasetDagRunQueue caused the
scheduler's event-window query to exclude valid events, yet the presence-based
readiness check still considered the DAG ready to run.

Changes:
- manager.py: replace ON CONFLICT DO NOTHING with ON CONFLICT DO UPDATE
  (Postgres) and add a WHERE guard to prevent backwards timestamp drift;
  pass explicit created_at=utcnow() on the non-Postgres merge path so
  existing rows are always refreshed.
- scheduler_job_runner.py: wrap create_dagrun in `if dataset_events:` to
  prevent phantom runs when the event window is empty; scope the DDRQ
  DELETE to `created_at <= exec_date` instead of deleting all rows, so
  events that arrived during processing are preserved for the next cycle.
- tests/datasets/test_manager.py: add WHERE-guard regression test for the
  Postgres upsert path.
- dev/BUG_REPORT_DATASET_SCHEDULING.md: detailed bug report with timeline
  diagrams, related issues (#56541, #41101, #35870), root cause analysis,
  and explanation of each fix.
- reproduce_bug_56541.py: end-to-end reproduction and verification script.
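
The upsert guard and the scoped DELETE described in the changes above can be
sketched roughly as follows. This is only an illustration, not the code from
the commit: it uses SQLite from the Python standard library as a stand-in for
Postgres, and the table name `ddrq`, its columns, and the helper functions
`queue_event`, `clear_consumed` and `rows` are all hypothetical. Timestamps are
ISO-8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3

# Hypothetical, simplified stand-in for the DatasetDagRunQueue table.
SCHEMA = """
CREATE TABLE ddrq (
    dataset_id    INTEGER NOT NULL,
    target_dag_id TEXT    NOT NULL,
    created_at    TEXT    NOT NULL,
    PRIMARY KEY (dataset_id, target_dag_id)
)
"""

# ON CONFLICT DO UPDATE refreshes an existing row instead of leaving a stale
# timestamp behind; the WHERE guard rejects updates that would move
# created_at backwards.
UPSERT = """
INSERT INTO ddrq (dataset_id, target_dag_id, created_at)
VALUES (?, ?, ?)
ON CONFLICT (dataset_id, target_dag_id)
DO UPDATE SET created_at = excluded.created_at
WHERE excluded.created_at > ddrq.created_at
"""

# Only rows at or before the run's cut-off are removed, so rows queued while
# the run was being created survive for the next scheduling cycle.
SCOPED_DELETE = """
DELETE FROM ddrq
WHERE target_dag_id = ? AND created_at <= ?
"""


def queue_event(conn, dataset_id, dag_id, created_at):
    """Insert a queue row, or refresh its timestamp if it already exists."""
    conn.execute(UPSERT, (dataset_id, dag_id, created_at))


def clear_consumed(conn, dag_id, exec_date):
    """Delete only the queue rows that the run at exec_date has consumed."""
    conn.execute(SCOPED_DELETE, (dag_id, exec_date))


def rows(conn):
    """Return (dataset_id, created_at) pairs, ordered by dataset_id."""
    query = "SELECT dataset_id, created_at FROM ddrq ORDER BY dataset_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    queue_event(conn, 1, "consumer", "2024-01-01T00:00:05")
    queue_event(conn, 1, "consumer", "2024-01-01T00:00:09")  # refreshed
    queue_event(conn, 1, "consumer", "2024-01-01T00:00:02")  # rejected by guard
    queue_event(conn, 2, "consumer", "2024-01-01T00:00:20")  # arrives late
    clear_consumed(conn, "consumer", "2024-01-01T00:00:10")
    print(rows(conn))
```

In this sketch the late row for dataset 2 is kept after the cleanup, which is
the behaviour the scoped DELETE is meant to provide; an unscoped DELETE would
have dropped it together with the consumed row.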

Report URL: https://github.com/apache/airflow/actions/runs/23305911454

With regards,
GitHub Actions via GitBox


