1fanwang opened a new pull request, #66773: URL: https://github.com/apache/airflow/pull/66773
Closes #58307.

`_find_task_instances_without_heartbeats` filters with `TI.last_heartbeat_at < limit_dttm`. In SQL three-valued logic, that predicate evaluates to `NULL` (not `TRUE`) when the row's `last_heartbeat_at IS NULL`, so the row is never returned and the TI never gets purged.

`last_heartbeat_at IS NULL` is a real state: every TI has it briefly between QUEUED→RUNNING and the first heartbeat from the worker. If a worker crashes inside that window (OOM kill, K8s eviction during pod start, network blip during init), the TI stays RUNNING forever.

The scheduler already knows about this gap: `adopt_or_reset_orphaned_tasks` falls back to `utcnow()` on the migration path when `last_heartbeat_at IS NULL` (`scheduler_job_runner.py:2855`), but the heartbeat-cleanup path doesn't have a matching fallback.

This PR extends the predicate to use `start_date` when `last_heartbeat_at IS NULL`. A TI that started long enough ago to be past the heartbeat timeout, and has still never reported a heartbeat, is the exact stuck-forever case the cleanup is meant to handle.

## Tests

Two new cases in `tests/unit/jobs/test_scheduler_job.py`:

- `test_find_and_purge_task_instances_without_heartbeats_null_last_heartbeat`: NULL `last_heartbeat_at` with an old `start_date` is now caught by the query and purged. Fails on `main`, passes with this PR.
- `test_find_and_purge_task_instances_without_heartbeats_null_last_heartbeat_fresh_start`: NULL `last_heartbeat_at` with a fresh `start_date` (inside the timeout window) is still left alone. Guards against killing newly started tasks that haven't had a chance to report their first heartbeat yet.

The other 11 heartbeat-related tests in the file continue to pass.
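
To illustrate the NULL-comparison behaviour described in this PR, here is a minimal sketch using Python's built-in `sqlite3` module rather than Airflow's actual models. The table layout, row ids, timestamps, and the `find_ids` helper are all hypothetical, and the `OR ... IS NULL AND start_date < ...` form is only one plausible way to express the fallback. The predicate in the actual diff may be written differently.

```python
import sqlite3

# Hypothetical, simplified stand-in for the task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "id TEXT PRIMARY KEY, state TEXT, start_date TEXT, last_heartbeat_at TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        # Heartbeat exists but is older than the limit.
        ("stale_heartbeat", "running", "2024-01-01 00:00:00", "2024-01-01 00:01:00"),
        # Heartbeat is recent.
        ("healthy", "running", "2024-01-01 00:50:00", "2024-01-01 00:59:30"),
        # Never sent a heartbeat and started long ago (the stuck case).
        ("never_beat_old", "running", "2024-01-01 00:00:00", None),
        # Never sent a heartbeat but only just started.
        ("never_beat_fresh", "running", "2024-01-01 00:59:00", None),
    ],
)

# Illustrative cutoff: anything older than this counts as timed out.
limit_dttm = "2024-01-01 00:55:00"


def find_ids(where_clause):
    """Return ids of running rows matching the given predicate (hypothetical helper)."""
    sql = (
        "SELECT id FROM task_instance "
        f"WHERE state = 'running' AND ({where_clause}) ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, {"limit": limit_dttm})]


# Comparing NULL with anything yields NULL, which WHERE treats as not true.
null_comparison = conn.execute("SELECT NULL < '2024-01-01'").fetchone()[0]

# Original predicate: rows with a NULL heartbeat can never match.
original = find_ids("last_heartbeat_at < :limit")

# Assumed shape of the extended predicate: fall back to start_date on NULL.
extended = find_ids(
    "last_heartbeat_at < :limit "
    "OR (last_heartbeat_at IS NULL AND start_date < :limit)"
)

print(null_comparison)
print(original)
print(extended)
```

With these rows, the original predicate returns only `stale_heartbeat`, while the extended one also returns `never_beat_old` and still leaves `never_beat_fresh` alone, mirroring the two new test cases.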
