eanikindfi opened a new issue, #49498: URL: https://github.com/apache/airflow/issues/49498
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.10.2-python3.12-651c10a8

### What happened?

The situation:
- One of our tasks ended in the `skipped` state;
- The tasks downstream of it were left with no state (see screenshot);
- The DAG run ended in the `failed` state;
- The DAG was not started again after `schedule_interval`, I think because the previous run still had tasks with no state.

<img width="487" alt="Image" src="https://github.com/user-attachments/assets/5bccb150-a958-41fc-ae59-c2356ab9cd19" />

Our workers did not die after OOM kills. Log from the UI:

```
airflow-production-worker-1.airflow-production-worker.airflow-production.svc.cluster.local
*** Found logs in s3:
***   * s3://my-bucket/dag_id=in_main/run_id=scheduled__2025-04-18T18:30:00+00:00/task_id=in_rir_proc/attempt=1.log
[2025-04-20, 02:57:39 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2025-04-20, 02:57:39 UTC] {sql.py:266} INFO - Executing: exec ...
[2025-04-20, 02:57:39 UTC] {base.py:84} INFO - Retrieving connection 'IN_analytics_Airflow'
[2025-04-20, 02:57:39 UTC] {base.py:84} INFO - Retrieving connection 'IN_analytics_Airflow'
[2025-04-20, 02:57:40 UTC] {sql.py:509} INFO - Running statement: exec ..., parameters: None
[2025-04-20, 03:13:24 UTC] {local_task_job_runner.py:346} WARNING - State of this instance has been externally set to restarting. Terminating instance.
[2025-04-20, 03:13:24 UTC] {local_task_job_runner.py:245} ▲▲▲ Log group end
[2025-04-20, 03:13:24 UTC] {process_utils.py:132} INFO - Sending 15 to group 668854. PIDs of all processes in the group: [668854]
[2025-04-20, 03:13:24 UTC] {process_utils.py:87} INFO - Sending the signal 15 to group 668854
[2025-04-20, 03:14:24 UTC] {process_utils.py:150} WARNING - process psutil.Process(pid=668854, name='airflow task', status='sleeping', started='02:57:39') did not respond to SIGTERM. Trying SIGKILL
[2025-04-20, 03:14:24 UTC] {process_utils.py:87} INFO - Sending the signal 9 to group 668854
[2025-04-20, 03:14:24 UTC] {process_utils.py:80} INFO - Process psutil.Process(pid=668854, name='airflow task', status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='02:57:39') (668854) terminated with exit code -9
[2025-04-20, 03:14:24 UTC] {standard_task_runner.py:190} ERROR - ('Job 2382942 was killed before it finished (likely due to running out of memory)', 'For more information, see https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#LocalTaskJob-killed')
```

I have edited some sensitive but unimportant fields in the log.

### What you think should happen instead?

The DAG should continue running; a single skipped task should not block the rest of the DAG. In similar scenarios (when a task ends in the `skipped` state), the downstream tasks normally receive the `skipped` state as well.

We also set `dagrun_timeout=timedelta(hours=23)`, and the problematic DAG run ran for the full 23 hours. In regular cases the maximum duration of this DAG is around 8-10 hours.

### How to reproduce

We don't know exactly. You need a DAG with multiple tasks.

### Operating System

v1.32.2-eks-bc803b4

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

We use the Celery executor for Airflow workers.

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
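The skip-propagation behavior described in "What you think should happen instead" can be sketched without Airflow itself. The following is a minimal, hypothetical model of the default `all_success` trigger rule (a task whose upstream was skipped is itself skipped); all names and the resolution loop are illustrative, not Airflow's actual scheduler code:

```python
# Illustrative model of skip cascading under the default "all_success"
# trigger rule: one skipped task should mark its downstream tasks as
# skipped too, rather than leaving them stuck with no state.
# This is a sketch for discussion, not Airflow's real implementation.

SUCCESS, SKIPPED, NONE = "success", "skipped", "none"

def resolve_state(upstream_states):
    """Decide a task's state under the all_success rule."""
    if any(s == SKIPPED for s in upstream_states):
        return SKIPPED   # any skipped upstream cascades downstream
    if all(s == SUCCESS for s in upstream_states):
        return SUCCESS   # all upstreams succeeded; assume the task runs and succeeds
    return NONE          # still waiting on upstreams

def propagate(dag, states):
    """dag maps each task to its upstream tasks; fill in unresolved states."""
    changed = True
    while changed:
        changed = False
        for task, upstreams in dag.items():
            if task in states:
                continue
            new = resolve_state([states.get(u, NONE) for u in upstreams])
            if new != NONE:
                states[task] = new
                changed = True
    return states

# A chain like the reporter's DAG: t1 -> t2 -> t3, where t2 was skipped.
dag = {"t2": ["t1"], "t3": ["t2"]}
final = propagate(dag, {"t1": SUCCESS, "t2": SKIPPED})
# Under this model t3 ends up skipped, so the run can complete instead of
# leaving downstream tasks with no state until dagrun_timeout expires.
```

In the incident reported above, the downstream tasks instead remained with no state for the full 23-hour `dagrun_timeout`, which is the gap between this expected cascade and the observed behavior.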