eanikindfi opened a new issue, #49498:
URL: https://github.com/apache/airflow/issues/49498

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.10.2-python3.12-651c10a8
   
   ### What happened?
   
   The situation:
   
   - One of our tasks ended in the `skipped` state;
   - The remaining downstream tasks were left with no state (see screenshot);
   - The DAG run ended in the `failed` state;
   - The next DAG run was not started at its `schedule_interval`, I think because some tasks in the previous run had no state.
   
   <img width="487" alt="Image" src="https://github.com/user-attachments/assets/5bccb150-a958-41fc-ae59-c2356ab9cd19" />
   
   Our workers didn't die after OOM kills.
   
   Log from UI:
   
   ```
   
   airflow-production-worker-1.airflow-production-worker.airflow-production.svc.cluster.local
   *** Found logs in s3:
   ***   * s3://my-bucket/dag_id=in_main/run_id=scheduled__2025-04-18T18:30:00+00:00/task_id=in_rir_proc/attempt=1.log
   [2025-04-20, 02:57:39 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
   [2025-04-20, 02:57:39 UTC] {sql.py:266} INFO - Executing: exec ...
   [2025-04-20, 02:57:39 UTC] {base.py:84} INFO - Retrieving connection 'IN_analytics_Airflow'
   [2025-04-20, 02:57:39 UTC] {base.py:84} INFO - Retrieving connection 'IN_analytics_Airflow'
   [2025-04-20, 02:57:40 UTC] {sql.py:509} INFO - Running statement: exec ..., parameters: None
   [2025-04-20, 03:13:24 UTC] {local_task_job_runner.py:346} WARNING - State of this instance has been externally set to restarting. Terminating instance.
   [2025-04-20, 03:13:24 UTC] {local_task_job_runner.py:245} ▲▲▲ Log group end
   [2025-04-20, 03:13:24 UTC] {process_utils.py:132} INFO - Sending 15 to group 668854. PIDs of all processes in the group: [668854]
   [2025-04-20, 03:13:24 UTC] {process_utils.py:87} INFO - Sending the signal 15 to group 668854
   [2025-04-20, 03:14:24 UTC] {process_utils.py:150} WARNING - process psutil.Process(pid=668854, name='airflow task', status='sleeping', started='02:57:39') did not respond to SIGTERM. Trying SIGKILL
   [2025-04-20, 03:14:24 UTC] {process_utils.py:87} INFO - Sending the signal 9 to group 668854
   [2025-04-20, 03:14:24 UTC] {process_utils.py:80} INFO - Process psutil.Process(pid=668854, name='airflow task', status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='02:57:39') (668854) terminated with exit code -9
   [2025-04-20, 03:14:24 UTC] {standard_task_runner.py:190} ERROR - ('Job 2382942 was killed before it finished (likely due to running out of memory)', 'For more information, see https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#LocalTaskJob-killed')
   ```
   I have edited out some sensitive but unimportant fields in the log.
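
   The timing in the log can be checked with a short script. This is only a sketch: the `parse_ts` helper is a made-up name, not part of Airflow, and the three lines are copied from the log above with the group markers dropped.

```python
from datetime import datetime

# Three lines taken from the task log above: task start, SIGTERM, SIGKILL.
LOG_LINES = [
    "[2025-04-20, 02:57:39 UTC] {local_task_job_runner.py:123} Pre task execution logs",
    "[2025-04-20, 03:13:24 UTC] {process_utils.py:87} INFO - Sending the signal 15 to group 668854",
    "[2025-04-20, 03:14:24 UTC] {process_utils.py:87} INFO - Sending the signal 9 to group 668854",
]


def parse_ts(line: str) -> datetime:
    """Extract the leading '[YYYY-MM-DD, HH:MM:SS UTC]' timestamp (hypothetical helper)."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d, %H:%M:%S UTC")


start, sigterm, sigkill = (parse_ts(line) for line in LOG_LINES)

# Seconds the task ran before it was told to stop, and the SIGTERM-to-SIGKILL gap.
ran_before_sigterm = (sigterm - start).total_seconds()
sigterm_to_sigkill = (sigkill - sigterm).total_seconds()

print(f"task ran for {ran_before_sigterm:.0f} s before SIGTERM")
print(f"SIGKILL followed SIGTERM after {sigterm_to_sigkill:.0f} s")
```

   So the task ran for 945 seconds (about 15 minutes 45 seconds) before its state was changed externally, and it was killed 60 seconds after the SIGTERM.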
   
   ### What you think should happen instead?
   
   The DAG should continue running, and a single skipped task should not block the rest of the DAG.
   
   I think that in similar scenarios (when a task is in the `skipped` state), its downstream tasks receive the `skipped` state as well.
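
   To make the expected outcome concrete, here is a toy model in plain Python. It is not Airflow code, and it assumes every downstream task uses the default trigger rule. Only `in_rir_proc` is a real task name from the log; `task_b`, `task_c`, `task_d` and the graph shape are placeholders.

```python
from collections import deque

# Hypothetical task graph: maps each task to its direct downstream tasks.
DOWNSTREAM = {
    "in_rir_proc": ["task_b"],
    "task_b": ["task_c", "task_d"],
    "task_c": [],
    "task_d": [],
}


def propagate_skip(downstream, skipped_task):
    """Return the states expected if a skip cascades to every descendant."""
    states = {task: None for task in downstream}  # None means "no state yet"
    states[skipped_task] = "skipped"
    queue = deque([skipped_task])
    while queue:
        # Breadth-first walk: mark each not-yet-visited child as skipped.
        for child in downstream[queue.popleft()]:
            if states[child] is None:
                states[child] = "skipped"
                queue.append(child)
    return states


expected = propagate_skip(DOWNSTREAM, "in_rir_proc")
print(expected)
```

   In this model no task is left without a state, which is the opposite of what we observed.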
   
   We also set `dagrun_timeout=timedelta(hours=23)`, and the problematic DAG run lasted the full 23 hours. In regular cases this DAG takes at most 8-10 hours.
   
   ### How to reproduce
   
   We don't know exactly. You should have a DAG with multiple tasks.
   
   ### Operating System
   
   v1.32.2-eks-bc803b4
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   We use the Celery executor for Airflow workers.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

