gudata opened a new issue, #46930: URL: https://github.com/apache/airflow/issues/46930
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

v2.9.1

### What happened?

I have 20 EC2 worker instances. On each EC2 instance I run the worker and triggerer containers in Docker. I start 100 DAGs at the same time, at 05:00 UTC. Very often I see some tasks get scheduled and then killed/removed/never started. Something is failing, but I can't see why. In the UI there is an attempt to run `_execute_in_fork`, which fails/times out. I checked the workers: there is enough memory, CPU, disk space, and file descriptors; there is no limit on the number of processes, and no maximum fork count is being reached. Any ideas what it might be?

On the server/scheduler:

```
INFO - Setting external_id for <TaskInstance: CLIENTNAME_dag.confirm_ready_for_transformations scheduled__2025-02-20T05:00:00+00:00 [queued]> to b55db308-c50b-465b-a91e-789676b4b450
2025-02-20T05:02:40.882885216Z [2025-02-20T05:02:40.882+0000] {task_context_logger.py:91} ERROR - Executor reports task instance <TaskInstance: CLIENTNAME_dag.confirm_ready_for_transformations scheduled__2025-02-20T05:00:00+00:00 [queued]> finished (failed) although the task says it's queued. (Info: None) Was the task killed externally?
```

On the worker, in the logs of the worker process (in a Docker container):

```
2025-02-20T05:02:36.620231955Z [2025-02-20 05:02:36,618: ERROR/ForkPoolWorker-16] Process timed out, PID: 1576563
2025-02-20T05:02:36.688952182Z [2025-02-20 05:02:36,687: ERROR/ForkPoolWorker-16] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[b55db308-c50b-465b-a91e-789676b4b450] raised unexpected: AirflowException('Celery command failed on host: 54c1553f81f2 with celery_task_id b55db308-c50b-465b-a91e-789676b4b450')
2025-02-20T05:02:36.688999873Z Traceback (most recent call last):
2025-02-20T05:02:36.689007554Z   File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
2025-02-20T05:02:36.689014304Z     R = retval = fun(*args, **kwargs)
2025-02-20T05:02:36.689020164Z                  ^^^^^^^^^^^^^^^^^^^^
2025-02-20T05:02:36.689025754Z   File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
2025-02-20T05:02:36.689031574Z     return self.run(*args, **kwargs)
2025-02-20T05:02:36.689036854Z            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-20T05:02:36.689042145Z   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 136, in execute_command
2025-02-20T05:02:36.689048015Z     _execute_in_fork(command_to_exec, celery_task_id)
2025-02-20T05:02:36.689053345Z   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 151, in _execute_in_fork
2025-02-20T05:02:36.689059185Z     raise AirflowException(msg)
2025-02-20T05:02:36.689064465Z airflow.exceptions.AirflowException: Celery command failed on host: 54c1553f81f2 with celery_task_id b55db308-c50b-465b-a91e-789676b4b450
2025-02-20T05:02:43.076153425Z [2025-02-20 05:02:43,075: ERROR/ForkPoolWorker-15] Process timed out, PID: 1576565
```

And in Celery, in the UI:

```
AirflowException('Celery command failed on host: 54c1553f81f2 with celery_task_id b55db308-c50b-465b-a91e-789676b4b450')
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 136, in execute_command
    _execute_in_fork(command_to_exec, celery_task_id)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 151, in _execute_in_fork
    raise AirflowException(msg)
airflow.exceptions.AirflowException: Celery command failed on host: 54c1553f81f2 with celery_task_id b55db308-c50b-465b-a91e-789676b4b450
```

### What you think should happen instead?

_No response_

### How to reproduce

No idea. It happens randomly. I tried increasing the number of workers to see if some limit was being hit, but I still get the error.

### Operating System

Linux

### Versions of Apache Airflow Providers

the Docker version

### Deployment

Other Docker-based deployment

### Deployment details

A master node on an EC2 instance with many workers, each on its own EC2 instance. Airflow on the master runs in a container; the workers also run Airflow in containers.

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
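For context on where the exception in the tracebacks comes from: `_execute_in_fork` in `celery_executor_utils.py` forks a child process to run the task command and raises `AirflowException` when the child exits non-zero. Below is a minimal, simplified sketch of that fork-and-wait pattern — not the actual Airflow source (the real function also sets up Airflow settings, signal handlers, and logging in the child), and `run_in_fork` is an illustrative name:

```python
import os
import sys


def run_in_fork(argv):
    """Fork, run argv in the child, and raise if the child exits non-zero.

    Simplified illustration of the pattern used by
    celery_executor_utils._execute_in_fork (not the real implementation).
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the task command.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if execvp itself failed
    # Parent: wait for the child and check its exit status.
    _, status = os.waitpid(pid, 0)
    ret = os.waitstatus_to_exitcode(status)
    if ret != 0:
        # In Airflow this is where AirflowException("Celery command
        # failed on host: ...") is raised.
        raise RuntimeError(f"Command failed with exit code {ret}")
    return ret


if __name__ == "__main__":
    run_in_fork([sys.executable, "-c", "print('task ran')"])
```

The parent only sees the child's exit code, which is why the traceback above says only that the command "failed on host" without the underlying reason; the actual cause has to be found in the task's own log or in whatever produced the "Process timed out" message in the worker log.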