gudata opened a new issue, #46930:
URL: https://github.com/apache/airflow/issues/46930

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   v2.9.1
   
   ### What happened?
   
   I have 20 EC2 worker instances. On each EC2 instance I run the worker and
the triggerer containers in Docker.
   
   I am starting 100 DAGs at the same time - at 5:00 UTC.
   
   Very often I see some tasks get scheduled and then killed/removed/never
started.
   
   Something is failing but I can't see why. In the UI there is an attempt to
run _execute_in_fork which fails/times out.
   
   I checked on the workers - there is enough memory, CPU, disk space, and file
descriptors; there is no limit on the number of processes, and the maximum
number of forks is not reached.
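For reference, the same checks can be repeated from inside the worker container itself with the standard library (a quick sketch; container limits can differ from the host's):

```python
import resource

# Re-check the per-process limits mentioned above, from inside the
# container. Each call returns a (soft, hard) pair;
# resource.RLIM_INFINITY means "unlimited".
for name, limit in [
    ("open files (RLIMIT_NOFILE)", resource.RLIMIT_NOFILE),
    ("processes/threads (RLIMIT_NPROC)", resource.RLIMIT_NPROC),
    ("address space (RLIMIT_AS)", resource.RLIMIT_AS),
]:
    soft, hard = resource.getrlimit(limit)
    print(f"{name}: soft={soft} hard={hard}")
```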
   
   Any ideas what the cause might be?
   
   On the server/scheduler:
   ```
   INFO - Setting external_id for <TaskInstance: 
CLIENTNAME_dag.confirm_ready_for_transformations 
scheduled__2025-02-20T05:00:00+00:00 [queued]> to 
b55db308-c50b-465b-a91e-789676b4b450
   2025-02-20T05:02:40.882885216Z [2025-02-20T05:02:40.882+0000] 
{task_context_logger.py:91} ERROR - Executor reports task instance 
<TaskInstance: CLIENTNAME_dag.confirm_ready_for_transformations 
scheduled__2025-02-20T05:00:00+00:00 [queued]> finished (failed) although the 
task says it's queued. (Info: None) Was the task killed externally?
   
   ```
   On the worker, in the logs of the worker process (in a Docker container):
   ```
   2025-02-20T05:02:36.620231955Z [2025-02-20 05:02:36,618: 
ERROR/ForkPoolWorker-16] Process timed out, PID: 1576563
   2025-02-20T05:02:36.688952182Z [2025-02-20 05:02:36,687: 
ERROR/ForkPoolWorker-16] Task 
airflow.providers.celery.executors.celery_executor_utils.execute_command[b55db308-c50b-465b-a91e-789676b4b450]
 raised unexpected: AirflowException('Celery command failed on host: 
54c1553f81f2 with celery_task_id b55db308-c50b-465b-a91e-789676b4b450')
   2025-02-20T05:02:36.688999873Z Traceback (most recent call last):
   2025-02-20T05:02:36.689007554Z   File 
"/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 
453, in trace_task
   2025-02-20T05:02:36.689014304Z     R = retval = fun(*args, **kwargs)
   2025-02-20T05:02:36.689020164Z                  ^^^^^^^^^^^^^^^^^^^^
   2025-02-20T05:02:36.689025754Z   File 
"/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 
736, in __protected_call__
   2025-02-20T05:02:36.689031574Z     return self.run(*args, **kwargs)
   2025-02-20T05:02:36.689036854Z            ^^^^^^^^^^^^^^^^^^^^^^^^^
   2025-02-20T05:02:36.689042145Z   File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py",
 line 136, in execute_command
   2025-02-20T05:02:36.689048015Z     _execute_in_fork(command_to_exec, 
celery_task_id)
   2025-02-20T05:02:36.689053345Z   File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py",
 line 151, in _execute_in_fork
   2025-02-20T05:02:36.689059185Z     raise AirflowException(msg)
   2025-02-20T05:02:36.689064465Z airflow.exceptions.AirflowException: Celery 
command failed on host: 54c1553f81f2 with celery_task_id 
b55db308-c50b-465b-a91e-789676b4b450
   2025-02-20T05:02:43.076153425Z [2025-02-20 05:02:43,075: 
ERROR/ForkPoolWorker-15] Process timed out, PID: 1576565
   ```
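The "Process timed out, PID: ..." line looks like the output of a SIGALRM-based timeout handler. A minimal, self-contained sketch of that mechanism (hypothetical names, not Airflow's actual implementation) shows how a process gets interrupted this way:

```python
import os
import signal
import time

class ProcessTimeout(Exception):
    """Raised when the alarm fires before the wrapped block finishes."""

class timeout_after:
    """SIGALRM-based timeout context manager (sketch, not Airflow's code)."""

    def __init__(self, seconds: int):
        self.seconds = seconds

    def _handle(self, signum, frame):
        # A handler like this would log "Process timed out, PID: <pid>"
        # before aborting the work in progress.
        raise ProcessTimeout(f"Process timed out, PID: {os.getpid()}")

    def __enter__(self):
        signal.signal(signal.SIGALRM, self._handle)
        signal.alarm(self.seconds)

    def __exit__(self, exc_type, exc, tb):
        signal.alarm(0)  # cancel any pending alarm

# Anything inside the block that runs longer than the limit is
# interrupted, regardless of what it was doing at the time.
try:
    with timeout_after(1):
        time.sleep(3)
except ProcessTimeout as exc:
    print(exc)
```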
   
   And in Celery, in the UI:
   ```
   AirflowException('Celery command failed on host: 54c1553f81f2 with 
celery_task_id b55db308-c50b-465b-a91e-789676b4b450')
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 
453, in trace_task
       R = retval = fun(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/celery/app/trace.py", line 
736, in __protected_call__
       return self.run(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py",
 line 136, in execute_command
       _execute_in_fork(command_to_exec, celery_task_id)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py",
 line 151, in _execute_in_fork
       raise AirflowException(msg)
   airflow.exceptions.AirflowException: Celery command failed on host: 
54c1553f81f2 with celery_task_id b55db308-c50b-465b-a91e-789676b4b450
   ```
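The traceback points at _execute_in_fork, which (judging by the name and the path in the traceback) forks a child process to run the task. A rough sketch of that fork-and-wait pattern, with made-up names rather than the real implementation, illustrates why the resulting exception is so generic - the parent only ever sees the child's exit status:

```python
import os

def execute_in_fork_sketch() -> None:
    """Fork a child, wait for it, and raise a generic error on failure.

    Hypothetical sketch, not Airflow's real _execute_in_fork: whatever
    goes wrong inside the child (e.g. a timeout while the task starts
    up) surfaces to the parent only as a nonzero exit code.
    """
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for running the task command; pretend it failed.
        os._exit(1)
    _, status = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(status) != 0:
        # All the parent can report is "something in the child failed".
        raise RuntimeError(
            f"Celery command failed on host: {os.uname().nodename}"
        )
```

Which suggests the real error details are only in the worker/task logs of the forked child, not in the Celery result shown above.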
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   No idea. It happens randomly. I tried increasing the number of workers to
see if some limit was being hit, but I still get the error.
   
   ### Operating System
   
   linux
   
   ### Versions of Apache Airflow Providers
   
   the docker version
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   A master node on an EC2 instance, with many workers each on their own EC2
instance.
   Airflow on the master runs in a container.
   The workers also run Airflow in containers.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
