shorrocka opened a new issue, #51448: URL: https://github.com/apache/airflow/issues/51448
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.10.5

### What happened?

Periodically, about every other day, our Airflow scheduler will crash after a DAG run with the following error:

```
Process ForkProcess-35:
Traceback (most recent call last):
  File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data/apache-airflow/lib64/python3.12/site-packages/airflow/dag_processing/manager.py", line 247, in _run_processor_manager
    processor_manager.start()
  File "/data/apache-airflow/lib64/python3.12/site-packages/airflow/dag_processing/manager.py", line 489, in start
    return self._run_parsing_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/apache-airflow/lib64/python3.12/site-packages/airflow/dag_processing/manager.py", line 616, in _run_parsing_loop
    self._collect_results_from_processor(processor)
  File "/data/apache-airflow/lib64/python3.12/site-packages/airflow/dag_processing/manager.py", line 1143, in _collect_results_from_processor
    if processor.result is not None:
       ^^^^^^^^^^^^^^^^
  File "/data/apache-airflow/lib64/python3.12/site-packages/airflow/dag_processing/processor.py", line 379, in result
    raise AirflowException("Tried to get the result before it's done!")
airflow.exceptions.AirflowException: Tried to get the result before it's done!
```

This happens directly after seemingly normal scheduler events, right as a DAG finishes. Here are the preceding log events:

```
[2025-06-05T10:01:04.202-0400] {scheduler_job_runner.py:813} INFO - TaskInstance Finished: dag_id=restore_study_schema, task_id=create_schema, run_id=manual__2025-06-05T14:00:39.494709+00:00, map_index=-1, run_start_date=2025-06-05 14:01:03.157949+00:00, run_end_date=2025-06-05 14:01:03.614029+00:00, run_duration=0.45608, state=up_for_retry, executor=LocalExecutor(parallelism=32), executor_state=success, try_number=3, max_tries=5, job_id=924122, pool=default_pool, queue=default, priority_weight=4, operator=MySQLExecuteQueryOperator, queued_dttm=2025-06-05 14:01:02.534776+00:00, queued_by_job_id=922565, pid=558903
[2025-06-05T10:01:04.202-0400] {scheduler_job_runner.py:813} INFO - TaskInstance Finished: dag_id=restore_study_schema, task_id=create_schema, run_id=manual__2025-06-05T14:00:39.495018+00:00, map_index=-1, run_start_date=2025-06-05 14:01:03.157949+00:00, run_end_date=2025-06-05 14:01:03.613053+00:00, run_duration=0.455104, state=up_for_retry, executor=LocalExecutor(parallelism=32), executor_state=success, try_number=3, max_tries=5, job_id=924124, pool=default_pool, queue=default, priority_weight=4, operator=MySQLExecuteQueryOperator, queued_dttm=2025-06-05 14:01:02.534776+00:00, queued_by_job_id=922565, pid=558902
```

Then we get a whole bunch of the following:

```
[2025-06-05T10:01:16.606-0400] {scheduler_job_runner.py:922} ERROR - Executor LocalExecutor(parallelism=32) reported that the task instance <TaskInstance: restore_study_schema.create_schema manual__2025-06-05T14:00:39.493053+00:00 [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally
[2025-06-05T10:01:16.621-0400] {taskinstance.py:3315} ERROR - Executor LocalExecutor(parallelism=32) reported that the task instance <TaskInstance: restore_study_schema.create_schema manual__2025-06-05T14:00:39.493053+00:00 [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally
```

This is then followed by repeating instances of the DagFileProcessorManager exiting and restarting in a loop:

```
[2025-06-05T10:02:06.106-0400] {manager.py:280} WARNING - DagFileProcessorManager (PID=560642) exited with exit code -11 - re-launching
[2025-06-05T10:02:06.110-0400] {manager.py:174} INFO - Launched DagFileProcessorManager with pid: 560675
[2025-06-05T10:02:06.118-0400] {settings.py:63} INFO - Configured default timezone UTC
[2025-06-05T10:02:09.262-0400] {manager.py:280} WARNING - DagFileProcessorManager (PID=560675) exited with exit code -11 - re-launching
[2025-06-05T10:02:09.266-0400] {manager.py:174} INFO - Launched DagFileProcessorManager with pid: 560715
```

...before we finally get a `Segmentation fault (core dumped)`. I've tried to inspect the core file, but it's always cut off and doesn't give me any kind of actually useful information.
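For what it's worth, my working read of the traceback is that one of the forked DAG-parsing processes is dying with SIGSEGV before it can hand a parsing result back to the DagFileProcessorManager, which would explain both the repeated `exited with exit code -11` warnings and the "Tried to get the result before it's done!" guard firing. Below is a rough, self-contained sketch of that failure mode only; it is not Airflow code, and the names in it (`parse_dag_file`, the pipe handling) are made up for illustration:

```python
# Minimal sketch (not Airflow source) of the failure mode I believe the logs
# show: a forked parsing child segfaults before sending its result back over
# the pipe, so the parent sees exit code -11 and never receives a result.
import ctypes
import multiprocessing as mp


def parse_dag_file(conn):
    ctypes.string_at(0)           # simulate the segfault (SIGSEGV -> exit code -11)
    conn.send("parsing result")   # never reached
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    child = mp.Process(target=parse_dag_file, args=(child_conn,))
    child.start()
    child_conn.close()

    child.join()
    print("exit code:", child.exitcode)   # -11, matching the manager WARNING lines

    # The parent's "collect results" step finds the pipe at EOF instead of a
    # result, which looks like the situation the AirflowException in the
    # traceback is guarding against.
    try:
        parent_conn.recv()
    except EOFError:
        print("child died before producing a result")
```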
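Since the core files are truncated, the next thing I plan to try is Python's standard-library `faulthandler`, so a SIGSEGV in the scheduler or the DAG file processors at least dumps a Python-level traceback. Something like the snippet below, loaded early in the scheduler's interpreter (the log path is just an example); setting `PYTHONFAULTHANDLER=1` in the scheduler's environment should be the zero-code equivalent, writing to stderr instead:

```python
# Sketch of what I intend to try next: enable the standard-library faulthandler
# so that a SIGSEGV produces a Python traceback even when the core file is
# truncated. The log path below is only an example.
import faulthandler

# Keep a reference to the file object; faulthandler writes to its file descriptor.
_crash_log = open("/data/apache-airflow/logs/faulthandler.log", "a")
faulthandler.enable(file=_crash_log, all_threads=True)
```

Because the DAG file processors are forked from the manager, my understanding is they should inherit the handler, but I haven't verified that yet.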
### What you think should happen instead?

_No response_

### How to reproduce

We install Airflow from PyPI using Python 3.12 in a virtual environment. The scheduler and webserver have both been run in a tmux session or via nohup; the crash occurs either way.

### Operating System

Red Hat Enterprise Linux 9.5 (Plow)

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.8.0
apache-airflow-providers-apache-spark==5.0.0
apache-airflow-providers-celery==3.10.0
apache-airflow-providers-common-compat==1.7.0
apache-airflow-providers-common-io==1.5.0
apache-airflow-providers-common-sql==1.27.1
apache-airflow-providers-docker==4.2.0
apache-airflow-providers-elasticsearch==6.2.0
apache-airflow-providers-fab==1.5.3
apache-airflow-providers-ftp==3.12.2
apache-airflow-providers-http==5.2.0
apache-airflow-providers-imap==3.8.2
apache-airflow-providers-postgres==6.1.0
apache-airflow-providers-sftp==5.1.0
apache-airflow-providers-smtp==2.0.0
apache-airflow-providers-snowflake==6.1.0
apache-airflow-providers-sqlite==4.0.0
apache-airflow-providers-ssh==4.0.0
apache-airflow-providers-trino==6.0.1

### Deployment

Virtualenv installation

### Deployment details

Python 3.12.5

### Anything else?

This occurs almost every other day, and we are only running a few DAGs. I have tried changing a whole manner of configuration settings: increasing the timeout for zombie jobs, increasing the timeout for DAG parsing, and increasing the number of Postgres connections. I am really not sure what the underlying cause is, and I couldn't find another report of this issue. Any help or guidance would be hugely appreciated.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)