jarno-r opened a new issue, #65237: URL: https://github.com/apache/airflow/issues/65237
### Under which category would you file this issue?

Airflow Core

### Apache Airflow version

3.2.0

### What happened and how to reproduce it?

I installed Airflow into a Python 3.12 venv with the following command:

````
uv pip install apache-airflow==3.2.0 apache-airflow-providers-microsoft-azure
````

Then I copied *bug-dag-debug.py* (below) into the *~/airflow/dags* folder, launched `airflow standalone`, and triggered the DAG. Within 30 minutes the web UI becomes unresponsive and the `http://localhost:8080/api/v2/monitor/health` endpoint stops responding. The output shows this:

````
...
pi-server  | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
api-server | +-+---------------- 1 ----------------
api-server | | Traceback (most recent call last):
api-server | |   File "/home/azureuser/venv2/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 420, in run_asgi
api-server | |     result = await app(  # type: ignore[func-returns-value]
...
api-server | |     rec = pool._do_get()
api-server | |           ^^^^^^^^^^^^^^
api-server | |   File "/home/azureuser/venv2/lib/python3.12/site-packages/sqlalchemy/pool/impl.py", line 166, in _do_get
api-server | |     raise exc.TimeoutError(
api-server | | sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
````

Setting the DAG `schedule` to something other than `None` seems to be part of the cause, since the issue does not appear to happen without that.
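As a temporary mitigation while investigating (my assumption, not a fix for whatever is holding the connections), raising the api-server's SQLAlchemy pool limits should at least delay the point at which the pool is exhausted:

```shell
# Assumption: these env vars map to [database] sql_alchemy_pool_size /
# sql_alchemy_max_overflow in airflow.cfg. Raising them only postpones the
# timeout if connections are being leaked rather than returned to the pool.
export AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_SIZE=20
export AIRFLOW__DATABASE__SQL_ALCHEMY_MAX_OVERFLOW=40
```

Note this would not apply to the default SQLite metadata database used by `airflow standalone` unless a pooled backend such as Postgres is configured.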
I haven't been able to reproduce the issue with a DAG simpler than this *bug-dag-debug.py*:

````python
import logging
import time
from datetime import datetime, timezone

from airflow.sdk import dag, task, task_group


@task(task_id="initialize")
def initialize_task():
    logging.info("Initialize")
    nodes = []
    for i in range(5):
        logging.info("Sleepy")
        time.sleep(10)
    return (set(), set(), nodes, set())


@task()
def do_stuff(state, dummy):
    for i in range(10):
        logging.info("Sleepy")
        time.sleep(10)
    return state


@task_group
def incremental(state, since: datetime):
    return do_stuff(state, [])


@dag(
    start_date=datetime(2026, 1, 1, 21, tzinfo=timezone.utc),
    schedule="0 21 * * *",
    catchup=False,
    default_args={},
)
def bug_dag_debug():
    @task()
    def all_done():
        logging.info("All complete!")

    @task
    def is_ready(state):
        logging.info("Get back to work!")
        return [state]

    parse_task = initialize_task()
    done_task = all_done()

    t = parse_task
    for _ in range(5):
        t = incremental.partial(since=datetime.now()).expand(state=is_ready(t))
    t >> done_task


hello_dag = bug_dag_debug()
````

### What you think should happen instead?

This runs fine in Airflow 3.1.8. It should work in Airflow 3.2.0 as well.

### Operating System

Ubuntu 24.04.4 LTS

### Deployment

None

### Apache Airflow Provider(s)

_No response_

### Versions of Apache Airflow Providers

_No response_

### Official Helm Chart version

Not Applicable

### Kubernetes Version

_No response_

### Helm Chart configuration

_No response_

### Docker Image customizations

_No response_

### Anything else?

It's not clear to me whether only the api-server fails or whether the scheduler also stops running. The simplified DAG above might run to the end, but the original one seemed to stop running early on.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!
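To help narrow down whether only the api-server dies, a small standalone poll script (a sketch of my own, not part of Airflow; the URL is the health endpoint mentioned above) can log the endpoint's status over time while the scheduler log is watched separately:

```python
import logging
import time
import urllib.request
import urllib.error


def check_health(url: str, timeout: float = 5.0):
    """Return the HTTP status code, or None if the endpoint does not respond."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError):
        return None


def poll(url: str, interval: float = 30.0):
    """Log the health endpoint status once per interval until interrupted."""
    while True:
        status = check_health(url)
        logging.info("health %s -> %s", url, status if status is not None else "NO RESPONSE")
        time.sleep(interval)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    poll("http://localhost:8080/api/v2/monitor/health")
```

The timestamp of the first `NO RESPONSE` line can then be compared against the scheduler log to see whether the scheduler keeps making progress after the api-server stops answering.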
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
