jarno-r opened a new issue, #65237:
URL: https://github.com/apache/airflow/issues/65237

   ### Under which category would you file this issue?
   
   Airflow Core
   
   ### Apache Airflow version
   
   3.2.0
   
   ### What happened and how to reproduce it?
   
   I installed Airflow into a Python 3.12 venv with the following command:
   ````
   uv pip install apache-airflow==3.2.0 apache-airflow-providers-microsoft-azure
   ````
   Then I copied *bug-dag-debug.py* (below) into the *~/airflow/dags* folder, launched `airflow standalone`, and triggered the dag.
   Within 30 minutes, the web UI becomes unresponsive and the `http://localhost:8080/api/v2/monitor/health` endpoint stops responding.
   The output shows this:
   ````
   ...
   api-server | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
   api-server | +-+---------------- 1 ----------------
   api-server | | Traceback (most recent call last):
   api-server | |   File "/home/azureuser/venv2/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 420, in run_asgi
   api-server | |     result = await app(  # type: ignore[func-returns-value]
   ...
   api-server | |     rec = pool._do_get()
   api-server | |           ^^^^^^^^^^^^^^
   api-server | |   File "/home/azureuser/venv2/lib/python3.12/site-packages/sqlalchemy/pool/impl.py", line 166, in _do_get
   api-server | |     raise exc.TimeoutError(
   api-server | | sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
   ````
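For context on the last log line: SQLAlchemy's `QueuePool` keeps up to `pool_size` connections and may open up to `max_overflow` extra ones, so "size 5 overflow 10" means 15 connections were already checked out and the next request waited the full 30-second timeout before failing. These numbers match Airflow's default `[database] sql_alchemy_pool_size` (5) and `sql_alchemy_max_overflow` (10), which presumably govern the api-server's pool here. The stdlib sketch below is a hypothetical toy model (`ToyQueuePool` is not a real class, and this is not SQLAlchemy's code) that only reproduces that limit arithmetic; it says nothing about why the connections were being held.

```python
import threading


class ToyQueuePool:
    """Hypothetical toy model of QueuePool's checkout limit (not SQLAlchemy code).

    At most pool_size + max_overflow connections can be checked out at once;
    any further checkout waits up to `timeout` seconds and then fails.
    """

    def __init__(self, pool_size=5, max_overflow=10, timeout=30.0):
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.timeout = timeout
        # One semaphore slot per connection that may be in use concurrently.
        self._slots = threading.BoundedSemaphore(pool_size + max_overflow)
        self._lock = threading.Lock()
        self.checked_out = 0

    def checkout(self):
        # Block until a slot frees up, but give up after `timeout` seconds.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                f"QueuePool limit of size {self.pool_size} overflow "
                f"{self.max_overflow} reached, connection timed out, "
                f"timeout {self.timeout:.2f}"
            )
        with self._lock:
            self.checked_out += 1
        return object()  # stand-in for a real DB connection

    def checkin(self, conn):
        # Returning a connection frees a slot for the next waiter.
        with self._lock:
            self.checked_out -= 1
        self._slots.release()


# Usage: 15 connections are held and never returned, so the 16th checkout
# times out (a short timeout keeps the demo fast).
pool = ToyQueuePool(timeout=0.1)
held = [pool.checkout() for _ in range(15)]
try:
    pool.checkout()
except TimeoutError as exc:
    print(exc)  # prints: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 0.10
```

Raising the pool limits in such a model only moves the point at which checkouts start timing out if connections are never returned.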
   Setting the dag `schedule` to something other than `None` appears to be part of the cause: with `schedule=None`, the issue does not seem to occur.
   
   I haven't been able to reproduce the issue with a simpler DAG than this *bug-dag-debug.py*:
   ````python
   import logging
   import time
   from datetime import datetime, timedelta, timezone
   
   from airflow.sdk import dag, task, task_group, Variable
   
   
   @task(task_id="initialize")
   def initialize_task():
       logging.info("Initialize")
       nodes = []
       for i in range(5):
           logging.info("Sleepy")
           time.sleep(10)
       return (set(), set(), nodes, set())
   
   
   @task()
   def do_stuff(state, dummy):
       for i in range(10):
           logging.info("Sleepy")
           time.sleep(10)
       return state
   
   
   @task_group
   def incremental(state, since: datetime):
       return do_stuff(state, [])
   
   
   @dag(
       start_date=datetime(2026, 1, 1, 21, tzinfo=timezone.utc),
       schedule="0 21 * * *",
       catchup=False,
       default_args={},
   )
   def bug_dag_debug():
       @task()
       def all_done():
           logging.info("All complete!")
   
       @task
       def is_ready(state):
           logging.info("Get back to work!")
           return [state]
   
       parse_task = initialize_task()
       done_task = all_done()
   
       t = parse_task
       for _ in range(5):
           t = incremental.partial(since=datetime.now()).expand(state=is_ready(t))
   
       t >> done_task
   
   
   hello_dag = bug_dag_debug()
   ````
   
   ### What you think should happen instead?
   
   In Airflow 3.1.8 this runs fine. It should work in Airflow 3.2.0 as well.
   
   ### Operating System
   
   Ubuntu 24.04.4 LTS
   
   ### Deployment
   
   None
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   _No response_
   
   ### Anything else?
   
   It's not clear to me whether only the api-server fails or whether the scheduler also stops running. The simplified dag above might run to the end, but the original one seemed to stop running early on.
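One way to narrow this down is to poll the health endpoint mentioned above while the dag runs and record whether the api-server answers at all and, if it does, what it reports for each component. The stdlib sketch below is a minimal probe; the helper names `probe` and `summarize_health` are hypothetical, and the payload shape (one object per component such as `metadatabase` or `scheduler`, each with a `status` field) is an assumption about the endpoint's response rather than a documented contract.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8080/api/v2/monitor/health"


def summarize_health(payload):
    """Map each component in a health payload to its reported status.

    Assumed shape: {"metadatabase": {"status": "healthy"},
                    "scheduler": {"status": "healthy", ...}, ...}
    """
    return {
        name: info.get("status", "unknown")
        for name, info in payload.items()
        if isinstance(info, dict)
    }


def probe(url=HEALTH_URL, timeout=5.0):
    """Return component statuses, or None if the api-server gave no usable answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize_health(json.load(resp))
    except (OSError, ValueError):
        # OSError covers URLError, refused connections and socket timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None


if __name__ == "__main__":
    print(probe())
```

A `None` result only shows that the api-server did not answer; it says nothing about the scheduler, whose state would then have to be checked separately, for example from its own log output.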
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   