wjddn279 opened a new issue, #56879: URL: https://github.com/apache/airflow/issues/56879
### Apache Airflow version

3.1.0

### If "Other Airflow 2/3 version" selected, which one?

_No response_

### What happened?

I deployed Airflow in a Kubernetes environment and observed that the dag-processor was restarting irregularly. The error logs showed the following (full log attached: [full_log.txt](https://github.com/user-attachments/files/22513572/full_log.txt)):

```
File "/home/airflow/.local/lib/python3.11/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (2013, 'Lost connection to server during query')
[SQL: SELECT dag_priority_parsing_request.id, dag_priority_parsing_request.bundle_name, dag_priority_parsing_request.relative_fileloc FROM dag_priority_parsing_request WHERE dag_priority_parsing_request.bundle_name IN (%s)]
[parameters: ('dags-folder',)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[2025-09-24T17:01:10.882+0900] {settings.py:494} DEBUG - Disposing DB connection pool (PID 7)
```

### What you think should happen instead?

The `_get_priority_files()` function executes a query, and the MySQL connection is closed unexpectedly while that query is running. The resulting exception causes the dag-processor to exit.

I also found other exceptions. They do not terminate the process, because retry logic and try/except blocks handle them, but they appear to share the same underlying cause: MySQL connections being closed unexpectedly.

By inspecting the queries that reached MySQL around the times of the errors, I confirmed that the connections were indeed being closed with a `Quit` command.
<img width="2124" height="276" alt="Image" src="https://github.com/user-attachments/assets/21d42381-0ca0-4659-9996-ead07be49df4" />

### How to reproduce

- Deploy Airflow 3+
- Use MySQL as the backend

### Operating System

k8s

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

- Kubernetes deployment
- MySQL 8.0+
- Official Helm chart

### Anything else?

### Why is the Quit signal being sent?

In short, the root cause is that the subprocess created by `fork` recreates the engine's existing connection pool, as done in [airflow/settings.py#L426-L436](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/settings.py#L426-L436).

1. When the subprocess starts, it recreates the engine's connection pool.
2. The pooled connection objects copied from the parent process are no longer referenced.
3. They are garbage-collected in the child, which closes the connections originally established by the parent: the child inherited the parent's socket file descriptors, so closing a connection from the child ends the same server-side session the parent is still using.
4. The parent process, unaware that its pooled connections have been closed, runs a query and hits an error.

To verify this, I added the following code to observe when the connections initially established by the parent process are garbage-collected:

```python
import os
import weakref
from datetime import datetime
from sqlalchemy import event
# Assumes `engine` and `log` exist in the surrounding module; each finalizer prints the PID of the process that garbage-collects the object.
@event.listens_for(engine, "connect")
def set_mysql_timezone(dbapi_connection, connection_record):
    log.debug(f"[connect] New DB connection established, id={os.getpid()}")
    weakref.finalize(dbapi_connection, lambda: print(f"{datetime.now().isoformat()} dbapi_connection finalized via weakref in os {os.getpid()}"))
    weakref.finalize(connection_record, lambda: print(f"{datetime.now().isoformat()} connection_record finalized via weakref in os {os.getpid()}"))
```

The following logs were observed. Their timestamps exactly matched the times at which the `Quit` command appeared in the MySQL query log.
The fact that they were printed in PID 417 indicates that the pool copied from the parent process was garbage-collected in the child process:

```
2025-09-22T13:41:30.352393 connection_record finalized via weakref in os 417
2025-09-22T13:41:30.352403 dbapi_connection finalized via weakref in os 417
```

### Why does this issue occur only with MySQL and not with PostgreSQL?

Based on testing, under the current code both backends garbage-collect the copied connections in the subprocess. With MySQL, however, the driver explicitly sends a `COM_QUIT` command when a connection object is garbage-collected, as shown in [mysqlclient/_mysql.c#L2233-L2243](https://github.com/PyMySQL/mysqlclient/blob/main/src/MySQLdb/_mysql.c#L2233-L2243).

With PostgreSQL, in contrast, the driver does not appear to terminate the session itself; it seems to close only the socket without sending a termination command. (This is the behavior we want.)

Additionally, if the garbage collection threshold is not reached (for example, in small-scale DAG parsing scenarios), garbage collection does not run, and in that case the issue does not occur even with MySQL.

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
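The fork-and-garbage-collect sequence described under "Why is the Quit signal being sent?" can be reproduced in isolation. The following is a minimal, self-contained sketch, not Airflow, SQLAlchemy, or mysqlclient code: `FakeDBAPIConnection`, `FakePool`, and `demo` are hypothetical names, a `socket.socketpair()` stands in for the MySQL server, and the text message only loosely mimics `COM_QUIT`. The fake pool is deliberately a reference cycle, an assumption chosen so that, matching the threshold observation above, the inherited connection is finalized only when the cyclic garbage collector actually runs. It requires a POSIX system because it uses `os.fork`.

```python
import gc
import os
import socket
import weakref


class FakeDBAPIConnection:
    """Hypothetical stand-in for a MySQLdb connection (not the real driver).

    When garbage-collected, it sends a quit message over its socket, loosely
    mimicking mysqlclient's COM_QUIT, in whichever process frees it.
    """

    def __init__(self, sock):
        self.sock = sock
        # The callback holds `sock`, not `self`, so it cannot keep the object alive.
        weakref.finalize(self, FakeDBAPIConnection._send_quit, sock)

    @staticmethod
    def _send_quit(sock):
        try:
            sock.sendall(f"QUIT from pid {os.getpid()}\n".encode())
        except OSError:
            pass  # socket already closed


class FakePool:
    """Hypothetical pool whose records point back at it, forming a reference
    cycle, so only the cyclic garbage collector can free it."""

    def __init__(self, conns):
        self.records = [{"pool": self, "conn": c} for c in conns]


def demo():
    # `server` plays the database server; `client` is the connection socket
    # owned by the parent's pool.
    client, server = socket.socketpair()
    pool = FakePool([FakeDBAPIConnection(client)])

    gc_was_enabled = gc.isenabled()
    gc.disable()  # simulate "GC threshold not reached yet"
    pid = os.fork()
    if pid == 0:  # child process
        try:
            # Recreate the pool: the inherited copy loses its last outside
            # reference, but the cycle keeps it alive, so nothing is sent yet.
            pool = FakePool([])
            client.sendall(b"DROPPED\n")
            # Once the collector runs, the inherited connection is finalized
            # in the child and "quits" the session the parent still uses.
            gc.collect()
        finally:
            os._exit(0)  # skip atexit handlers inherited from the parent

    if gc_was_enabled:
        gc.enable()
    os.waitpid(pid, 0)
    server.settimeout(5)
    received = b""
    while received.count(b"\n") < 2:
        received += server.recv(1024)
    client.close()
    server.close()
    return pid, received.decode().splitlines()
```

Calling `demo()` should return the child's PID together with the lines `DROPPED` and `QUIT from pid <child PID>`, in that order: merely dereferencing the inherited pool sends nothing, and the quit message arrives from the child only once `gc.collect()` runs, while the parent's own pool object is still alive.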
