8silvergun opened a new pull request, #62336:
URL: https://github.com/apache/airflow/pull/62336

   ## Summary
   
   Fix unhandled `MySQLdb.OperationalError: (2006, 'Server has gone away')` in 
`cleanup_session_middleware` when `Session.remove()` encounters a dead database 
connection.
   
   ## Problem
   
   PR #61480 introduced `cleanup_session_middleware` in `providers-fab 3.3.0` to fix `PendingRollbackError` (#59349). The middleware calls `Session.remove()` in a `finally` block with no error handling of its own:
   
   ```python
   finally:
       from airflow import settings
       if settings.Session:
           settings.Session.remove()  # can raise if DB connection is dead
   ```
   
   When the underlying database connection has already been closed server-side (MySQL timeout, Aurora failover, network interruption), `Session.remove()` internally attempts a `ROLLBACK` on the dead connection and raises `OperationalError`. Nothing catches this exception, so it escapes the middleware and the client receives a **500 Internal Server Error**, even though the original request completed successfully.
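The failure mode does not need a database to reproduce: an exception raised inside `finally` replaces whatever the `try` block was about to return. The sketch below assumes a hypothetical `DeadConnectionSession` stub and uses `ConnectionError` as a stand-in for `MySQLdb.OperationalError`; it only illustrates the control flow, not the real middleware.

```python
class DeadConnectionSession:
    """Hypothetical stand-in for settings.Session whose remove() fails the way
    a dead MySQL connection does. No real database or MySQLdb is involved."""

    def remove(self):
        # Simulated error; in production this is MySQLdb.OperationalError (2006).
        raise ConnectionError("(2006, 'Server has gone away')")


def handle_request(session):
    try:
        return "200 OK"  # the request handler itself succeeded
    finally:
        session.remove()  # unguarded cleanup, as in the original middleware


try:
    handle_request(DeadConnectionSession())
except ConnectionError as exc:
    # The successful "200 OK" is discarded; only the cleanup error surfaces.
    print(f"request failed during cleanup: {exc}")
```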
   
   **Production error log:**
   
   ```
   [error] Exception in ASGI application [airflow.providers.fab.auth_manager.fab_auth_manager] loc=fab_auth_manager.py:243
     File ".../fab_auth_manager.py", line 243, in cleanup_session_middleware
       settings.Session.remove()
     ...
   MySQLdb.OperationalError: (2006, 'Server has gone away')
   ```
   
   ## Solution
   
   Wrap `Session.remove()` in a try-except that catches and logs the error as a 
warning:
   
   ```python
   finally:
       from airflow import settings
       if settings.Session:
           try:
               settings.Session.remove()
           except Exception:
               log.warning("Failed to remove session during cleanup", exc_info=True)
   ```
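A self-contained sketch of the guarded cleanup running end to end follows. It assumes a Starlette/FastAPI-style `(request, call_next)` middleware shape and passes the session in explicitly instead of importing `airflow.settings`; `FailingSession` and `fake_call_next` are hypothetical test doubles, not Airflow APIs.

```python
import asyncio
import logging

log = logging.getLogger(__name__)


class FailingSession:
    """Hypothetical stand-in for airflow.settings.Session; remove() always fails."""

    def __init__(self):
        self.remove_calls = 0

    def remove(self):
        self.remove_calls += 1
        raise RuntimeError("(2006, 'Server has gone away')")


async def cleanup_session_middleware(request, call_next, session):
    # Assumed shape: an "http" middleware with the session injected for testability.
    try:
        return await call_next(request)
    finally:
        if session:
            try:
                session.remove()
            except Exception:
                # Cleanup failures are logged, never turned into a 500.
                log.warning("Failed to remove session during cleanup", exc_info=True)


async def fake_call_next(request):
    # Hypothetical downstream handler that succeeds.
    return {"status": 200, "path": request}


session = FailingSession()
response = asyncio.run(cleanup_session_middleware("/api/v2/dags", fake_call_next, session))
print(response)  # {'status': 200, 'path': '/api/v2/dags'}
```

The response from the downstream handler survives because the exception is caught inside the `finally` block, so the pending `return` value is never replaced.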
   
   This is consistent with session cleanup patterns elsewhere in Airflow (e.g., 
`airflow/utils/session.py`).
   
   ### Why this is safe
   
   - `Session.remove()` is a cleanup operation — if it fails because the 
connection is already dead, the session will be discarded anyway on the next 
request
   - The warning log preserves visibility for debugging
   - The `except Exception` is intentionally broad since any error during 
cleanup should not affect the HTTP response
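Catching `Exception` rather than `BaseException` still leaves some things alone: since Python 3.8, `asyncio.CancelledError` (like `KeyboardInterrupt` and `SystemExit`) derives from `BaseException`, so task cancellation and shutdown continue to propagate through the handler. A small sketch with a hypothetical `safe_remove` helper:

```python
import asyncio


def safe_remove(remove):
    """Hypothetical helper mirroring the PR's try/except around Session.remove()."""
    try:
        remove()
        return "removed"
    except Exception:
        return "swallowed"  # the real middleware logs a warning here


def raise_server_gone():
    raise RuntimeError("(2006, 'Server has gone away')")


def raise_cancelled():
    raise asyncio.CancelledError()


print(safe_remove(raise_server_gone))  # swallowed

try:
    safe_remove(raise_cancelled)
except asyncio.CancelledError:
    # BaseException subclasses are not caught by "except Exception".
    print("CancelledError still propagates")
```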
   
   ## Testing
   
   Added `TestFabAuthManagerSessionCleanupErrorHandling` with 2 tests:
   - `test_session_remove_db_error_does_not_propagate`: Verifies 
`OperationalError` (MySQL 'Server has gone away') is caught
   - `test_session_remove_generic_error_does_not_propagate`: Verifies other 
exceptions (e.g., `RuntimeError`) are also caught
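The real tests exercise `FabAuthManager`'s middleware. The sketch below only illustrates the same two cases with `unittest.mock`, against a hypothetical `remove_session_safely` helper; `OperationalError` is a local stand-in class so the example runs without `mysqlclient` installed.

```python
import logging
from unittest import mock

log = logging.getLogger("cleanup_sketch")


class OperationalError(Exception):
    """Local stand-in for MySQLdb.OperationalError (avoids needing mysqlclient)."""


def remove_session_safely(session):
    # Same guarded call as the fix, extracted into a hypothetical helper.
    if session:
        try:
            session.remove()
        except Exception:
            log.warning("Failed to remove session during cleanup", exc_info=True)


def test_session_remove_db_error_does_not_propagate():
    session = mock.MagicMock()
    session.remove.side_effect = OperationalError(2006, "Server has gone away")
    with mock.patch.object(log, "warning") as warning:
        remove_session_safely(session)  # must not raise
    session.remove.assert_called_once_with()
    warning.assert_called_once()


def test_session_remove_generic_error_does_not_propagate():
    session = mock.MagicMock()
    session.remove.side_effect = RuntimeError("unexpected")
    with mock.patch.object(log, "warning"):
        remove_session_safely(session)  # must not raise
    session.remove.assert_called_once_with()


test_session_remove_db_error_does_not_propagate()
test_session_remove_generic_error_does_not_propagate()
print("both tests passed")
```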
   
   ## Related
   
   - #59349 — Original `PendingRollbackError` issue
   - #61480 — PR that introduced `cleanup_session_middleware` (this PR fixes a 
gap in that implementation)
   - #57470, #57859 — Earlier reports of the session lifecycle problem
   
   ## AI Disclosure
   
   This PR was developed with AI assistance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
