8silvergun opened a new pull request, #62336:
URL: https://github.com/apache/airflow/pull/62336
## Summary
Fix unhandled `MySQLdb.OperationalError: (2006, 'Server has gone away')` in
`cleanup_session_middleware` when `Session.remove()` encounters a dead database
connection.
## Problem
PR #61480 introduced `cleanup_session_middleware` in `providers-fab 3.3.0`
to fix `PendingRollbackError` (#59349). The middleware calls `Session.remove()`
in a bare `finally` block:
```python
finally:
    from airflow import settings

    if settings.Session:
        settings.Session.remove()  # can raise if DB connection is dead
```
When the underlying database connection has been closed server-side (MySQL
timeout, Aurora failover, network interruption), `Session.remove()` internally
attempts a `ROLLBACK` on the dead connection, raising `OperationalError`. This
unhandled exception propagates as a **500 Internal Server Error** — even though
the original request completed successfully.
**Production error log:**
```
[error] Exception in ASGI application
[airflow.providers.fab.auth_manager.fab_auth_manager]
loc=fab_auth_manager.py:243
File ".../fab_auth_manager.py", line 243, in cleanup_session_middleware
settings.Session.remove()
...
MySQLdb.OperationalError: (2006, 'Server has gone away')
```
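The mechanism can be reproduced without Airflow or a database. The following is a minimal sketch, not the provider's code: `FakeOperationalError`, `DeadSession`, and `handle_request` are hypothetical stand-ins invented for illustration. It shows that an exception raised inside a `finally` block replaces a return value that was already computed, which is how a successful request can end up as a 500.

```python
class FakeOperationalError(Exception):
    """Hypothetical stand-in for MySQLdb.OperationalError."""


class DeadSession:
    """Hypothetical session registry whose connection is already gone."""

    def remove(self):
        # Mimics the ROLLBACK attempt failing on a closed connection.
        raise FakeOperationalError(2006, "Server has gone away")


def handle_request(session):
    """Unguarded cleanup, shaped like the snippet quoted in this section."""
    try:
        return "200 OK"  # the request itself succeeds
    finally:
        session.remove()  # raising here discards the pending return value


try:
    outcome = handle_request(DeadSession())
except FakeOperationalError as exc:
    outcome = f"error: {exc.args}"

print(outcome)
```

The caller never sees `"200 OK"`; only the cleanup error surfaces.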
## Solution
Wrap `Session.remove()` in a try-except that catches and logs the error as a
warning:
```python
finally:
    from airflow import settings

    if settings.Session:
        try:
            settings.Session.remove()
        except Exception:
            log.warning("Failed to remove session during cleanup", exc_info=True)
```
This is consistent with session cleanup patterns elsewhere in Airflow (e.g.,
`airflow/utils/session.py`).
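To illustrate the effect of the guard in isolation, here is a self-contained sketch under stated assumptions: `DeadSession`, `handle_request_guarded`, and the logger name are hypothetical, and a plain `RuntimeError` stands in for the driver error. It is not the middleware itself, only the same try/except shape applied to a cleanup step.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("session_cleanup_sketch")


class DeadSession:
    """Hypothetical session registry whose cleanup always fails."""

    def remove(self):
        raise RuntimeError("connection already closed")


def handle_request_guarded(session):
    """Return the response even if session cleanup raises."""
    try:
        return "200 OK"
    finally:
        if session:
            try:
                session.remove()
            except Exception:
                # Log with traceback and swallow, so the response survives.
                log.warning("Failed to remove session during cleanup", exc_info=True)


result = handle_request_guarded(DeadSession())
print(result)
```

With the guard in place the caller receives the response, and the failure is visible only as a warning with a traceback.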
### Why this is safe
- `Session.remove()` is a cleanup operation — if it fails because the
connection is already dead, the session will be discarded anyway on the next
request
- The warning log preserves visibility for debugging
- The `except Exception` is intentionally broad since any error during
cleanup should not affect the HTTP response
## Testing
Added `TestFabAuthManagerSessionCleanupErrorHandling` with 2 tests:
- `test_session_remove_db_error_does_not_propagate`: Verifies
`OperationalError` (MySQL 'Server has gone away') is caught
- `test_session_remove_generic_error_does_not_propagate`: Verifies other
exceptions (e.g., `RuntimeError`) are also caught
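
The PR's test code is not reproduced in this description. As a rough sketch of how such checks could look, the example below uses `unittest.mock` against a hypothetical `cleanup` helper that mirrors the guarded block; the helper, the checker function, and the use of a plain `Exception` in place of `OperationalError` are all assumptions made for illustration.

```python
from unittest import mock


def cleanup(session_registry, log):
    """Hypothetical helper mirroring the guarded cleanup block."""
    if session_registry:
        try:
            session_registry.remove()
        except Exception:
            log.warning("Failed to remove session during cleanup", exc_info=True)


def check_remove_error_does_not_propagate(error):
    """Make remove() raise `error` and confirm it is logged, not re-raised."""
    registry = mock.MagicMock()
    registry.remove.side_effect = error
    log = mock.MagicMock()

    cleanup(registry, log)  # must not raise

    registry.remove.assert_called_once_with()
    log.warning.assert_called_once()
    return True


# A plain Exception stands in for the driver's OperationalError here.
db_error_ok = check_remove_error_does_not_propagate(
    Exception(2006, "Server has gone away")
)
generic_error_ok = check_remove_error_does_not_propagate(RuntimeError("boom"))
```

Asserting on the warning call as well as the absence of an exception guards against a future change that silently drops the log line.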
## Related
- #59349 — Original `PendingRollbackError` issue
- #61480 — PR that introduced `cleanup_session_middleware` (this PR fixes a
gap in that implementation)
- #57470, #57859 — Earlier reports of the session lifecycle problem
## AI Disclosure
This PR was developed with AI assistance.