GitHub user jun-roh created a discussion: Airflow 3.1.3 – API Server returns 
500 Internal Server Error on /login once per day (session invalidation / 
PendingRollbackError)

Hello,

We are setting up an environment using Apache Airflow 3.1.3.
We are encountering an issue where the login session expires once per day, and 
after that, accessing the login page results in a 500 Internal Server Error 
from the API server.

This happens consistently and requires restarting the API server to recover.

Below are the error logs, configuration, and our custom AuthManager 
implementation.
Any help or guidance would be greatly appreciated.

### **Environment**
- Airflow version: 3.1.3
- Python version: 3.12
- Executor: CeleryExecutor
- Database: MySQL (RDS)
- Auth Manager: Custom SafeFabAuthManager (based on FAB)
- Deployment: Docker Compose
- API Server: airflow api-server --apps all --workers 1

### **Issue Summary**
- Login works initially
- After ~24 hours:
  - Login session becomes invalid
  - /auth/login/ returns 500 Internal Server Error
  - Stack trace shows sqlalchemy.exc.PendingRollbackError
- Restarting airflow-apiserver resolves the issue temporarily

### **Error Log (Excerpt)**
```
sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction 
is rolled back.
``` 
The error occurs during:
```
/login/ [GET]
airflow.providers.fab.auth_manager.security_manager.override.load_user
```

### **Docker Compose (airflow-apiserver)**
```
airflow-apiserver:
  command: >
    airflow api-server --host 0.0.0.0 --port 8080 --workers 1 --apps all
```
Database pooling configuration:
```
AIRFLOW__DATABASE__SQL_ALCHEMY_ENGINE_OPTIONS={
  "pool_pre_ping": true,
  "pool_recycle": 280,
  "pool_size": 100,
  "max_overflow": 20,
  "pool_timeout": 30,
  "pool_reset_on_return": "rollback"
}
```
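
For reference, a small sketch of what these pool settings imply. It only parses the JSON shown above; the variable names are illustrative and nothing here calls Airflow or SQLAlchemy. It relies on the fact that a SQLAlchemy QueuePool allows at most pool_size + max_overflow simultaneous connections per process.

```python
import json

# JSON value copied from the configuration above; the Python variable
# name is illustrative and not part of Airflow.
engine_options_json = """
{
  "pool_pre_ping": true,
  "pool_recycle": 280,
  "pool_size": 100,
  "max_overflow": 20,
  "pool_timeout": 30,
  "pool_reset_on_return": "rollback"
}
"""

options = json.loads(engine_options_json)

# Upper bound on simultaneous connections for one process using this pool.
max_connections = options["pool_size"] + options["max_overflow"]

# pool_recycle is expressed in seconds.
recycle_minutes = options["pool_recycle"] / 60

print(f"max connections per process: {max_connections}")
print(f"connections recycled after about {recycle_minutes:.1f} minutes")
```

With --workers 1 this bound applies to the single API server process.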

### **Custom Auth Manager**

To mitigate the issue, we implemented a custom SafeFabAuthManager that wraps DB 
calls and forcibly resets the session when a PendingRollbackError occurs.

```
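from sqlalchemy.exc import SQLAlchemyError

from airflow.providers.fab.auth_manager.security_manager.override import (
    FabAirflowSecurityManagerOverride,
)


class SafeFabSecurityManager(FabAirflowSecurityManagerOverride):
    def _safe_db_call(self, func, *args, **kwargs):
        """Run func; on a SQLAlchemy error, reset the session and retry once."""
        try:
            return func(*args, **kwargs)
        except SQLAlchemyError:
            # PendingRollbackError is a subclass of SQLAlchemyError, so this
            # clause also covers it.
            session = self._get_session()  # _get_session() is not shown in this excerpt
            if session:
                session.rollback()
                session.close()
            # A failure on the retry propagates to the caller.
            return func(*args, **kwargs)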
```
This reduces the frequency of failures but does not fully eliminate the issue.
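
To make the rollback-and-retry behaviour above easier to reason about in isolation, here is a minimal, self-contained sketch. It does not use Airflow or SQLAlchemy: PendingRollback, FakeSession, and call_with_rollback_retry are hypothetical stand-ins invented for illustration, and the fake session only mimics the "refuse work until rollback" behaviour.

```python
class PendingRollback(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.PendingRollbackError."""


class FakeSession:
    """Hypothetical session that refuses work until rollback() is called."""

    def __init__(self):
        self.needs_rollback = True
        self.rollbacks = 0

    def query_user(self, user_id):
        if self.needs_rollback:
            raise PendingRollback("invalid transaction must be rolled back")
        return {"id": user_id}

    def rollback(self):
        self.needs_rollback = False
        self.rollbacks += 1


def call_with_rollback_retry(session, func, *args, **kwargs):
    """Run func; on PendingRollback, roll back and retry exactly once."""
    try:
        return func(*args, **kwargs)
    except PendingRollback:
        session.rollback()
        # A second failure propagates so the caller sees the real error.
        return func(*args, **kwargs)


session = FakeSession()
user = call_with_rollback_retry(session, session.query_user, 42)
print(user, session.rollbacks)
```

The retry only helps if the session that gets rolled back is the same one the wrapped call uses. If the call goes through a different session object, the retry hits the same error again.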


### **Questions**
1. Is this a known issue in Airflow 3.1.x related to FAB / API Server session
   handling?
2. Is the API Server expected to manage SQLAlchemy sessions differently
   compared to the Webserver?
3. Are there recommended settings for:
   - pool_recycle
   - pool_reset_on_return
   - session scoping in FAB auth?
4. Would using NullPool or disabling connection reuse for the API server be
   recommended?

### **Expected Behavior**
- Login page should gracefully redirect to re-authentication
- No 500 errors on /login
- Expired sessions should not poison the SQLAlchemy connection pool

GitHub link: https://github.com/apache/airflow/discussions/59487
