The GitHub Actions job "Tests" on 
airflow.git/fix-dag-processor-crash-on-mysql-failure has failed.
Run started by GitHub user AmosG (triggered by potiuk).

Head commit for run:
6c89a3f38e997c2f6f2f6aa2fb61dad4898e60e0 / Amos Gutman <[email protected]>
Fix DAG processor crash on MySQL connection failure during import error 
recording

The DAG processor was crashing when MySQL connection failures occurred while
recording DAG import errors to the database. The root cause was missing
session.rollback() calls after caught exceptions, leaving the SQLAlchemy
session in an invalid state. When session.flush() was subsequently called,
it would raise a new exception that wasn't caught, causing the DAG processor
to crash and enter restart loops.

This issue was observed in production environments where the DAG processor
restarted 1,259 times in 4 days (~13 restarts/hour), leading to:
- Connection pool exhaustion
- Cascading failures across Airflow components
- Import errors not being recorded in the UI
- System instability

Changes:
- Add session.rollback() after caught exceptions in _update_import_errors()
- Add session.rollback() after caught exceptions in _update_dag_warnings()
- Wrap session.flush() in try-except with session.rollback() on failure
- Add comprehensive unit tests for all failure scenarios
- Update comments to clarify error handling behavior
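
The rollback pattern described in the changes above can be sketched roughly
as follows. This is a minimal illustration, not the actual Airflow code:
FakeSession is a hypothetical stand-in that only mimics how a SQLAlchemy
session refuses further work after a failed operation until rollback() is
called, and record_import_errors() is an assumed name, not the real
_update_import_errors().

```python
class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy session (illustration only)."""

    def __init__(self, fail_on=()):
        self.fail_on = set(fail_on)   # items that simulate a lost connection
        self.needs_rollback = False   # True after a failed operation
        self.pending = []             # added but not yet flushed
        self.flushed = []             # successfully flushed
        self.rollbacks = 0

    def _check_state(self):
        if self.needs_rollback:
            raise RuntimeError("session is in an invalid state; rollback required")

    def add(self, item):
        self._check_state()
        if item in self.fail_on:
            self.needs_rollback = True
            raise ConnectionError(f"lost connection while recording {item}")
        self.pending.append(item)

    def flush(self):
        self._check_state()
        self.flushed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # Like a real rollback, this discards all pending work.
        self.pending.clear()
        self.needs_rollback = False
        self.rollbacks += 1


def record_import_errors(session, filenames):
    """Record each item, rolling back after any caught failure.

    Returns the list of items that could not be recorded.
    """
    failed = []
    for filename in filenames:
        try:
            session.add(filename)
        except Exception:
            # Without this rollback, the later flush() would raise again.
            session.rollback()
            failed.append(filename)
    try:
        session.flush()
    except Exception:
        # Keep the session usable for the next file instead of crashing.
        session.rollback()
    return failed
```

In this sketch, a failure on one item leaves the session usable for later
items, at the cost of discarding whatever was pending when the rollback
happened.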

The fix ensures the DAG processor gracefully handles database connection
failures and continues processing other DAGs instead of crashing.

Report URL: https://github.com/apache/airflow/actions/runs/20009006496

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
