The GitHub Actions job "Tests" on airflow.git/fix-dag-processor-crash-on-mysql-failure has failed. Run started by GitHub user AmosG (triggered by potiuk).
Head commit for run:
6c89a3f38e997c2f6f2f6aa2fb61dad4898e60e0 / Amos Gutman <[email protected]>
Fix DAG processor crash on MySQL connection failure during import error recording

The DAG processor was crashing when MySQL connection failures occurred
while recording DAG import errors to the database. The root cause was
missing session.rollback() calls after caught exceptions, which left the
SQLAlchemy session in an invalid state. When session.flush() was
subsequently called, it raised a new exception that was not caught,
causing the DAG processor to crash and enter a restart loop.

This issue was observed in production environments where the DAG
processor restarted 1,259 times in 4 days (~13 restarts/hour), leading to:
- Connection pool exhaustion
- Cascading failures across Airflow components
- Import errors not being recorded in the UI
- System instability

Changes:
- Add session.rollback() after caught exceptions in _update_import_errors()
- Add session.rollback() after caught exceptions in _update_dag_warnings()
- Wrap session.flush() in try-except with session.rollback() on failure
- Add comprehensive unit tests for all failure scenarios
- Update comments to clarify error handling behavior

The fix ensures the DAG processor gracefully handles database connection
failures and continues processing other DAGs instead of crashing.

Report URL: https://github.com/apache/airflow/actions/runs/20009006496

With regards,
GitHub Actions via GitBox
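The failure mode described in the commit message (an exception is caught, no rollback follows, and a later flush crashes the process) can be illustrated with a small, self-contained Python sketch. FakeSession is a hypothetical stand-in, not SQLAlchemy or Airflow code. It models only the behavior the commit describes: a session that hit an error refuses further work until rollback() is called. The helpers record_without_rollback() and record_import_errors() are likewise illustrative and are not the actual _update_import_errors() implementation.

```python
# Hypothetical sketch of the rollback-on-failure pattern; not Airflow code.
import logging

log = logging.getLogger(__name__)


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy session.

    It models one behavior only: after a failed flush, the session rejects
    further add()/flush() calls until rollback() is called.
    """

    def __init__(self, fail_on=()):
        self.fail_on = set(fail_on)  # items whose flush simulates a lost DB connection
        self.pending = []
        self.committed = []
        self.needs_rollback = False

    def _check_usable(self):
        if self.needs_rollback:
            raise RuntimeError("session is invalid; call rollback() first")

    def add(self, item):
        self._check_usable()
        self.pending.append(item)

    def flush(self):
        self._check_usable()
        if any(item in self.fail_on for item in self.pending):
            self.needs_rollback = True
            raise ConnectionError("simulated: MySQL connection lost")
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()
        self.needs_rollback = False


def record_without_rollback(session, errors):
    """Buggy pattern: the exception is swallowed but the session stays invalid."""
    for error in errors:
        try:
            session.add(error)
            session.flush()
        except Exception:
            pass  # no rollback, so the session is left unusable
    session.flush()  # later flush outside any handler raises and crashes the caller


def record_import_errors(session, errors):
    """Fixed pattern: roll back after each failure and keep going."""
    recorded, skipped = [], []
    for error in errors:
        try:
            session.add(error)
            session.flush()
            recorded.append(error)
        except Exception:
            session.rollback()  # restore the session so the next item can be written
            log.warning("Could not record import error for %s; continuing", error)
            skipped.append(error)
    return recorded, skipped


if __name__ == "__main__":
    files = ["dag_a.py", "dag_b.py", "dag_c.py"]

    try:
        record_without_rollback(FakeSession(fail_on={"dag_b.py"}), files)
    except RuntimeError as exc:
        print("buggy version crashed:", exc)

    recorded, skipped = record_import_errors(FakeSession(fail_on={"dag_b.py"}), files)
    print("recorded:", recorded)
    print("skipped:", skipped)
```

Running the script first reports that the buggy version crashed on its final flush. It then prints recorded: ['dag_a.py', 'dag_c.py'] and skipped: ['dag_b.py'], so one failed write no longer stops the remaining files from being processed. In real SQLAlchemy (1.4 and later) the invalid-session state typically surfaces as PendingRollbackError, which is resolved by calling Session.rollback().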
