Hi all, I'd like to start a discussion about adding MariaDB as a supported database backend for Apache Airflow, alongside the existing MySQL and PostgreSQL backends.
## Summary This proposal shows all the changes needed to support MariaDB. All core tests pass on LTS versions of both MySQL and MariaDB. The compatibility commit ( https://github.com/apache/airflow/pull/60133/changes/b9da4fe819503ff1aad67aa0943da3ac33707d8d) changes 6 files: 4 production code files and 2 test files. Each change is detailed below with the root cause, the fix, and the maintenance burden. ## 1. Scope of Required Code Changes ### Overview | # | File | What changes | MariaDB-specific? | Maintenance burden | |---|------|--------------|-------------------|-------------------| | 1 | `airflow/utils/sqlalchemy.py` | Bug fix: remove incorrect `supports_for_update_of` guard | No - improves MySQL too | None (code removal) | | 2 | `migrations/0036_...add_name_field_to_dataset_model.py` | MySQL Error 1093 workaround | Shared MySQL/MariaDB | None (one-time migration) | | 3 | `migrations/0049_...remove_pickled_data_from_xcom_table.py` | MariaDB REGEXP_REPLACE syntax | MariaDB-specific branch | None (one-time migration) | | 4 | `airflow/jobs/scheduler_job_runner.py` | Disable unused RETURNING clause | No - benefits all dialects | None (one-line addition) | | 5 | `tests/.../test_exceptions.py` | Accept multiple valid SQL formats | Test robustness improvement | None | | 6 | `tests/.../test_sqlalchemy.py` | Fix test expectation after #1 | Test correction | None | Key takeaway: Of the 6 changes, only one (Migration 0049) is truly MariaDB-specific. The others are either bug fixes, general improvements, or shared MySQL/MariaDB workarounds. ## 2. Detailed Analysis of Each Change ### Change 1: `with_row_locks` - Remove incorrect `supports_for_update_of` guard **File:** `airflow-core/src/airflow/utils/sqlalchemy.py` Root cause: The `with_row_locks()` function had a guard that disabled row-level locking when `supports_for_update_of` was `False`. This check made sense when it was written (MariaDB 10.3 lacked full locking support), but it incorrectly disables row locking on MariaDB 10.6+. The facts: - MariaDB 10.6+ fully supports `FOR UPDATE`, `NOWAIT`, and `SKIP LOCKED` - SQLAlchemy already guards the `OF <table>` clause internally when the dialect doesn't support it - The `supports_for_update_of` attribute tests the wrong capability for the intended purpose Fix: Remove the 4-line guard entirely. The `USE_ROW_LEVEL_LOCKING` config flag is sufficient for users who want to explicitly disable row locking. Impact: This is a bug fix that benefits all MySQL-family backends. Maintenance burden: Zero - this is pure code removal. ### Change 2: Migration 0036 - MariaDB Error 1093 workaround **File:** `airflow-core/src/airflow/migrations/versions/0036_3_0_0_add_name_field_to_dataset_model.py` Root cause: The migration's `downgrade()` function uses a CTE-based `DELETE ... WHERE id IN (SELECT * FROM cte)` query to remove duplicate datasets. MariaDB raises Error 1093 when a `DELETE` statement references the same table in a subquery. Fix: Add a `dialect == "mysql"` branch that rewrites the query using a derived table wrapper. Impact: This is MariaDB-specific in practice, but the original query works fine on MySQL 8.0. Maintenance burden: Zero - this is a one-time migration that will never change. ### Change 3: Migration 0049 - MariaDB REGEXP_REPLACE syntax **File:** `airflow-core/src/airflow/migrations/versions/0049_3_0_0_remove_pickled_data_from_xcom_table.py` Root cause: This migration sanitizes non-standard JSON tokens in the XCom table during the pickle-to-JSON conversion. The `REGEXP_REPLACE` function has different signatures between MySQL 8 and MariaDB: | Database | Signature | Regex Engine | |----------|----------|-------------| | MySQL 8 | 6 args | ICU | | MariaDB | 3 args | PCRE2 | Fix: Add an `is_mariadb` check within the existing `dialect == "mysql"` branch to use the 3-argument form with PCRE2 backreferences. Impact: This is the only truly MariaDB-specific production code change. Maintenance burden: Zero - this is a one-time migration that will never change. ### Change 4: `scheduler_job_runner.py` - Disable unused RETURNING clause **File:** `airflow-core/src/airflow/jobs/scheduler_job_runner.py` Root cause: The `_orphan_unreferenced_assets()` method executes a `DELETE ... WHERE EXISTS (SELECT ... FROM cte)` query. On MariaDB, SQLAlchemy's dialect automatically adds a `RETURNING` clause to DELETE statements, but MariaDB doesn't support `RETURNING` when the DELETE references a CTE in an EXISTS subquery. Fix: Add `.execution_options(return_defaults=False)` to disable the unnecessary `RETURNING` clause. Impact: This is a general improvement for all dialects (prevents syntax error on MariaDB, avoids overhead on PostgreSQL). Maintenance burden: Zero - this is a one-line addition that makes the code more explicit about its intent. ### Change 5: `test_exceptions.py` - Accept multiple valid SQL formats **File:** `airflow-core/tests/unit/api_fastapi/common/test_exceptions.py` Root cause: The unique constraint error handler tests had hardcoded expected SQL statements and error messages that were specific to one MySQL connector (mysqlclient with `%(name)s` parameter style). Different connectors and database servers produce different output: | Aspect | mysqlclient + MySQL | PyMySQL + MariaDB | |--------|-------------------|-------------------| | Parameter style | `%(name)s` (pyformat) | `%s` (format) | | Constraint name in error | `slot_pool_pool_uq` | `slot_pool.slot_pool_pool_uq` | Fix: Change expected `statement` and `orig_error` values from single strings to lists of acceptable values. Impact: This is a test robustness improvement that makes the tests connector-agnostic. Maintenance burden: Negligible - only affects test expectations, not production code. ### Change 6: `test_sqlalchemy.py` - Fix test expectation after row-lock fix **File:** `airflow-core/tests/unit/utils/test_sqlalchemy.py` Root cause: After removing the `supports_for_update_of` guard, the test case for `("mysql", False, True, ...)` needed its expected value updated from `False` to `True`. Fix: Update the test case. Impact: Test-only change. Maintenance burden: Zero. ## 3. SQLAlchemy Version Requirement Airflow already requires SQLAlchemy >= 2.0.46. This version includes two relevant fixes: 1. MariaDB `NOCYCLE` DDL compilation - SQLAlchemy 2.0.46 correctly omits the `NO CYCLE` clause for MariaDB when creating sequences with `cycle=False` 2. aiosqlite thread-hanging fix - unrelated to MariaDB but included in the same version bump No application code changes were needed for the NOCYCLE fix - it's handled entirely by SQLAlchemy. ## 4. What Does NOT Need to Change It's equally important to note what works without any modifications: - All SQLAlchemy ORM operations - All Alembic migrations (except the two noted above) - JSON column type - Connection pooling and async - All 1900+ core tests pass on MariaDB 11.8 with just these 6 file changes ## 5. Maintenance Overhead Assessment ### One-time changes (zero ongoing maintenance) - Migrations 0036 and 0049: These are frozen historical migrations - Row-lock fix: Code removal - RETURNING fix: One-line addition - Test improvements: More robust assertions ### Potential future attention areas - Raw SQL with `REGEXP_REPLACE`: If new migrations use database-specific regex, they would need a MariaDB branch (but this is rare - only 1 migration in the entire history needed it) - New `DELETE ... RETURNING` with CTE patterns: If new code uses this pattern, it would need `.execution_options(return_defaults=False)` - but this is good practice regardless - SQLAlchemy dialect differences: SQLAlchemy's MySQL/MariaDB runtime detection handles the vast majority of differences automatically ### Quantitative comparison The total diff for MariaDB compatibility is approximately +80 / -30 lines across 6 files, of which the majority are test improvements and migration workarounds that benefit MySQL as well. ## 6. Testing Infrastructure The second commit ( https://github.com/apache/airflow/pull/60133/changes/b47cea389a9ed4f97a02820c992f505da82a4e4c) is just a workaround to enable MariaDB testing in the CI pipeline. (When setting "mysql" dialect and prefixing the version to "mariadb:11.4" causes a MariaDB Docker image to be spun up instead of a MySQL one) This is just a temporary mesure, that has permitted us to confirm that the previous compatibility commit works with existing tests and adds MariaDB testing to the CI pipeline. ## 7. Conclusion Supporting MariaDB requires minimal code changes. The maintenance overhead is demonstrably low: after months of continuous rebasing against the main branch, no additional MariaDB-specific changes were required beyond the initial 6-file changeset. The compatibility has proven stable across multiple Airflow releases. Furthermore, the number of required corrections has actually decreased over time, primarily because requiring a recent version of SQLAlchemy (>= 2.0.46) handles many of the dialect differences automatically. Early MariaDB support attempts needed more workarounds, but SQLAlchemy's improved MariaDB dialect support has eliminated many of those needs. Looking forward to the community's thoughts on this proposal. Best regards, Diego Dupin
