Hi all,

I'd like to start a discussion about adding MariaDB as a supported database
backend for Apache Airflow, alongside the existing MySQL and PostgreSQL
backends.

## Summary

This proposal shows all the changes needed to support MariaDB. All core
tests pass on LTS versions of both MySQL and MariaDB.

The compatibility commit (
https://github.com/apache/airflow/pull/60133/changes/b9da4fe819503ff1aad67aa0943da3ac33707d8d)
changes 6 files: 4 production code files and 2 test files. Each change is
detailed below with the root cause, the fix, and the maintenance burden.

## 1. Scope of Required Code Changes

### Overview

| # | File | What changes | MariaDB-specific? | Maintenance burden |
|---|------|--------------|-------------------|-------------------|
| 1 | `airflow/utils/sqlalchemy.py` | Bug fix: remove incorrect
`supports_for_update_of` guard | No - improves MySQL too | None (code
removal) |
| 2 | `migrations/0036_...add_name_field_to_dataset_model.py` | MySQL Error
1093 workaround | Shared MySQL/MariaDB | None (one-time migration) |
| 3 | `migrations/0049_...remove_pickled_data_from_xcom_table.py` | MariaDB
REGEXP_REPLACE syntax | MariaDB-specific branch | None (one-time migration)
|
| 4 | `airflow/jobs/scheduler_job_runner.py` | Disable unused RETURNING
clause | No - benefits all dialects | None (one-line addition) |
| 5 | `tests/.../test_exceptions.py` | Accept multiple valid SQL formats |
Test robustness improvement | None |
| 6 | `tests/.../test_sqlalchemy.py` | Fix test expectation after #1 | Test
correction | None |

Key takeaway: Of the 6 changes, only one (Migration 0049) is truly
MariaDB-specific. The others are either bug fixes, general improvements, or
shared MySQL/MariaDB workarounds.

## 2. Detailed Analysis of Each Change

### Change 1: `with_row_locks` - Remove incorrect `supports_for_update_of`
guard

**File:** `airflow-core/src/airflow/utils/sqlalchemy.py`

Root cause: The `with_row_locks()` function had a guard that disabled
row-level locking when `supports_for_update_of` was `False`. This check
made sense when it was written (MariaDB 10.3 lacked full locking support),
but it incorrectly disables row locking on MariaDB 10.6+.

The facts:
- MariaDB 10.6+ fully supports `FOR UPDATE`, `NOWAIT`, and `SKIP LOCKED`
- SQLAlchemy already guards the `OF <table>` clause internally when the
dialect doesn't support it
- The `supports_for_update_of` attribute tests the wrong capability for the
intended purpose

Fix: Remove the 4-line guard entirely. The `USE_ROW_LEVEL_LOCKING` config
flag is sufficient for users who want to explicitly disable row locking.

Impact: This is a bug fix that benefits all MySQL-family backends.

Maintenance burden: Zero - this is pure code removal.

### Change 2: Migration 0036 - MariaDB Error 1093 workaround

**File:**
`airflow-core/src/airflow/migrations/versions/0036_3_0_0_add_name_field_to_dataset_model.py`

Root cause: The migration's `downgrade()` function uses a CTE-based `DELETE
... WHERE id IN (SELECT * FROM cte)` query to remove duplicate datasets.
MariaDB raises Error 1093 when a `DELETE` statement references the same
table in a subquery.

Fix: Add a `dialect == "mysql"` branch that rewrites the query using a
derived table wrapper.

Impact: This is MariaDB-specific in practice, but the original query works
fine on MySQL 8.0.

Maintenance burden: Zero - this is a one-time migration that will never
change.

### Change 3: Migration 0049 - MariaDB REGEXP_REPLACE syntax

**File:**
`airflow-core/src/airflow/migrations/versions/0049_3_0_0_remove_pickled_data_from_xcom_table.py`

Root cause: This migration sanitizes non-standard JSON tokens in the XCom
table during the pickle-to-JSON conversion. The `REGEXP_REPLACE` function
has different signatures between MySQL 8 and MariaDB:

| Database | Signature | Regex Engine |
|----------|----------|-------------|
| MySQL 8 | 6 args | ICU |
| MariaDB | 3 args | PCRE2 |

Fix: Add an `is_mariadb` check within the existing `dialect == "mysql"`
branch to use the 3-argument form with PCRE2 backreferences.

Impact: This is the only truly MariaDB-specific production code change.

Maintenance burden: Zero - this is a one-time migration that will never
change.

### Change 4: `scheduler_job_runner.py` - Disable unused RETURNING clause

**File:** `airflow-core/src/airflow/jobs/scheduler_job_runner.py`

Root cause: The `_orphan_unreferenced_assets()` method executes a `DELETE
... WHERE EXISTS (SELECT ... FROM cte)` query. On MariaDB, SQLAlchemy's
dialect automatically adds a `RETURNING` clause to DELETE statements, but
MariaDB doesn't support `RETURNING` when the DELETE references a CTE in an
EXISTS subquery.

Fix: Add `.execution_options(return_defaults=False)` to disable the
unnecessary `RETURNING` clause.

Impact: This is a general improvement for all dialects (prevents syntax
error on MariaDB, avoids overhead on PostgreSQL).

Maintenance burden: Zero - this is a one-line addition that makes the code
more explicit about its intent.

### Change 5: `test_exceptions.py` - Accept multiple valid SQL formats

**File:** `airflow-core/tests/unit/api_fastapi/common/test_exceptions.py`

Root cause: The unique constraint error handler tests had hardcoded
expected SQL statements and error messages that were specific to one MySQL
connector (mysqlclient with `%(name)s` parameter style). Different
connectors and database servers produce different output:

| Aspect | mysqlclient + MySQL | PyMySQL + MariaDB |
|--------|-------------------|-------------------|
| Parameter style | `%(name)s` (pyformat) | `%s` (format) |
| Constraint name in error | `slot_pool_pool_uq` |
`slot_pool.slot_pool_pool_uq` |

Fix: Change expected `statement` and `orig_error` values from single
strings to lists of acceptable values.

Impact: This is a test robustness improvement that makes the tests
connector-agnostic.

Maintenance burden: Negligible - only affects test expectations, not
production code.

### Change 6: `test_sqlalchemy.py` - Fix test expectation after row-lock fix

**File:** `airflow-core/tests/unit/utils/test_sqlalchemy.py`

Root cause: After removing the `supports_for_update_of` guard, the test
case for `("mysql", False, True, ...)` needed its expected value updated
from `False` to `True`.

Fix: Update the test case.

Impact: Test-only change.

Maintenance burden: Zero.

## 3. SQLAlchemy Version Requirement

Airflow already requires SQLAlchemy >= 2.0.46. This version includes two
relevant fixes:

1. MariaDB `NOCYCLE` DDL compilation - SQLAlchemy 2.0.46 correctly omits
the `NO CYCLE` clause for MariaDB when creating sequences with `cycle=False`
2. aiosqlite thread-hanging fix - unrelated to MariaDB but included in the
same version bump

No application code changes were needed for the NOCYCLE fix - it's handled
entirely by SQLAlchemy.

## 4. What Does NOT Need to Change

It's equally important to note what works without any modifications:
- All SQLAlchemy ORM operations
- All Alembic migrations (except the two noted above)
- JSON column type
- Connection pooling and async
- All 1900+ core tests pass on MariaDB 11.8 with just these 6 file changes

## 5. Maintenance Overhead Assessment

### One-time changes (zero ongoing maintenance)
- Migrations 0036 and 0049: These are frozen historical migrations
- Row-lock fix: Code removal
- RETURNING fix: One-line addition
- Test improvements: More robust assertions

### Potential future attention areas
- Raw SQL with `REGEXP_REPLACE`: If new migrations use database-specific
regex, they would need a MariaDB branch (but this is rare - only 1
migration in the entire history needed it)
- New `DELETE ... RETURNING` with CTE patterns: If new code uses this
pattern, it would need `.execution_options(return_defaults=False)` - but
this is good practice regardless
- SQLAlchemy dialect differences: SQLAlchemy's MySQL/MariaDB runtime
detection handles the vast majority of differences automatically

### Quantitative comparison
The total diff for MariaDB compatibility is approximately +80 / -30 lines
across 6 files, of which the majority are test improvements and migration
workarounds that benefit MySQL as well.

## 6. Testing Infrastructure

The second commit (
https://github.com/apache/airflow/pull/60133/changes/b47cea389a9ed4f97a02820c992f505da82a4e4c)
is just a workaround to enable MariaDB testing in the CI pipeline. (When
setting "mysql" dialect and prefixing the version to "mariadb:11.4" causes
a MariaDB Docker image to be spun up instead of a MySQL one)

This is just a temporary mesure, that has permitted us to confirm that the
previous compatibility commit works with existing tests and adds MariaDB
testing to the CI pipeline.

## 7. Conclusion

Supporting MariaDB requires minimal code changes.

The maintenance overhead is demonstrably low: after months of continuous
rebasing against the main branch, no additional MariaDB-specific changes
were required beyond the initial 6-file changeset. The compatibility has
proven stable across multiple Airflow releases.

Furthermore, the number of required corrections has actually decreased over
time, primarily because requiring a recent version of SQLAlchemy (>=
2.0.46) handles many of the dialect differences automatically. Early
MariaDB support attempts needed more workarounds, but SQLAlchemy's improved
MariaDB dialect support has eliminated many of those needs.

Looking forward to the community's thoughts on this proposal.

Best regards,
Diego Dupin

Reply via email to