nickstenning opened a new pull request, #59205:
URL: https://github.com/apache/airflow/pull/59205

   To allow `airflow db migrate` to respect its `-r` option when run against an 
empty database, we need to supply an initial migration which creates the 
database in the state expected by the existing 0001 migration.
   
   This commit:
   
   1. Adds such an initial migration.
   2. Updates `upgradedb` to always run a migration-based upgrade rather than a 
direct reflection of table metadata when `to_revision` is supplied.
   3. Fix `add_default_pool_if_not_exists` to not assume that it's running with 
the latest database schema.
   
   My primary motivation here is to make it easy to prepare an Airflow database 
with a schema at any point in the migration history, for the purposes of 
testing and improving migrations. This specifically came up in the context of 
identifying that the `sqlite` dialect migration for 
[`0042_3_0_0_add_uuid_primary_key_to_task_instance_.py`](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/migrations/versions/0042_3_0_0_add_uuid_primary_key_to_task_instance_.py)
 is quite broken.
   
   I'm not entirely clear on what the status is of the `ab_*` tables or the 
`session` table is, and whether they should or should not be included here. The 
difficulty of omitting them altogether is that they are referenced -- albeit 
only once -- by subsequent migrations: see [migration 
0028](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/migrations/versions/0028_3_0_0_drop_ab_user_id_foreign_key.py).
 
   
   To assess whether this base migration results in the same end result as 
simply running `initdb`, I've been running a script something like this to 
print the schema as seen by `sqlalchemy` in a canonicalized order:
   
   <details>
   
   <summary><strong>Script to print a canonicalized schema: 
<code>print_tables.py</code></strong></summary>
   
   ```python
   from airflow.models import import_all_models
   from airflow.utils.db import reflect_tables
   from airflow.utils.session import create_session
   
   
   def print_metadata_canonical(metadata):
       for table_name in sorted(metadata.tables.keys()):
           table = metadata.tables[table_name]
           print(f"\nTable: {table_name}")
           print(f"  Schema: {table.schema}")
   
           for col_name in sorted(table.columns.keys()):
               col = table.columns[col_name]
               nullable = "NULL" if col.nullable else "NOT NULL"
               default = f" DEFAULT {col.default}" if col.default else ""
               print(f"  Column: {col_name} {col.type} {nullable}{default}")
   
           if table.primary_key:
               pk_cols = sorted([c.name for c in table.primary_key.columns])
               print(f"  Primary Key: {pk_cols}")
   
           for constraint in sorted(table.constraints, key=lambda x: x.name or 
""):
               if constraint.__class__.__name__ == "UniqueConstraint":
                   cols = sorted([c.name for c in constraint.columns])
                   print(f"  Unique: {constraint.name} on {cols}")
   
           for constraint in sorted(table.constraints, key=lambda x: x.name or 
""):
               if constraint.__class__.__name__ == "CheckConstraint":
                   print(f"  Check: {constraint.name} ({constraint.sqltext})")
   
           for fk in sorted(table.foreign_keys, key=lambda x: (x.parent.name, 
str(x.column))):
               print(f"  FK: {fk.parent.name} -> {fk.column}")
   
           for idx in sorted(table.indexes, key=lambda x: x.name or ""):
               cols = [c.name for c in idx.columns]
               unique = "UNIQUE " if idx.unique else ""
               print(f"  Index: {unique}{idx.name} on {cols}")
   
   
   if __name__ == "__main__":
       import_all_models()
   
       with create_session() as s:
           metadata = reflect_tables(session=s, tables=None)
   
           print_metadata_canonical(metadata)
   ```
   
   </details>
   
   Here are the diffs when running this on main.
   
   <details>
   
   <summary><strong>No external DB managers</strong></summary>
   
   ```console
   $ AIRFLOW__DATABASE__EXTERNAL_DB_MANAGERS= 
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test.db uv run airflow db 
migrate -r heads
   [...]
   $ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test.db uv run 
print_tables.py > migrated
   $ AIRFLOW__DATABASE__EXTERNAL_DB_MANAGERS= 
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test-initdb.db uv run 
airflow db migrate
   [...]
   $ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test-initdb.db uv run 
print_tables.py > initdb
   $ diff -u initdb migrated
   --- initdb      2025-12-08 16:41:13
   +++ migrated    2025-12-08 16:40:44
   @@ -1,4 +1,102 @@
   
   +Table: _xcom_archive
   +  Schema: None
   +  Column: dag_id VARCHAR(250) NOT NULL
   +  Column: dag_run_id INTEGER NOT NULL
   +  Column: key VARCHAR(512) NOT NULL
   +  Column: map_index INTEGER NOT NULL
   +  Column: run_id VARCHAR(250) NOT NULL
   +  Column: task_id VARCHAR(250) NOT NULL
   +  Column: timestamp TIMESTAMP NOT NULL
   +  Column: value BLOB NULL
   +  Primary Key: ['dag_run_id', 'key', 'map_index', 'task_id']
   +
   +Table: ab_permission
   +  Schema: None
   +  Column: id INTEGER NOT NULL
   +  Column: name VARCHAR(100) NOT NULL
   +  Primary Key: ['id']
   +  Unique: ab_permission_name_uq on ['name']
   +
   +Table: ab_permission_view
   +  Schema: None
   +  Column: id INTEGER NOT NULL
   +  Column: permission_id INTEGER NULL
   +  Column: view_menu_id INTEGER NULL
   +  Primary Key: ['id']
   +  Unique: ab_permission_view_permission_id_view_menu_id_uq on 
['permission_id', 'view_menu_id']
   +  FK: permission_id -> ab_permission.id
   +  FK: view_menu_id -> ab_view_menu.id
   +
   +Table: ab_permission_view_role
   +  Schema: None
   +  Column: id INTEGER NOT NULL
   +  Column: permission_view_id INTEGER NULL
   +  Column: role_id INTEGER NULL
   +  Primary Key: ['id']
   +  Unique: ab_permission_view_role_permission_view_id_role_id_uq on 
['permission_view_id', 'role_id']
   +  FK: permission_view_id -> ab_permission_view.id
   +  FK: role_id -> ab_role.id
   +
   +Table: ab_register_user
   +  Schema: None
   +  Column: email VARCHAR(512) NOT NULL
   +  Column: first_name VARCHAR(256) NOT NULL
   +  Column: id INTEGER NOT NULL
   +  Column: last_name VARCHAR(256) NOT NULL
   +  Column: password VARCHAR(256) NULL
   +  Column: registration_date TIMESTAMP NULL
   +  Column: registration_hash VARCHAR(256) NULL
   +  Column: username VARCHAR(512) NOT NULL
   +  Primary Key: ['id']
   +  Unique: ab_register_user_username_uq on ['username']
   +
   +Table: ab_role
   +  Schema: None
   +  Column: id INTEGER NOT NULL
   +  Column: name VARCHAR(64) NOT NULL
   +  Primary Key: ['id']
   +  Unique: ab_role_name_uq on ['name']
   +
   +Table: ab_user
   +  Schema: None
   +  Column: active BOOLEAN NULL
   +  Column: changed_by_fk INTEGER NULL
   +  Column: changed_on TIMESTAMP NULL
   +  Column: created_by_fk INTEGER NULL
   +  Column: created_on TIMESTAMP NULL
   +  Column: email VARCHAR(512) NOT NULL
   +  Column: fail_login_count INTEGER NULL
   +  Column: first_name VARCHAR(256) NOT NULL
   +  Column: id INTEGER NOT NULL
   +  Column: last_login TIMESTAMP NULL
   +  Column: last_name VARCHAR(256) NOT NULL
   +  Column: login_count INTEGER NULL
   +  Column: password VARCHAR(256) NULL
   +  Column: username VARCHAR(512) NOT NULL
   +  Primary Key: ['id']
   +  Unique: ab_user_email_uq on ['email']
   +  Unique: ab_user_username_uq on ['username']
   +  FK: changed_by_fk -> ab_user.id
   +  FK: created_by_fk -> ab_user.id
   +
   +Table: ab_user_role
   +  Schema: None
   +  Column: id INTEGER NOT NULL
   +  Column: role_id INTEGER NULL
   +  Column: user_id INTEGER NULL
   +  Primary Key: ['id']
   +  Unique: ab_user_role_user_id_role_id_uq on ['role_id', 'user_id']
   +  FK: role_id -> ab_role.id
   +  FK: user_id -> ab_user.id
   +
   +Table: ab_view_menu
   +  Schema: None
   +  Column: id INTEGER NOT NULL
   +  Column: name VARCHAR(250) NOT NULL
   +  Primary Key: ['id']
   +  Unique: ab_view_menu_name_uq on ['name']
   +
    Table: alembic_version
      Schema: None
      Column: version_num VARCHAR(32) NOT NULL
   @@ -535,11 +633,6 @@
      Unique: slot_pool_pool_uq on ['pool']
      FK: team_id -> team.id
   
   -Table: sqlite_sequence
   -  Schema: None
   -  Column: name NULL NULL
   -  Column: seq NULL NULL
   -
    Table: task_inlet_asset_reference
      Schema: None
      Column: asset_id INTEGER NOT NULL
   ```
   
   </details>
   
   Or, if I run with 
`AIRFLOW__DATABASE__EXTERNAL_DB_MANAGERS=airflow.providers.fab.auth_manager.models.db.FABDBManager`:
   
   <details>
   
   <summary><strong>With <code>FABDBManager</code> as external DB 
manager</strong></summary>
   
   ```console
   $ 
AIRFLOW__DATABASE__EXTERNAL_DB_MANAGERS=airflow.providers.fab.auth_manager.models.db.FABDBManager
 AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test.db uv run airflow db 
migrate -r heads
   [...]
   $ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test.db uv run 
print_tables.py > migrated
   $ 
AIRFLOW__DATABASE__EXTERNAL_DB_MANAGERS=airflow.providers.fab.auth_manager.models.db.FABDBManager
 AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test-initdb.db uv run 
airflow db migrate
   [...]
   $ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test-initdb.db uv run 
print_tables.py > initdb
   $ diff -u initdb migrated
   --- initdb      2025-12-08 16:44:44
   +++ migrated    2025-12-08 16:44:22
   @@ -1,4 +1,16 @@
   
   +Table: _xcom_archive
   +  Schema: None
   +  Column: dag_id VARCHAR(250) NOT NULL
   +  Column: dag_run_id INTEGER NOT NULL
   +  Column: key VARCHAR(512) NOT NULL
   +  Column: map_index INTEGER NOT NULL
   +  Column: run_id VARCHAR(250) NOT NULL
   +  Column: task_id VARCHAR(250) NOT NULL
   +  Column: timestamp TIMESTAMP NOT NULL
   +  Column: value BLOB NULL
   +  Primary Key: ['dag_run_id', 'key', 'map_index', 'task_id']
   +
    Table: ab_group
      Schema: None
      Column: description VARCHAR(512) NULL
   @@ -6,7 +18,7 @@
      Column: label VARCHAR(150) NULL
      Column: name VARCHAR(100) NOT NULL
      Primary Key: ['id']
   -  Unique: None on ['name']
   +  Unique: ab_group_name_uq on ['name']
   
    Table: ab_group_role
      Schema: None
   @@ -14,7 +26,7 @@
      Column: id INTEGER NOT NULL
      Column: role_id INTEGER NULL
      Primary Key: ['id']
   -  Unique: None on ['group_id', 'role_id']
   +  Unique: ab_group_role_group_id_role_id_uq on ['group_id', 'role_id']
      FK: group_id -> ab_group.id
      FK: role_id -> ab_role.id
      Index: idx_group_id on ['group_id']
   @@ -25,7 +37,7 @@
      Column: id INTEGER NOT NULL
      Column: name VARCHAR(100) NOT NULL
      Primary Key: ['id']
   -  Unique: None on ['name']
   +  Unique: ab_permission_name_uq on ['name']
   
    Table: ab_permission_view
      Schema: None
   @@ -33,7 +45,7 @@
      Column: permission_id INTEGER NULL
      Column: view_menu_id INTEGER NULL
      Primary Key: ['id']
   -  Unique: None on ['permission_id', 'view_menu_id']
   +  Unique: ab_permission_view_permission_id_view_menu_id_uq on 
['permission_id', 'view_menu_id']
      FK: permission_id -> ab_permission.id
      FK: view_menu_id -> ab_view_menu.id
   
   @@ -43,52 +55,49 @@
      Column: permission_view_id INTEGER NULL
      Column: role_id INTEGER NULL
      Primary Key: ['id']
   -  Unique: None on ['permission_view_id', 'role_id']
   +  Unique: ab_permission_view_role_permission_view_id_role_id_uq on 
['permission_view_id', 'role_id']
      FK: permission_view_id -> ab_permission_view.id
      FK: role_id -> ab_role.id
   -  Index: idx_permission_view_id on ['permission_view_id']
   -  Index: idx_role_id on ['role_id']
   
    Table: ab_register_user
      Schema: None
   -  Column: email VARCHAR(320) NOT NULL
   -  Column: first_name VARCHAR(64) NOT NULL
   +  Column: email VARCHAR(512) NOT NULL
   +  Column: first_name VARCHAR(256) NOT NULL
      Column: id INTEGER NOT NULL
   -  Column: last_name VARCHAR(64) NOT NULL
   +  Column: last_name VARCHAR(256) NOT NULL
      Column: password VARCHAR(256) NULL
   -  Column: registration_date DATETIME NULL
   +  Column: registration_date TIMESTAMP NULL
      Column: registration_hash VARCHAR(256) NULL
      Column: username VARCHAR(512) NOT NULL
      Primary Key: ['id']
   -  Unique: None on ['email']
   -  Unique: None on ['username']
   +  Unique: ab_register_user_username_uq on ['username']
   
    Table: ab_role
      Schema: None
      Column: id INTEGER NOT NULL
      Column: name VARCHAR(64) NOT NULL
      Primary Key: ['id']
   -  Unique: None on ['name']
   +  Unique: ab_role_name_uq on ['name']
   
    Table: ab_user
      Schema: None
      Column: active BOOLEAN NULL
      Column: changed_by_fk INTEGER NULL
   -  Column: changed_on DATETIME NULL
   +  Column: changed_on TIMESTAMP NULL
      Column: created_by_fk INTEGER NULL
   -  Column: created_on DATETIME NULL
   -  Column: email VARCHAR(320) NOT NULL
   +  Column: created_on TIMESTAMP NULL
   +  Column: email VARCHAR(512) NOT NULL
      Column: fail_login_count INTEGER NULL
   -  Column: first_name VARCHAR(64) NOT NULL
   +  Column: first_name VARCHAR(256) NOT NULL
      Column: id INTEGER NOT NULL
   -  Column: last_login DATETIME NULL
   -  Column: last_name VARCHAR(64) NOT NULL
   +  Column: last_login TIMESTAMP NULL
   +  Column: last_name VARCHAR(256) NOT NULL
      Column: login_count INTEGER NULL
      Column: password VARCHAR(256) NULL
      Column: username VARCHAR(512) NOT NULL
      Primary Key: ['id']
   -  Unique: None on ['username']
   -  Unique: None on ['email']
   +  Unique: ab_user_email_uq on ['email']
   +  Unique: ab_user_username_uq on ['username']
      FK: changed_by_fk -> ab_user.id
      FK: created_by_fk -> ab_user.id
   
   @@ -98,9 +107,11 @@
      Column: id INTEGER NOT NULL
      Column: user_id INTEGER NULL
      Primary Key: ['id']
   -  Unique: None on ['group_id', 'user_id']
   +  Unique: ab_user_group_user_id_group_id_uq on ['group_id', 'user_id']
      FK: group_id -> ab_group.id
      FK: user_id -> ab_user.id
   +  Index: idx_user_group_id on ['group_id']
   +  Index: idx_user_id on ['user_id']
   
    Table: ab_user_role
      Schema: None
   @@ -108,7 +119,7 @@
      Column: role_id INTEGER NULL
      Column: user_id INTEGER NULL
      Primary Key: ['id']
   -  Unique: None on ['role_id', 'user_id']
   +  Unique: ab_user_role_user_id_role_id_uq on ['role_id', 'user_id']
      FK: role_id -> ab_role.id
      FK: user_id -> ab_user.id
   
   @@ -117,7 +128,7 @@
      Column: id INTEGER NOT NULL
      Column: name VARCHAR(250) NOT NULL
      Primary Key: ['id']
   -  Unique: None on ['name']
   +  Unique: ab_view_menu_name_uq on ['name']
   
    Table: alembic_version
      Schema: None
   @@ -648,15 +659,6 @@
      Unique: serialized_dag_dag_version_id_uq on ['dag_version_id']
      FK: dag_version_id -> dag_version.id
   
   -Table: session
   -  Schema: None
   -  Column: data BLOB NULL
   -  Column: expiry DATETIME NULL
   -  Column: id INTEGER NOT NULL
   -  Column: session_id VARCHAR(255) NULL
   -  Primary Key: ['id']
   -  Unique: None on ['session_id']
   -
    Table: slot_pool
      Schema: None
      Column: description TEXT NULL
   @@ -669,11 +671,6 @@
      Unique: slot_pool_pool_uq on ['pool']
      FK: team_id -> team.id
   
   -Table: sqlite_sequence
   -  Schema: None
   -  Column: name NULL NULL
   -  Column: seq NULL NULL
   -
    Table: task_inlet_asset_reference
      Schema: None
      Column: asset_id INTEGER NOT NULL
   ```
   
   </details>
   
   I'd greatly appreciate any feedback on whether this is wanted/helpful -- my 
view is that the `-r` option to `db migrate` should always be respected -- and 
if so, what if any changes are needed for the handling of the mentioned tables.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to