Dev-iL commented on code in PR #64972: URL: https://github.com/apache/airflow/pull/64972#discussion_r3069066644
########## scripts/ci/prek/check_migration_patterns.py: ########## @@ -0,0 +1,383 @@ +#!/usr/bin/env python +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# /// script +# requires-python = ">=3.10,<3.11" +# dependencies = [ +# "rich>=13.6.0", +# ] +# /// +""" +Static analysis checks for Alembic migration anti-patterns. + +These checks use AST analysis to detect common migration mistakes that can cause +silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has specific +requirements about transaction state. + +Rules +===== + +MIG001 -- DML before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) that +appear before a ``with disable_sqlite_fkeys(op):`` block in the same function. + +**Why is this bad:** +On SQLite, any DML statement triggers *autobegin*, starting a transaction. Once a +transaction is active, ``PRAGMA foreign_keys=off`` (issued by ``disable_sqlite_fkeys``) +becomes a no-op. This means foreign key checks remain enabled, and +``batch_alter_table`` (which recreates tables) may fail with foreign key constraint +violations. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +MIG002 -- DDL before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``, +``op.drop_column``, ``op.create_table``, ``op.drop_table``, ``op.create_index``, +``op.batch_alter_table``, etc.) that appears before a +``with disable_sqlite_fkeys(op):`` block in the same function. Excludes DML calls +covered by MIG001. + +**Why is this bad:** +DDL operations also trigger *autobegin* on SQLite, making the subsequent +``PRAGMA foreign_keys=off`` a no-op, exactly like DML. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.add_column("dag_run", sa.Column("created_at", ...)) # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.add_column("dag_run", sa.Column("created_at", ...)) + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + + +MIG003 -- DML without offline-mode guard +------------------------------------------ + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) in +``upgrade()`` or ``downgrade()`` functions that do not contain a +``context.is_offline_mode()`` check anywhere in the same function. + +**Why is this bad:** +Alembic's offline mode generates SQL scripts instead of executing against a live +database. DML statements like ``UPDATE ... WHERE col IS NULL`` depend on actual data +and produce no useful output in offline mode. Without an ``is_offline_mode()`` guard, +the generated script includes DML that may fail or silently do nothing when applied +later. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + if not context.is_offline_mode(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +Known Limitation -- Future MIG004 Candidate +--------------------------------------------- + +Migrations that perform DML and use ``batch_alter_table`` but never call +``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002. These migrations +may silently break SQLite foreign key handling. Detecting this pattern requires +understanding whether foreign key relationships exist on the affected tables, which +is beyond AST-level analysis. This is tracked as a future MIG004 candidate. + + +Suppression +=========== + +Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that line. +Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``. + +Include a brief reason after ``--``:: + + # noqa: MIG003 -- simple UPDATE safe in offline mode +""" + +from __future__ import annotations + +import ast +import re +import sys +from pathlib import Path + +from rich.console import Console + +console = Console(color_system="standard", width=200) + +DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ") + + +def _get_noqa_codes(line: str) -> set[str]: + """Extract noqa codes from a source line.""" + match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line) + if match: + return {code.strip() for code in match.group(1).split(",")} + return set() + + +def _is_dml_string(node: ast.expr) -> bool: + """Check if an AST expression is a string literal starting with a DML keyword.""" + if isinstance(node, ast.Constant) and isinstance(node.value, str): + stripped = node.value.strip().upper() + return any(stripped.startswith(kw) for kw in DML_KEYWORDS) + # f-strings: check the first string fragment + if isinstance(node, ast.JoinedStr): + for value in node.values: + if isinstance(value, ast.Constant) and isinstance(value.value, str): + stripped = value.value.strip().upper() + if any(stripped.startswith(kw) for kw in DML_KEYWORDS): + return True + break # only the leading fragment matters + return False + + +def _is_op_execute_with_dml(node: ast.Call) -> bool: + """Check if a Call node is ``op.execute(<DML string>)``.""" + if not ( + isinstance(node.func, ast.Attribute) + and isinstance(node.func.value, ast.Name) + and node.func.value.id == "op" + and node.func.attr == "execute" + ): + return False + if not node.args: + return False + arg = node.args[0] + # Direct string: op.execute("UPDATE ...") + if _is_dml_string(arg): + return True + # Wrapped: op.execute(text("UPDATE ...")) or op.execute(sa.text("UPDATE ...")) + if isinstance(arg, ast.Call) and arg.args: + return _is_dml_string(arg.args[0]) + return False + + +def _is_op_call(node: ast.Call) -> bool: + """Check if a Call node is ``op.<something>(...)``.""" + return ( + isinstance(node.func, ast.Attribute) + and isinstance(node.func.value, ast.Name) + and node.func.value.id == "op" + ) Review Comment: **Dismiss.** Investigated — zero migration files in the current codebase use non-DDL `op.*` calls (like `op.get_bind()`, `op.get_context()`) before `disable_sqlite_fkeys`. Verified with: `grep -r 'op\.get_bind\|op\.get_context' airflow-core/src/airflow/migrations/versions/` → no hits in any function that also calls `disable_sqlite_fkeys`. If a future migration introduces such a pattern, the false positive would be a safe-side alert rather than a missed violation. ########## scripts/ci/prek/check_migration_patterns.py: ########## @@ -0,0 +1,383 @@ +#!/usr/bin/env python +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# /// script +# requires-python = ">=3.10,<3.11" +# dependencies = [ +# "rich>=13.6.0", +# ] +# /// +""" +Static analysis checks for Alembic migration anti-patterns. + +These checks use AST analysis to detect common migration mistakes that can cause +silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has specific +requirements about transaction state. + +Rules +===== + +MIG001 -- DML before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) that +appear before a ``with disable_sqlite_fkeys(op):`` block in the same function. + +**Why is this bad:** +On SQLite, any DML statement triggers *autobegin*, starting a transaction. Once a +transaction is active, ``PRAGMA foreign_keys=off`` (issued by ``disable_sqlite_fkeys``) +becomes a no-op. This means foreign key checks remain enabled, and +``batch_alter_table`` (which recreates tables) may fail with foreign key constraint +violations. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +MIG002 -- DDL before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``, +``op.drop_column``, ``op.create_table``, ``op.drop_table``, ``op.create_index``, +``op.batch_alter_table``, etc.) that appears before a +``with disable_sqlite_fkeys(op):`` block in the same function. Excludes DML calls +covered by MIG001. + +**Why is this bad:** +DDL operations also trigger *autobegin* on SQLite, making the subsequent +``PRAGMA foreign_keys=off`` a no-op, exactly like DML. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.add_column("dag_run", sa.Column("created_at", ...)) # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.add_column("dag_run", sa.Column("created_at", ...)) + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + + +MIG003 -- DML without offline-mode guard +------------------------------------------ + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) in +``upgrade()`` or ``downgrade()`` functions that do not contain a +``context.is_offline_mode()`` check anywhere in the same function. + +**Why is this bad:** +Alembic's offline mode generates SQL scripts instead of executing against a live +database. DML statements like ``UPDATE ... WHERE col IS NULL`` depend on actual data +and produce no useful output in offline mode. Without an ``is_offline_mode()`` guard, +the generated script includes DML that may fail or silently do nothing when applied +later. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + if not context.is_offline_mode(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +Known Limitation -- Future MIG004 Candidate +--------------------------------------------- + +Migrations that perform DML and use ``batch_alter_table`` but never call +``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002. These migrations +may silently break SQLite foreign key handling. Detecting this pattern requires +understanding whether foreign key relationships exist on the affected tables, which +is beyond AST-level analysis. This is tracked as a future MIG004 candidate. + + +Suppression +=========== + +Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that line. +Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``. + +Include a brief reason after ``--``:: + + # noqa: MIG003 -- simple UPDATE safe in offline mode +""" + +from __future__ import annotations + +import ast +import re +import sys +from pathlib import Path + +from rich.console import Console + +console = Console(color_system="standard", width=200) + +DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ") + + +def _get_noqa_codes(line: str) -> set[str]: + """Extract noqa codes from a source line.""" + match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line) + if match: + return {code.strip() for code in match.group(1).split(",")} + return set() + + +def _is_dml_string(node: ast.expr) -> bool: + """Check if an AST expression is a string literal starting with a DML keyword.""" + if isinstance(node, ast.Constant) and isinstance(node.value, str): + stripped = node.value.strip().upper() + return any(stripped.startswith(kw) for kw in DML_KEYWORDS) + # f-strings: check the first string fragment + if isinstance(node, ast.JoinedStr): + for value in node.values: + if isinstance(value, ast.Constant) and isinstance(value.value, str): + stripped = value.value.strip().upper() + if any(stripped.startswith(kw) for kw in DML_KEYWORDS): + return True + break # only the leading fragment matters + return False + + +def _is_op_execute_with_dml(node: ast.Call) -> bool: + """Check if a Call node is ``op.execute(<DML string>)``.""" + if not ( + isinstance(node.func, ast.Attribute) + and isinstance(node.func.value, ast.Name) + and node.func.value.id == "op" + and node.func.attr == "execute" + ): + return False + if not node.args: + return False + arg = node.args[0] + # Direct string: op.execute("UPDATE ...") + if _is_dml_string(arg): + return True + # Wrapped: op.execute(text("UPDATE ...")) or op.execute(sa.text("UPDATE ...")) + if isinstance(arg, ast.Call) and arg.args: + return _is_dml_string(arg.args[0]) + return False + + +def _is_op_call(node: ast.Call) -> bool: + """Check if a Call node is ``op.<something>(...)``.""" + return ( + isinstance(node.func, ast.Attribute) + and isinstance(node.func.value, ast.Name) + and node.func.value.id == "op" + ) + + +def _find_disable_sqlite_fkeys_line(func_node: ast.FunctionDef) -> int | None: + """Find the line number of the first ``with disable_sqlite_fkeys(op):`` in a function.""" + # Collect all matches then take min() — ast.walk BFS order does not guarantee + # line-number order when guards appear at different nesting depths. + found: list[int] = [] + for node in ast.walk(func_node): + if not isinstance(node, ast.With): + continue + for item in node.items: + ctx = item.context_expr + if isinstance(ctx, ast.Call): + func = ctx.func + if (isinstance(func, ast.Name) and func.id == "disable_sqlite_fkeys") or ( + isinstance(func, ast.Attribute) and func.attr == "disable_sqlite_fkeys" + ): + found.append(node.lineno) + return min(found) if found else None Review Comment: **Dismiss.** Valid edge case in theory, but all existing Airflow migrations use `with disable_sqlite_fkeys(op):` as a standalone `with` statement — none combine it with `op.batch_alter_table(...)` in a multi-context-manager `with` on the same line. Verified with: `grep -n 'with.*disable_sqlite_fkeys.*batch_alter_table' airflow-core/src/airflow/migrations/versions/*.py` → zero matches. The lineno comparison is sufficient for actual codebase patterns. ########## scripts/ci/prek/check_migration_patterns.py: ########## @@ -0,0 +1,383 @@ +#!/usr/bin/env python +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# /// script +# requires-python = ">=3.10,<3.11" +# dependencies = [ +# "rich>=13.6.0", +# ] +# /// +""" +Static analysis checks for Alembic migration anti-patterns. + +These checks use AST analysis to detect common migration mistakes that can cause +silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has specific +requirements about transaction state. + +Rules +===== + +MIG001 -- DML before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) that +appear before a ``with disable_sqlite_fkeys(op):`` block in the same function. + +**Why is this bad:** +On SQLite, any DML statement triggers *autobegin*, starting a transaction. Once a +transaction is active, ``PRAGMA foreign_keys=off`` (issued by ``disable_sqlite_fkeys``) +becomes a no-op. This means foreign key checks remain enabled, and +``batch_alter_table`` (which recreates tables) may fail with foreign key constraint +violations. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +MIG002 -- DDL before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``, +``op.drop_column``, ``op.create_table``, ``op.drop_table``, ``op.create_index``, +``op.batch_alter_table``, etc.) that appears before a +``with disable_sqlite_fkeys(op):`` block in the same function. Excludes DML calls +covered by MIG001. + +**Why is this bad:** +DDL operations also trigger *autobegin* on SQLite, making the subsequent +``PRAGMA foreign_keys=off`` a no-op, exactly like DML. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.add_column("dag_run", sa.Column("created_at", ...)) # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.add_column("dag_run", sa.Column("created_at", ...)) + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + + +MIG003 -- DML without offline-mode guard +------------------------------------------ + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) in +``upgrade()`` or ``downgrade()`` functions that do not contain a +``context.is_offline_mode()`` check anywhere in the same function. + +**Why is this bad:** +Alembic's offline mode generates SQL scripts instead of executing against a live +database. DML statements like ``UPDATE ... WHERE col IS NULL`` depend on actual data +and produce no useful output in offline mode. Without an ``is_offline_mode()`` guard, +the generated script includes DML that may fail or silently do nothing when applied +later. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + if not context.is_offline_mode(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +Known Limitation -- Future MIG004 Candidate +--------------------------------------------- + +Migrations that perform DML and use ``batch_alter_table`` but never call +``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002. These migrations +may silently break SQLite foreign key handling. Detecting this pattern requires +understanding whether foreign key relationships exist on the affected tables, which +is beyond AST-level analysis. This is tracked as a future MIG004 candidate. + + +Suppression +=========== + +Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that line. +Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``. + +Include a brief reason after ``--``:: + + # noqa: MIG003 -- simple UPDATE safe in offline mode +""" + +from __future__ import annotations + +import ast +import re +import sys +from pathlib import Path + +from rich.console import Console + +console = Console(color_system="standard", width=200) + +DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ") + + +def _get_noqa_codes(line: str) -> set[str]: + """Extract noqa codes from a source line.""" + match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line) + if match: + return {code.strip() for code in match.group(1).split(",")} + return set() + + +def _is_dml_string(node: ast.expr) -> bool: + """Check if an AST expression is a string literal starting with a DML keyword.""" + if isinstance(node, ast.Constant) and isinstance(node.value, str): + stripped = node.value.strip().upper() + return any(stripped.startswith(kw) for kw in DML_KEYWORDS) + # f-strings: check the first string fragment + if isinstance(node, ast.JoinedStr): + for value in node.values: + if isinstance(value, ast.Constant) and isinstance(value.value, str): + stripped = value.value.strip().upper() + if any(stripped.startswith(kw) for kw in DML_KEYWORDS): + return True + break # only the leading fragment matters + return False + + +def _is_op_execute_with_dml(node: ast.Call) -> bool: + """Check if a Call node is ``op.execute(<DML string>)``.""" + if not ( + isinstance(node.func, ast.Attribute) + and isinstance(node.func.value, ast.Name) + and node.func.value.id == "op" + and node.func.attr == "execute" + ): + return False + if not node.args: + return False + arg = node.args[0] + # Direct string: op.execute("UPDATE ...") + if _is_dml_string(arg): + return True + # Wrapped: op.execute(text("UPDATE ...")) or op.execute(sa.text("UPDATE ...")) + if isinstance(arg, ast.Call) and arg.args: + return _is_dml_string(arg.args[0]) + return False + + +def _is_op_call(node: ast.Call) -> bool: + """Check if a Call node is ``op.<something>(...)``.""" + return ( + isinstance(node.func, ast.Attribute) + and isinstance(node.func.value, ast.Name) + and node.func.value.id == "op" + ) + + +def _find_disable_sqlite_fkeys_line(func_node: ast.FunctionDef) -> int | None: + """Find the line number of the first ``with disable_sqlite_fkeys(op):`` in a function.""" + # Collect all matches then take min() — ast.walk BFS order does not guarantee + # line-number order when guards appear at different nesting depths. + found: list[int] = [] + for node in ast.walk(func_node): + if not isinstance(node, ast.With): + continue + for item in node.items: + ctx = item.context_expr + if isinstance(ctx, ast.Call): + func = ctx.func + if (isinstance(func, ast.Name) and func.id == "disable_sqlite_fkeys") or ( + isinstance(func, ast.Attribute) and func.attr == "disable_sqlite_fkeys" + ): + found.append(node.lineno) + return min(found) if found else None + + +def _has_offline_mode_check(func_node: ast.FunctionDef) -> bool: + """Check if function contains ``context.is_offline_mode()`` call.""" + for node in ast.walk(func_node): + if isinstance(node, ast.Call): + func = node.func + if isinstance(func, ast.Attribute) and func.attr == "is_offline_mode": + return True + return False Review Comment: **Dismiss.** Any `.is_offline_mode()` call in a migration file is effectively `context.is_offline_mode()` — no other object in Alembic or Airflow migrations exposes that method name. Tightening to check the receiver name would add complexity without practical benefit, and could miss valid aliases (e.g., `ctx = context; ctx.is_offline_mode()`). Verified with: `grep -rn 'is_offline_mode' airflow-core/src/airflow/migrations/versions/*.py` → all hits are `context.is_offline_mode()`. ########## scripts/ci/prek/check_migration_patterns.py: ########## @@ -0,0 +1,383 @@ +#!/usr/bin/env python +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# /// script +# requires-python = ">=3.10,<3.11" +# dependencies = [ +# "rich>=13.6.0", +# ] +# /// +""" +Static analysis checks for Alembic migration anti-patterns. + +These checks use AST analysis to detect common migration mistakes that can cause +silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has specific +requirements about transaction state. + +Rules +===== + +MIG001 -- DML before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) that +appear before a ``with disable_sqlite_fkeys(op):`` block in the same function. + +**Why is this bad:** +On SQLite, any DML statement triggers *autobegin*, starting a transaction. Once a +transaction is active, ``PRAGMA foreign_keys=off`` (issued by ``disable_sqlite_fkeys``) +becomes a no-op. This means foreign key checks remain enabled, and +``batch_alter_table`` (which recreates tables) may fail with foreign key constraint +violations. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +MIG002 -- DDL before ``disable_sqlite_fkeys`` +---------------------------------------------- + +**What it does:** +Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``, +``op.drop_column``, ``op.create_table``, ``op.drop_table``, ``op.create_index``, +``op.batch_alter_table``, etc.) that appears before a +``with disable_sqlite_fkeys(op):`` block in the same function. Excludes DML calls +covered by MIG001. + +**Why is this bad:** +DDL operations also trigger *autobegin* on SQLite, making the subsequent +``PRAGMA foreign_keys=off`` a no-op, exactly like DML. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.add_column("dag_run", sa.Column("created_at", ...)) # triggers autobegin + with disable_sqlite_fkeys(op): # PRAGMA is now a no-op! + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + with disable_sqlite_fkeys(op): + op.add_column("dag_run", sa.Column("created_at", ...)) + with op.batch_alter_table("backfill") as batch_op: + batch_op.alter_column("col", nullable=True) + + +MIG003 -- DML without offline-mode guard +------------------------------------------ + +**What it does:** +Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, DELETE) in +``upgrade()`` or ``downgrade()`` functions that do not contain a +``context.is_offline_mode()`` check anywhere in the same function. + +**Why is this bad:** +Alembic's offline mode generates SQL scripts instead of executing against a live +database. DML statements like ``UPDATE ... WHERE col IS NULL`` depend on actual data +and produce no useful output in offline mode. Without an ``is_offline_mode()`` guard, +the generated script includes DML that may fail or silently do nothing when applied +later. + +**Example (bad):** + +.. code-block:: python + + def upgrade(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + +**Use instead:** + +.. code-block:: python + + def upgrade(): + if not context.is_offline_mode(): + op.execute("UPDATE dag SET col = '' WHERE col IS NULL") + with op.batch_alter_table("dag") as batch_op: + batch_op.alter_column("col", nullable=False) + + +Known Limitation -- Future MIG004 Candidate +--------------------------------------------- + +Migrations that perform DML and use ``batch_alter_table`` but never call +``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002. These migrations +may silently break SQLite foreign key handling. Detecting this pattern requires +understanding whether foreign key relationships exist on the affected tables, which +is beyond AST-level analysis. This is tracked as a future MIG004 candidate. + + +Suppression +=========== + +Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that line. +Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``. + +Include a brief reason after ``--``:: + + # noqa: MIG003 -- simple UPDATE safe in offline mode +""" + +from __future__ import annotations + +import ast +import re +import sys +from pathlib import Path + +from rich.console import Console + +console = Console(color_system="standard", width=200) + +DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ") + + +def _get_noqa_codes(line: str) -> set[str]: + """Extract noqa codes from a source line.""" + match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line) + if match: + return {code.strip() for code in match.group(1).split(",")} + return set() + + +def _is_dml_string(node: ast.expr) -> bool: + """Check if an AST expression is a string literal starting with a DML keyword.""" + if isinstance(node, ast.Constant) and isinstance(node.value, str): + stripped = node.value.strip().upper() + return any(stripped.startswith(kw) for kw in DML_KEYWORDS) Review Comment: **Accept.** Good catch. Fixed in 877a4a27 — `DML_KEYWORDS` now uses bare keywords (`"UPDATE"`, `"INSERT"`, `"DELETE"`) and `_is_dml_string` uses `split(None, 1)` to check the first word, handling any whitespace separator (space, newline, tab). Verified that `DELETE\nFROM`, `UPDATE\n SET`, `INSERT\nINTO` are all now detected, and existing space-separated DML detection is preserved. All 34 existing tests pass. ########## scripts/ci/prek/check_migration_patterns.py: ########## @@ -0,0 +1,383 @@ +#!/usr/bin/env python +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# /// script +# requires-python = ">=3.10,<3.11" +# dependencies = [ +# "rich>=13.6.0", +# ] Review Comment: **Dismiss.** The PEP 723 `requires-python` is metadata for `uv run --script` mode — it's not used in the actual prek/pre-commit invocation path. The script is called by prek hooks which manage their own Python environment (`language: python` in `.pre-commit-config.yaml` uses whatever Python is available). The constraint is informational only and has no effect on which Python versions can run this script in practice. ########## .pre-commit-config.yaml: ########## @@ -1034,6 +1034,14 @@ repos: pass_filenames: false files: ^.*\.py$ require_serial: true + - id: check-migration-patterns + name: Check migration files for anti-patterns (MIG001/MIG002/MIG003) + entry: ./scripts/ci/prek/check_migration_patterns.py + language: python + pass_filenames: true + files: ^airflow-core/src/airflow/migrations/versions/.*\.py$ + additional_dependencies: [rich>=13.6.0] + require_serial: true Review Comment: **Dismiss.** This follows from the premise in the previous comment about PEP 723 constraints. Since prek hooks manage their own Python environment independently of PEP 723 metadata, no `language_version` pin is needed in `.pre-commit-config.yaml`. The script runs correctly on all supported Python versions (3.10+) — verified by CI passing on Python 3.10, 3.11, 3.12, 3.13, and 3.14. ########## scripts/ci/prek/check_new_airflow_exception_usage.py: ########## @@ -59,7 +59,7 @@ from rich.console import Console from rich.panel import Panel -console = Console() +console = Console(color_system="standard", width=200) Review Comment: **Partial.** This `Console(color_system="standard", width=200)` change was requested by human reviewer Lee-W ([comment](https://github.com/apache/airflow/pull/64972#discussion_r3062503519)) for the other prek script in this PR. Applied consistently to both scripts in the same commit to keep output formatting uniform across prek checks. Minor scope inclusion but intentional. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
