Dev-iL commented on code in PR #64972:
URL: https://github.com/apache/airflow/pull/64972#discussion_r3069066644


##########
scripts/ci/prek/check_migration_patterns.py:
##########
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# /// script
+# requires-python = ">=3.10,<3.11"
+# dependencies = [
+#   "rich>=13.6.0",
+# ]
+# ///
+"""
+Static analysis checks for Alembic migration anti-patterns.
+
+These checks use AST analysis to detect common migration mistakes that can 
cause
+silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has 
specific
+requirements about transaction state.
+
+Rules
+=====
+
+MIG001 -- DML before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) that
+appear before a ``with disable_sqlite_fkeys(op):`` block in the same function.
+
+**Why is this bad:**
+On SQLite, any DML statement triggers *autobegin*, starting a transaction.  
Once a
+transaction is active, ``PRAGMA foreign_keys=off`` (issued by 
``disable_sqlite_fkeys``)
+becomes a no-op.  This means foreign key checks remain enabled, and
+``batch_alter_table`` (which recreates tables) may fail with foreign key 
constraint
+violations.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+
+MIG002 -- DDL before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``,
+``op.drop_column``, ``op.create_table``, ``op.drop_table``, 
``op.create_index``,
+``op.batch_alter_table``, etc.) that appears before a
+``with disable_sqlite_fkeys(op):`` block in the same function.  Excludes DML 
calls
+covered by MIG001.
+
+**Why is this bad:**
+DDL operations also trigger *autobegin* on SQLite, making the subsequent
+``PRAGMA foreign_keys=off`` a no-op, exactly like DML.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.add_column("dag_run", sa.Column("created_at", ...))  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.add_column("dag_run", sa.Column("created_at", ...))
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+
+MIG003 -- DML without offline-mode guard
+------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) in
+``upgrade()`` or ``downgrade()`` functions that do not contain a
+``context.is_offline_mode()`` check anywhere in the same function.
+
+**Why is this bad:**
+Alembic's offline mode generates SQL scripts instead of executing against a 
live
+database.  DML statements like ``UPDATE ... WHERE col IS NULL`` depend on 
actual data
+and produce no useful output in offline mode.  Without an 
``is_offline_mode()`` guard,
+the generated script includes DML that may fail or silently do nothing when 
applied
+later.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        if not context.is_offline_mode():
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+
+Known Limitation -- Future MIG004 Candidate
+---------------------------------------------
+
+Migrations that perform DML and use ``batch_alter_table`` but never call
+``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002.  These 
migrations
+may silently break SQLite foreign key handling.  Detecting this pattern 
requires
+understanding whether foreign key relationships exist on the affected tables, 
which
+is beyond AST-level analysis.  This is tracked as a future MIG004 candidate.
+
+
+Suppression
+===========
+
+Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that 
line.
+Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``.
+
+Include a brief reason after ``--``::
+
+    # noqa: MIG003 -- simple UPDATE safe in offline mode
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+import sys
+from pathlib import Path
+
+from rich.console import Console
+
+console = Console(color_system="standard", width=200)
+
+DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ")
+
+
+def _get_noqa_codes(line: str) -> set[str]:
+    """Extract noqa codes from a source line."""
+    match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line)
+    if match:
+        return {code.strip() for code in match.group(1).split(",")}
+    return set()
+
+
+def _is_dml_string(node: ast.expr) -> bool:
+    """Check if an AST expression is a string literal starting with a DML 
keyword."""
+    if isinstance(node, ast.Constant) and isinstance(node.value, str):
+        stripped = node.value.strip().upper()
+        return any(stripped.startswith(kw) for kw in DML_KEYWORDS)
+    # f-strings: check the first string fragment
+    if isinstance(node, ast.JoinedStr):
+        for value in node.values:
+            if isinstance(value, ast.Constant) and isinstance(value.value, 
str):
+                stripped = value.value.strip().upper()
+                if any(stripped.startswith(kw) for kw in DML_KEYWORDS):
+                    return True
+                break  # only the leading fragment matters
+    return False
+
+
+def _is_op_execute_with_dml(node: ast.Call) -> bool:
+    """Check if a Call node is ``op.execute(<DML string>)``."""
+    if not (
+        isinstance(node.func, ast.Attribute)
+        and isinstance(node.func.value, ast.Name)
+        and node.func.value.id == "op"
+        and node.func.attr == "execute"
+    ):
+        return False
+    if not node.args:
+        return False
+    arg = node.args[0]
+    # Direct string: op.execute("UPDATE ...")
+    if _is_dml_string(arg):
+        return True
+    # Wrapped: op.execute(text("UPDATE ...")) or op.execute(sa.text("UPDATE 
..."))
+    if isinstance(arg, ast.Call) and arg.args:
+        return _is_dml_string(arg.args[0])
+    return False
+
+
+def _is_op_call(node: ast.Call) -> bool:
+    """Check if a Call node is ``op.<something>(...)``."""
+    return (
+        isinstance(node.func, ast.Attribute)
+        and isinstance(node.func.value, ast.Name)
+        and node.func.value.id == "op"
+    )

Review Comment:
   **Dismiss.** Investigated — zero migration files in the current codebase use 
non-DDL `op.*` calls (like `op.get_bind()`, `op.get_context()`) before 
`disable_sqlite_fkeys`. Verified with: `grep -r 'op\.get_bind\|op\.get_context' 
airflow-core/src/airflow/migrations/versions/` → no hits in any function that 
also calls `disable_sqlite_fkeys`. If a future migration introduces such a 
pattern, the false positive would be a safe-side alert rather than a missed 
violation.



##########
scripts/ci/prek/check_migration_patterns.py:
##########
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# /// script
+# requires-python = ">=3.10,<3.11"
+# dependencies = [
+#   "rich>=13.6.0",
+# ]
+# ///
+"""
+Static analysis checks for Alembic migration anti-patterns.
+
+These checks use AST analysis to detect common migration mistakes that can 
cause
+silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has 
specific
+requirements about transaction state.
+
+Rules
+=====
+
+MIG001 -- DML before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) that
+appear before a ``with disable_sqlite_fkeys(op):`` block in the same function.
+
+**Why is this bad:**
+On SQLite, any DML statement triggers *autobegin*, starting a transaction.  
Once a
+transaction is active, ``PRAGMA foreign_keys=off`` (issued by 
``disable_sqlite_fkeys``)
+becomes a no-op.  This means foreign key checks remain enabled, and
+``batch_alter_table`` (which recreates tables) may fail with foreign key 
constraint
+violations.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+
+MIG002 -- DDL before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``,
+``op.drop_column``, ``op.create_table``, ``op.drop_table``, 
``op.create_index``,
+``op.batch_alter_table``, etc.) that appears before a
+``with disable_sqlite_fkeys(op):`` block in the same function.  Excludes DML 
calls
+covered by MIG001.
+
+**Why is this bad:**
+DDL operations also trigger *autobegin* on SQLite, making the subsequent
+``PRAGMA foreign_keys=off`` a no-op, exactly like DML.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.add_column("dag_run", sa.Column("created_at", ...))  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.add_column("dag_run", sa.Column("created_at", ...))
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+
+MIG003 -- DML without offline-mode guard
+------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) in
+``upgrade()`` or ``downgrade()`` functions that do not contain a
+``context.is_offline_mode()`` check anywhere in the same function.
+
+**Why is this bad:**
+Alembic's offline mode generates SQL scripts instead of executing against a 
live
+database.  DML statements like ``UPDATE ... WHERE col IS NULL`` depend on 
actual data
+and produce no useful output in offline mode.  Without an 
``is_offline_mode()`` guard,
+the generated script includes DML that may fail or silently do nothing when 
applied
+later.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        if not context.is_offline_mode():
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+
+Known Limitation -- Future MIG004 Candidate
+---------------------------------------------
+
+Migrations that perform DML and use ``batch_alter_table`` but never call
+``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002.  These 
migrations
+may silently break SQLite foreign key handling.  Detecting this pattern 
requires
+understanding whether foreign key relationships exist on the affected tables, 
which
+is beyond AST-level analysis.  This is tracked as a future MIG004 candidate.
+
+
+Suppression
+===========
+
+Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that 
line.
+Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``.
+
+Include a brief reason after ``--``::
+
+    # noqa: MIG003 -- simple UPDATE safe in offline mode
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+import sys
+from pathlib import Path
+
+from rich.console import Console
+
+console = Console(color_system="standard", width=200)
+
+DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ")
+
+
+def _get_noqa_codes(line: str) -> set[str]:
+    """Extract noqa codes from a source line."""
+    match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line)
+    if match:
+        return {code.strip() for code in match.group(1).split(",")}
+    return set()
+
+
+def _is_dml_string(node: ast.expr) -> bool:
+    """Check if an AST expression is a string literal starting with a DML 
keyword."""
+    if isinstance(node, ast.Constant) and isinstance(node.value, str):
+        stripped = node.value.strip().upper()
+        return any(stripped.startswith(kw) for kw in DML_KEYWORDS)
+    # f-strings: check the first string fragment
+    if isinstance(node, ast.JoinedStr):
+        for value in node.values:
+            if isinstance(value, ast.Constant) and isinstance(value.value, 
str):
+                stripped = value.value.strip().upper()
+                if any(stripped.startswith(kw) for kw in DML_KEYWORDS):
+                    return True
+                break  # only the leading fragment matters
+    return False
+
+
+def _is_op_execute_with_dml(node: ast.Call) -> bool:
+    """Check if a Call node is ``op.execute(<DML string>)``."""
+    if not (
+        isinstance(node.func, ast.Attribute)
+        and isinstance(node.func.value, ast.Name)
+        and node.func.value.id == "op"
+        and node.func.attr == "execute"
+    ):
+        return False
+    if not node.args:
+        return False
+    arg = node.args[0]
+    # Direct string: op.execute("UPDATE ...")
+    if _is_dml_string(arg):
+        return True
+    # Wrapped: op.execute(text("UPDATE ...")) or op.execute(sa.text("UPDATE 
..."))
+    if isinstance(arg, ast.Call) and arg.args:
+        return _is_dml_string(arg.args[0])
+    return False
+
+
+def _is_op_call(node: ast.Call) -> bool:
+    """Check if a Call node is ``op.<something>(...)``."""
+    return (
+        isinstance(node.func, ast.Attribute)
+        and isinstance(node.func.value, ast.Name)
+        and node.func.value.id == "op"
+    )
+
+
+def _find_disable_sqlite_fkeys_line(func_node: ast.FunctionDef) -> int | None:
+    """Find the line number of the first ``with disable_sqlite_fkeys(op):`` in 
a function."""
+    # Collect all matches then take min() — ast.walk BFS order does not 
guarantee
+    # line-number order when guards appear at different nesting depths.
+    found: list[int] = []
+    for node in ast.walk(func_node):
+        if not isinstance(node, ast.With):
+            continue
+        for item in node.items:
+            ctx = item.context_expr
+            if isinstance(ctx, ast.Call):
+                func = ctx.func
+                if (isinstance(func, ast.Name) and func.id == 
"disable_sqlite_fkeys") or (
+                    isinstance(func, ast.Attribute) and func.attr == 
"disable_sqlite_fkeys"
+                ):
+                    found.append(node.lineno)
+    return min(found) if found else None

Review Comment:
   **Dismiss.** Valid edge case in theory, but all existing Airflow migrations 
use `with disable_sqlite_fkeys(op):` as a standalone `with` statement — none 
combine it with `op.batch_alter_table(...)` in a multi-context-manager `with` 
on the same line. Verified with: `grep -n 
'with.*disable_sqlite_fkeys.*batch_alter_table' 
airflow-core/src/airflow/migrations/versions/*.py` → zero matches. The lineno 
comparison is sufficient for actual codebase patterns.



##########
scripts/ci/prek/check_migration_patterns.py:
##########
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# /// script
+# requires-python = ">=3.10,<3.11"
+# dependencies = [
+#   "rich>=13.6.0",
+# ]
+# ///
+"""
+Static analysis checks for Alembic migration anti-patterns.
+
+These checks use AST analysis to detect common migration mistakes that can 
cause
+silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has 
specific
+requirements about transaction state.
+
+Rules
+=====
+
+MIG001 -- DML before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) that
+appear before a ``with disable_sqlite_fkeys(op):`` block in the same function.
+
+**Why is this bad:**
+On SQLite, any DML statement triggers *autobegin*, starting a transaction.  
Once a
+transaction is active, ``PRAGMA foreign_keys=off`` (issued by 
``disable_sqlite_fkeys``)
+becomes a no-op.  This means foreign key checks remain enabled, and
+``batch_alter_table`` (which recreates tables) may fail with foreign key 
constraint
+violations.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+
+MIG002 -- DDL before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``,
+``op.drop_column``, ``op.create_table``, ``op.drop_table``, 
``op.create_index``,
+``op.batch_alter_table``, etc.) that appears before a
+``with disable_sqlite_fkeys(op):`` block in the same function.  Excludes DML 
calls
+covered by MIG001.
+
+**Why is this bad:**
+DDL operations also trigger *autobegin* on SQLite, making the subsequent
+``PRAGMA foreign_keys=off`` a no-op, exactly like DML.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.add_column("dag_run", sa.Column("created_at", ...))  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.add_column("dag_run", sa.Column("created_at", ...))
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+
+MIG003 -- DML without offline-mode guard
+------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) in
+``upgrade()`` or ``downgrade()`` functions that do not contain a
+``context.is_offline_mode()`` check anywhere in the same function.
+
+**Why is this bad:**
+Alembic's offline mode generates SQL scripts instead of executing against a 
live
+database.  DML statements like ``UPDATE ... WHERE col IS NULL`` depend on 
actual data
+and produce no useful output in offline mode.  Without an 
``is_offline_mode()`` guard,
+the generated script includes DML that may fail or silently do nothing when 
applied
+later.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        if not context.is_offline_mode():
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+
+Known Limitation -- Future MIG004 Candidate
+---------------------------------------------
+
+Migrations that perform DML and use ``batch_alter_table`` but never call
+``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002.  These 
migrations
+may silently break SQLite foreign key handling.  Detecting this pattern 
requires
+understanding whether foreign key relationships exist on the affected tables, 
which
+is beyond AST-level analysis.  This is tracked as a future MIG004 candidate.
+
+
+Suppression
+===========
+
+Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that 
line.
+Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``.
+
+Include a brief reason after ``--``::
+
+    # noqa: MIG003 -- simple UPDATE safe in offline mode
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+import sys
+from pathlib import Path
+
+from rich.console import Console
+
+console = Console(color_system="standard", width=200)
+
+DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ")
+
+
+def _get_noqa_codes(line: str) -> set[str]:
+    """Extract noqa codes from a source line."""
+    match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line)
+    if match:
+        return {code.strip() for code in match.group(1).split(",")}
+    return set()
+
+
+def _is_dml_string(node: ast.expr) -> bool:
+    """Check if an AST expression is a string literal starting with a DML 
keyword."""
+    if isinstance(node, ast.Constant) and isinstance(node.value, str):
+        stripped = node.value.strip().upper()
+        return any(stripped.startswith(kw) for kw in DML_KEYWORDS)
+    # f-strings: check the first string fragment
+    if isinstance(node, ast.JoinedStr):
+        for value in node.values:
+            if isinstance(value, ast.Constant) and isinstance(value.value, 
str):
+                stripped = value.value.strip().upper()
+                if any(stripped.startswith(kw) for kw in DML_KEYWORDS):
+                    return True
+                break  # only the leading fragment matters
+    return False
+
+
+def _is_op_execute_with_dml(node: ast.Call) -> bool:
+    """Check if a Call node is ``op.execute(<DML string>)``."""
+    if not (
+        isinstance(node.func, ast.Attribute)
+        and isinstance(node.func.value, ast.Name)
+        and node.func.value.id == "op"
+        and node.func.attr == "execute"
+    ):
+        return False
+    if not node.args:
+        return False
+    arg = node.args[0]
+    # Direct string: op.execute("UPDATE ...")
+    if _is_dml_string(arg):
+        return True
+    # Wrapped: op.execute(text("UPDATE ...")) or op.execute(sa.text("UPDATE 
..."))
+    if isinstance(arg, ast.Call) and arg.args:
+        return _is_dml_string(arg.args[0])
+    return False
+
+
+def _is_op_call(node: ast.Call) -> bool:
+    """Check if a Call node is ``op.<something>(...)``."""
+    return (
+        isinstance(node.func, ast.Attribute)
+        and isinstance(node.func.value, ast.Name)
+        and node.func.value.id == "op"
+    )
+
+
+def _find_disable_sqlite_fkeys_line(func_node: ast.FunctionDef) -> int | None:
+    """Find the line number of the first ``with disable_sqlite_fkeys(op):`` in 
a function."""
+    # Collect all matches then take min() — ast.walk BFS order does not 
guarantee
+    # line-number order when guards appear at different nesting depths.
+    found: list[int] = []
+    for node in ast.walk(func_node):
+        if not isinstance(node, ast.With):
+            continue
+        for item in node.items:
+            ctx = item.context_expr
+            if isinstance(ctx, ast.Call):
+                func = ctx.func
+                if (isinstance(func, ast.Name) and func.id == 
"disable_sqlite_fkeys") or (
+                    isinstance(func, ast.Attribute) and func.attr == 
"disable_sqlite_fkeys"
+                ):
+                    found.append(node.lineno)
+    return min(found) if found else None
+
+
+def _has_offline_mode_check(func_node: ast.FunctionDef) -> bool:
+    """Check if function contains ``context.is_offline_mode()`` call."""
+    for node in ast.walk(func_node):
+        if isinstance(node, ast.Call):
+            func = node.func
+            if isinstance(func, ast.Attribute) and func.attr == 
"is_offline_mode":
+                return True
+    return False

Review Comment:
   **Dismiss.** Any `.is_offline_mode()` call in a migration file is 
effectively `context.is_offline_mode()` — no other object in Alembic or Airflow 
migrations exposes that method name. Tightening to check the receiver name 
would add complexity without practical benefit, and could miss valid aliases 
(e.g., `ctx = context; ctx.is_offline_mode()`). Verified with: `grep -rn 
'is_offline_mode' airflow-core/src/airflow/migrations/versions/*.py` → all hits 
are `context.is_offline_mode()`.



##########
scripts/ci/prek/check_migration_patterns.py:
##########
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# /// script
+# requires-python = ">=3.10,<3.11"
+# dependencies = [
+#   "rich>=13.6.0",
+# ]
+# ///
+"""
+Static analysis checks for Alembic migration anti-patterns.
+
+These checks use AST analysis to detect common migration mistakes that can 
cause
+silent failures, particularly on SQLite where ``PRAGMA foreign_keys`` has 
specific
+requirements about transaction state.
+
+Rules
+=====
+
+MIG001 -- DML before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) that
+appear before a ``with disable_sqlite_fkeys(op):`` block in the same function.
+
+**Why is this bad:**
+On SQLite, any DML statement triggers *autobegin*, starting a transaction.  
Once a
+transaction is active, ``PRAGMA foreign_keys=off`` (issued by 
``disable_sqlite_fkeys``)
+becomes a no-op.  This means foreign key checks remain enabled, and
+``batch_alter_table`` (which recreates tables) may fail with foreign key 
constraint
+violations.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+            with op.batch_alter_table("dag") as batch_op:
+                batch_op.alter_column("col", nullable=False)
+
+
+MIG002 -- DDL before ``disable_sqlite_fkeys``
+----------------------------------------------
+
+**What it does:**
+Detects any Alembic ``op.*`` call (DDL operations like ``op.add_column``,
+``op.drop_column``, ``op.create_table``, ``op.drop_table``, 
``op.create_index``,
+``op.batch_alter_table``, etc.) that appears before a
+``with disable_sqlite_fkeys(op):`` block in the same function.  Excludes DML 
calls
+covered by MIG001.
+
+**Why is this bad:**
+DDL operations also trigger *autobegin* on SQLite, making the subsequent
+``PRAGMA foreign_keys=off`` a no-op, exactly like DML.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.add_column("dag_run", sa.Column("created_at", ...))  # triggers 
autobegin
+        with disable_sqlite_fkeys(op):  # PRAGMA is now a no-op!
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        with disable_sqlite_fkeys(op):
+            op.add_column("dag_run", sa.Column("created_at", ...))
+            with op.batch_alter_table("backfill") as batch_op:
+                batch_op.alter_column("col", nullable=True)
+
+
+MIG003 -- DML without offline-mode guard
+------------------------------------------
+
+**What it does:**
+Detects ``op.execute()`` calls containing DML keywords (UPDATE, INSERT, 
DELETE) in
+``upgrade()`` or ``downgrade()`` functions that do not contain a
+``context.is_offline_mode()`` check anywhere in the same function.
+
+**Why is this bad:**
+Alembic's offline mode generates SQL scripts instead of executing against a 
live
+database.  DML statements like ``UPDATE ... WHERE col IS NULL`` depend on 
actual data
+and produce no useful output in offline mode.  Without an 
``is_offline_mode()`` guard,
+the generated script includes DML that may fail or silently do nothing when 
applied
+later.
+
+**Example (bad):**
+
+.. code-block:: python
+
+    def upgrade():
+        op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+**Use instead:**
+
+.. code-block:: python
+
+    def upgrade():
+        if not context.is_offline_mode():
+            op.execute("UPDATE dag SET col = '' WHERE col IS NULL")
+        with op.batch_alter_table("dag") as batch_op:
+            batch_op.alter_column("col", nullable=False)
+
+
+Known Limitation -- Future MIG004 Candidate
+---------------------------------------------
+
+Migrations that perform DML and use ``batch_alter_table`` but never call
+``disable_sqlite_fkeys`` at all are NOT detected by MIG001/MIG002.  These 
migrations
+may silently break SQLite foreign key handling.  Detecting this pattern 
requires
+understanding whether foreign key relationships exist on the affected tables, 
which
+is beyond AST-level analysis.  This is tracked as a future MIG004 candidate.
+
+
+Suppression
+===========
+
+Add ``# noqa: MIG0XX`` to a line to suppress the corresponding check for that 
line.
+Multiple codes can be comma-separated: ``# noqa: MIG001, MIG003``.
+
+Include a brief reason after ``--``::
+
+    # noqa: MIG003 -- simple UPDATE safe in offline mode
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+import sys
+from pathlib import Path
+
+from rich.console import Console
+
+console = Console(color_system="standard", width=200)
+
+DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ")
+
+
+def _get_noqa_codes(line: str) -> set[str]:
+    """Extract noqa codes from a source line."""
+    match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line)
+    if match:
+        return {code.strip() for code in match.group(1).split(",")}
+    return set()
+
+
+def _is_dml_string(node: ast.expr) -> bool:
+    """Check if an AST expression is a string literal starting with a DML 
keyword."""
+    if isinstance(node, ast.Constant) and isinstance(node.value, str):
+        stripped = node.value.strip().upper()
+        return any(stripped.startswith(kw) for kw in DML_KEYWORDS)

Review Comment:
   **Accept.** Good catch. Fixed in 877a4a27 — `DML_KEYWORDS` now uses bare 
keywords (`"UPDATE"`, `"INSERT"`, `"DELETE"`) and `_is_dml_string` uses 
`split(None, 1)` to check the first word, handling any whitespace separator 
(space, newline, tab). Verified that `DELETE\nFROM`, `UPDATE\n  SET`, 
`INSERT\nINTO` are all now detected, and existing space-separated DML detection 
is preserved. All 34 existing tests pass.



##########
scripts/ci/prek/check_migration_patterns.py:
##########
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# /// script
+# requires-python = ">=3.10,<3.11"
+# dependencies = [
+#   "rich>=13.6.0",
+# ]

Review Comment:
   **Dismiss.** The PEP 723 `requires-python` is metadata for `uv run --script` 
mode — it's not used in the actual prek/pre-commit invocation path. The script 
is called by prek hooks which manage their own Python environment (`language: 
python` in `.pre-commit-config.yaml` uses whatever Python is available). The 
constraint is informational only and has no effect on which Python versions can 
run this script in practice.



##########
.pre-commit-config.yaml:
##########
@@ -1034,6 +1034,14 @@ repos:
         pass_filenames: false
         files: ^.*\.py$
         require_serial: true
+      - id: check-migration-patterns
+        name: Check migration files for anti-patterns (MIG001/MIG002/MIG003)
+        entry: ./scripts/ci/prek/check_migration_patterns.py
+        language: python
+        pass_filenames: true
+        files: ^airflow-core/src/airflow/migrations/versions/.*\.py$
+        additional_dependencies: [rich>=13.6.0]
+        require_serial: true

Review Comment:
   **Dismiss.** This follows from the premise in the previous comment about PEP 
723 constraints. Since prek hooks manage their own Python environment 
independently of PEP 723 metadata, no `language_version` pin is needed in 
`.pre-commit-config.yaml`. The script runs correctly on all supported Python 
versions (3.10+) — verified by CI passing on Python 3.10, 3.11, 3.12, 3.13, and 
3.14.



##########
scripts/ci/prek/check_new_airflow_exception_usage.py:
##########
@@ -59,7 +59,7 @@
 from rich.console import Console
 from rich.panel import Panel
 
-console = Console()
+console = Console(color_system="standard", width=200)

Review Comment:
   **Partial.** This `Console(color_system="standard", width=200)` change was 
requested by human reviewer Lee-W 
([comment](https://github.com/apache/airflow/pull/64972#discussion_r3062503519))
 for the other prek script in this PR. Applied consistently to both scripts in 
the same commit to keep output formatting uniform across prek checks. Minor 
scope inclusion but intentional.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to