zach-overflow commented on code in PR #60012:
URL: https://github.com/apache/airflow/pull/60012#discussion_r2657756225
##########
dev/check_no_orm_in_migrations.py:
##########
@@ -0,0 +1,37 @@
+#!/usr/bin/env python3
+"""
+Pre-commit hook: Ensure no Alembic migration script imports ORM models.
+"""
+import sys
+import re
+from pathlib import Path
+
+MIGRATIONS_DIRS = [
+ Path("airflow-core/src/airflow/migrations/versions"),
+]
+ORM_IMPORT_PATTERNS = [
+ re.compile(r"from +airflow\\.models"),
+ re.compile(r"import +airflow\\.models"),
+ re.compile(r"from +airflow\\.models\\.", re.IGNORECASE),
+ re.compile(r"import +airflow\\.models\\.", re.IGNORECASE),
+]
Review Comment:
Thanks for putting together this pre-commit check, it will definitely help
out! A few initial thoughts:
1. I wonder if the use of the
[`ast`](https://docs.python.org/3/library/ast.html) module might provide a more
robust detection mechanism for improper ORM references in migration scripts.
That would mitigate needing to determine if you're reading from a commented
line, string, docstring, etc. The `ast` module will also likely speed up the
file parsing a bit as a nice side-effect.
2. I believe there are _some_ edge cases where certain constants are fine to
import from the `airflow.models` module. You will probably want to account for
some small allowlist of valid import targets. For example, there are a number
of migration scripts which have the following valid import
([example](https://github.com/apache/airflow/blob/a09ed06a2b59057d1e397b8e28821b26038ea2ab/airflow-core/src/airflow/migrations/versions/0081_3_1_0_remove_dag_id_from_deadline.py#L34)):
```python
from airflow.models import ID_LEN
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]