notatallshaw-gts commented on code in PR #30495:
URL: https://github.com/apache/airflow/pull/30495#discussion_r1161915927
##########
airflow/utils/file.py:
##########
@@ -371,3 +371,14 @@ def might_contain_dag_via_default_heuristic(file_path: str, zip_file: zipfile.Zi
         content = dag_file.read()
     content = content.lower()
     return all(s in content for s in (b"dag", b"airflow"))
+
+
+def get_airflow_modules_in(file_path: str) -> Generator[str]:
+    """Returns a list of the airflow modules that are imported in the given file"""
+    with open(file_path, "rb") as dag_file:
+        content = dag_file.read()
+        lines = content.splitlines()
+        for line in lines:
+            if line.startswith(b"from airflow.") or line.startswith(b"import airflow."):
+                module_name = line.split(b" ")[1]
+                yield module_name.decode()
Review Comment:
> Yes, I was partly aware that parsing python by hand was hacky terrain, but
> also, thinking about it, I didn't really see examples that would make this code
> fail. Do you have an actual valid python code example in mind?

I thought of two situations that might trip up your current approach
(personally, I'm not sure they're worth handling, but I thought I'd mention them).
First, when there's an example import inside a multi-line string:
```python
"""
<doc string info>
Example DAG:
import airflow...
<more doc string info>
"""
<real DAG>
```
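To make the first case concrete, here is a small sketch that applies the same line-prefix check as the function in the diff to an in-memory byte string. `scan_lines` and the module name `airflow.example_module` are made up for illustration; they are not part of the PR.

```python
def scan_lines(source: bytes) -> list[str]:
    """Same prefix check as in the diff, applied to bytes instead of a file (hypothetical helper)."""
    found = []
    for line in source.splitlines():
        if line.startswith(b"from airflow.") or line.startswith(b"import airflow."):
            found.append(line.split(b" ")[1].decode())
    return found


# A file whose only "import" sits inside the module docstring.
DOCSTRING_ONLY = b'''"""
Example DAG:
import airflow.example_module
"""
x = 1
'''

docstring_hits = scan_lines(DOCSTRING_ONLY)
print(docstring_hits)  # ['airflow.example_module']
```

The scan reports a module even though the file never imports it, because the check can't tell a docstring line from a statement.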
Second, when imports are chosen based on runtime parameters such as environment
variables:
```python
if prod:
    import airflow.foo
else:
    import airflow.bar
```
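With the line-prefix check, the indented imports in this second case would be missed entirely, since neither line starts with `import airflow.`. One possible alternative, offered only as a rough sketch, is to walk the file with the standard-library `ast` module. `airflow_imports` and the sample source are hypothetical names for illustration, not a proposal for the exact code in this PR.

```python
import ast


def airflow_imports(source: str) -> set[str]:
    """Collect airflow modules named in real import statements (hypothetical helper)."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        modules.update(n for n in names if n == "airflow" or n.startswith("airflow."))
    return modules


# The source is only parsed, never executed.
SAMPLE = '''"""
Example DAG:
import airflow.not_a_real_import
"""
import os

if os.environ.get("ENV") == "prod":
    import airflow.foo
else:
    import airflow.bar
'''

found = airflow_imports(SAMPLE)
print(sorted(found))  # ['airflow.bar', 'airflow.foo']
```

This ignores the docstring and reports both branches of the conditional, since it can't know which one runs. Whether that over-approximation is acceptable depends on what the result is used for, and parsing every file this way costs more than a byte scan.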