potiuk commented on PR #60676:
URL: https://github.com/apache/airflow/pull/60676#issuecomment-3763705279
> Few nits
>
> This also won't catch function/method scoped imports. Should it be looking
there too?
>
> At one point ruff had an anayze or graph subcommand that nought give us
the imports without needing to parse the full ast ourselves. I think another ci
script uses that too. Might be worth doing it the same?
We already got rid of it because it did not work well with how prek works.
Not mentioning the still big fat warning **may change without warning**.
The big difference of what ruff does versus what we can do here is that ruff
does not generate dependencies to files it did not analyse - and we know the
convemtion (from airflow.* from airlfow.sdk*) we are looking for - so we can
analyse individual files that are passed to prek rather than having to pre-load
all source code - this is what ruff has to do in order to generate "complete"
dependency graph. And this means that to check single file, you need to analyze
**all of them**.
See this output to know what I am talking about (a little trimmed down. From
that you will see that you need to load **all task-sdk + all airflow-core + all
providers** - so basically all files. to see if you have not used some
pendencies from "other" files" - this means (on my machine) 0.47s overhead for
just "airflow-core + task-sdk" folders, 2s for "airflow-core + task-sdk +
providers", 3s for "." (i.e including all tests, dev and other potentially
included files). And it means we have to pay this overhead even if we change
one file - i.e. every single commit where a .py file changes, will take 2s
longer if we use it.
```
⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff
analyze graph --direction dependencies
task-sdk/src/airflow/sdk/definitions/decorators/__init__
warning: `ruff analyze graph` is experimental and may change without warning
{}
⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff
analyze graph --direction dependencies
task-sdk/src/airflow/sdk/definitions/decorators/
warning: `ruff analyze graph` is experimental and may change without warning
{
"task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
"task-sdk/src/airflow/sdk/bases/decorator.py",
"task-sdk/src/airflow/sdk/definitions/dag.py",
"task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
"task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py",
"task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
],
"task-sdk/src/airflow/sdk/definitions/decorators/__init__.pyi": [
"task-sdk/src/airflow/sdk/bases/decorator.py",
"task-sdk/src/airflow/sdk/definitions/dag.py",
"task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
"task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
],
"task-sdk/src/airflow/sdk/definitions/decorators/condition.py": [
"task-sdk/src/airflow/sdk/bases/decorator.py",
"task-sdk/src/airflow/sdk/bases/operator.py",
"task-sdk/src/airflow/sdk/definitions/context.py",
"task-sdk/src/airflow/sdk/exceptions.py"
],
"task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py": [
"task-sdk/src/airflow/sdk/bases/decorator.py",
"task-sdk/src/airflow/sdk/bases/operator.py",
"task-sdk/src/airflow/sdk/definitions/_internal/setup_teardown.py",
"task-sdk/src/airflow/sdk/definitions/decorators/task_group.py",
"task-sdk/src/airflow/sdk/definitions/xcom_arg.py",
"task-sdk/src/airflow/sdk/exceptions.py"
],
"task-sdk/src/airflow/sdk/definitions/decorators/task_group.py": [
"task-sdk/src/airflow/sdk/bases/decorator.py",
"task-sdk/src/airflow/sdk/definitions/_internal/expandinput.py",
"task-sdk/src/airflow/sdk/definitions/_internal/node.py",
"task-sdk/src/airflow/sdk/definitions/dag.py",
"task-sdk/src/airflow/sdk/definitions/mappedoperator.py",
"task-sdk/src/airflow/sdk/definitions/taskgroup.py",
"task-sdk/src/airflow/sdk/definitions/xcom_arg.py"
]
}
⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff
analyze graph --direction dependencies
task-sdk/src/airflow/sdk/definitions/decorators/
airflow-core/src/airflow/providers_manager.py
warning: `ruff analyze graph` is experimental and may change without warning
.... lots of output here
"task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
"airflow-core/src/airflow/providers_manager.py",
"task-sdk/src/airflow/sdk/bases/decorator.py",
"task-sdk/src/airflow/sdk/definitions/dag.py",
"task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
"task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py",
"task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
],
... lots of output here
}
```
Also the code to use it would also be a little complex - we would have to
keep exclusion rules inside the code rather than in .pre-commit-config.yaml
file - becuase we could only exclude files from the output json if after all of
it is generated.
AST + our prek script is way faster for incremental checks. When I removed
the decorators/__init__.py from the config the time to analyse it was 0.8s for
all files, and 0.15s when only one file is checked - so it beats the "full
check" for incremental check a lot - and this is what really matters for
pre-comit - 0.4s for "all files" in CI is nothing, while 1.8s for every commit
locally matters a lot.
```
⌁1% [jarekpotiuk:~/code/airflow] prek-hook-to-check-core-imports(+0/-1)+ 6s
± time prek run check-core-imports --files
task-sdk/src/airflow/sdk/definitions/decorators/__init__.py
Running hooks for `task-sdk`:
Check for core imports in task-sdk
files.................................Failed
- hook id: check-core-imports
- exit code: 1
src/airflow/sdk/definitions/decorators/__init__.py:
Line 21: from airflow.providers_manager import ProvidersManager
Found 1 core import(s) in task-sdk files
prek run check-core-imports --files 0.15s user 0.40s system 217% cpu 0.256
total
⌁1% [jarekpotiuk:~/code/airflow] prek-hook-to-check-core-imports(+0/-1)+ 1 ±
time prek run check-core-imports --all-files
Running hooks for `task-sdk`:
Check for core imports in task-sdk
files.................................Failed
- hook id: check-core-imports
- exit code: 1
src/airflow/sdk/definitions/decorators/__init__.py:
Line 21: from airflow.providers_manager import ProvidersManager
Found 1 core import(s) in task-sdk files
prek run check-core-imports --all-files 0.80s user 0.39s system 421% cpu
0.284 total
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]