potiuk commented on PR #60676:
URL: https://github.com/apache/airflow/pull/60676#issuecomment-3763705279

   > Few nits
   > 
   > This also won't catch function/method scoped imports. Should it be looking 
there too?
   > 
   > At one point ruff had an anayze or graph subcommand that nought give us 
the imports without needing to parse the full ast ourselves. I think another ci 
script uses that too. Might be worth doing it the same?
   
   We already got rid of it because it did not work well with how prek works.  
   Not mentioning the still big fat warning **may change without warning**.
   
   The big difference of what ruff does versus what we can do here is that ruff 
does not generate dependencies to files it did not analyse - and we know the 
convemtion (from airflow.* from airlfow.sdk*) we are looking for - so we can 
analyse individual files that are passed to prek rather than having to pre-load 
all source code - this is what ruff has to do in order to generate "complete" 
dependency graph. And this means that to check single file, you need to analyze 
**all of them**.
   
   See this output to know what I am talking about (a little trimmed down. From 
that you will see that you need to load **all task-sdk + all airflow-core + all 
providers** - so basically all files. to see if you have not used some 
pendencies from "other" files"  - this means (on my machine) 0.47s overhead for 
just "airflow-core + task-sdk" folders, 2s  for "airflow-core + task-sdk + 
providers", 3s for "." (i.e including all tests, dev and other potentially 
included files). And it means we have to pay this overhead even if we change 
one file - i.e. every single commit where a .py file changes, will take 2s 
longer if we use it. 
   
   ```
   ⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff 
analyze graph --direction dependencies 
task-sdk/src/airflow/sdk/definitions/decorators/__init__
   warning: `ruff analyze graph` is experimental and may change without warning
   {}
   ⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff 
analyze graph --direction dependencies 
task-sdk/src/airflow/sdk/definitions/decorators/
   warning: `ruff analyze graph` is experimental and may change without warning
   {
     "task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
       "task-sdk/src/airflow/sdk/bases/decorator.py",
       "task-sdk/src/airflow/sdk/definitions/dag.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
     ],
     "task-sdk/src/airflow/sdk/definitions/decorators/__init__.pyi": [
       "task-sdk/src/airflow/sdk/bases/decorator.py",
       "task-sdk/src/airflow/sdk/definitions/dag.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
     ],
     "task-sdk/src/airflow/sdk/definitions/decorators/condition.py": [
       "task-sdk/src/airflow/sdk/bases/decorator.py",
       "task-sdk/src/airflow/sdk/bases/operator.py",
       "task-sdk/src/airflow/sdk/definitions/context.py",
       "task-sdk/src/airflow/sdk/exceptions.py"
     ],
     "task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py": [
       "task-sdk/src/airflow/sdk/bases/decorator.py",
       "task-sdk/src/airflow/sdk/bases/operator.py",
       "task-sdk/src/airflow/sdk/definitions/_internal/setup_teardown.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py",
       "task-sdk/src/airflow/sdk/definitions/xcom_arg.py",
       "task-sdk/src/airflow/sdk/exceptions.py"
     ],
     "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py": [
       "task-sdk/src/airflow/sdk/bases/decorator.py",
       "task-sdk/src/airflow/sdk/definitions/_internal/expandinput.py",
       "task-sdk/src/airflow/sdk/definitions/_internal/node.py",
       "task-sdk/src/airflow/sdk/definitions/dag.py",
       "task-sdk/src/airflow/sdk/definitions/mappedoperator.py",
       "task-sdk/src/airflow/sdk/definitions/taskgroup.py",
       "task-sdk/src/airflow/sdk/definitions/xcom_arg.py"
     ]
   }
   ⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff 
analyze graph --direction dependencies 
task-sdk/src/airflow/sdk/definitions/decorators/  
airflow-core/src/airflow/providers_manager.py
   warning: `ruff analyze graph` is experimental and may change without warning
   .... lots of output here
     "task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
       "airflow-core/src/airflow/providers_manager.py",
       "task-sdk/src/airflow/sdk/bases/decorator.py",
       "task-sdk/src/airflow/sdk/definitions/dag.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py",
       "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
     ],
   ... lots of output here
   }
   ```
   
   Also the code to use it would also be a little complex - we would have to 
keep exclusion rules inside the code rather than in .pre-commit-config.yaml 
file - becuase we could only exclude files from the output json if after all of 
it is generated.
   
   AST + our prek script is way faster for incremental checks. When I removed 
the decorators/__init__.py from the config the time to analyse it was 0.8s for 
all files, and 0.15s when only one file is checked - so it beats the "full 
check" for incremental check a lot - and this is what really matters for 
pre-comit - 0.4s for "all files" in CI is nothing, while 1.8s for every commit 
locally matters a lot. 
   
   ```
   ⌁1% [jarekpotiuk:~/code/airflow] prek-hook-to-check-core-imports(+0/-1)+ 6s 
± time prek run check-core-imports --files 
task-sdk/src/airflow/sdk/definitions/decorators/__init__.py
   Running hooks for `task-sdk`:
   Check for core imports in task-sdk 
files.................................Failed
   - hook id: check-core-imports
   - exit code: 1
   
     src/airflow/sdk/definitions/decorators/__init__.py:
       Line 21: from airflow.providers_manager import ProvidersManager
   
     Found 1 core import(s) in task-sdk files
   prek run check-core-imports --files   0.15s user 0.40s system 217% cpu 0.256 
total
   ⌁1% [jarekpotiuk:~/code/airflow] prek-hook-to-check-core-imports(+0/-1)+ 1 ± 
time prek run check-core-imports --all-files
   Running hooks for `task-sdk`:
   Check for core imports in task-sdk 
files.................................Failed
   - hook id: check-core-imports
   - exit code: 1
   
     src/airflow/sdk/definitions/decorators/__init__.py:
       Line 21: from airflow.providers_manager import ProvidersManager
   
     Found 1 core import(s) in task-sdk files
   prek run check-core-imports --all-files  0.80s user 0.39s system 421% cpu 
0.284 total
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to