potiuk commented on code in PR #35830:
URL: https://github.com/apache/airflow/pull/35830#discussion_r1405515300


##########
scripts/ci/pre_commit/common_precommit_utils.py:
##########
@@ -71,3 +80,74 @@ def get_directory_hash(directory: Path, skip_path_regexp: 
str | None = None) ->
         if file.is_file() and not file.name.startswith("."):
             sha.update(file.read_bytes())
     return sha.hexdigest()
+
+
+def initialize_breeze_precommit(name: str, file: str):
+    if name not in ("__main__", "__mp_main__"):
+        raise SystemExit(
+            "This file is intended to be executed as an executable program. 
You cannot use it as a module."
+            f"To run this script, run the ./{file} command"
+        )
+
+    if os.environ.get("SKIP_BREEZE_PRE_COMMITS"):
+        console.print("[yellow]Skipping breeze pre-commit as 
SKIP_BREEZE_PRE_COMMIT is set")
+        sys.exit(1)
+    if shutil.which("breeze") is None:
+        console.print(
+            "[red]The `breeze` command is not on path.[/]\n\n"
+            "[yellow]Please install breeze with `pipx install -e ./dev/breeze` 
from Airflow sources "
+            "and make sure you run `pipx ensurepath`[/]\n\n"
+            "[bright_blue]You can also set SKIP_BREEZE_PRE_COMMITS env 
variable to non-empty "
+            "value to skip all breeze tests."
+        )
+        sys.exit(1)
+
+
+def run_command_via_breeze_shell(
+    cmd: list[str],
+    python_version: str = DEFAULT_PYTHON_MAJOR_MINOR_VERSION,
+    backend: str = "none",
+    executor: str = "SequentialExecutor",
+    extra_env: dict[str, str] | None = None,
+    project_name: str = "pre-commit",
+    skip_environment_initialization: bool = True,
+    **other_popen_kwargs,
+) -> subprocess.CompletedProcess:
+    extra_env = extra_env or {}
+    subprocess_cmd: list[str] = [
+        "breeze",
+        "shell",
+        "--python",
+        python_version,
+        "--backend",
+        backend,
+        "--executor",
+        executor,
+        "--quiet",
+        "--skip-image-upgrade-check",
+        "--tty",
+        "disabled",
+    ]
+    if skip_environment_initialization:
+        subprocess_cmd.append("--skip-environment-initialization")
+    if project_name:
+        subprocess_cmd.extend(["--project-name", project_name])
+    subprocess_cmd.append(" ".join([shlex.quote(arg) for arg in cmd]))
+    if os.environ.get("VERBOSE_COMMANDS"):
+        console.print(
+            f"[magenta]Running command: {' '.join([shlex.quote(item) for item 
in subprocess_cmd])}[/]"
+        )
+    return subprocess.run(

Review Comment:
   > We also had sometimes problems in mypy in our local CI, if you only check 
selectively the files changed but others depend on a function signature, then 
this might not be catched. Therefore we moved away from differential checks and 
always make full mypy checks. But as Airflow is huge this would also be a 
drawback.
   
   Yes. Happens.. This is the speed vs. accuracy thing as well. I think we 
sometimes have to accept some inconsistencies in this case and hope for better 
support for type checking in `rypy` (or whatever the name will be for someone 
to rewrite mypy in rust - or maybe add `mypy` checks to ruff. Thought the big 
"issue" is that it requires to be run in CI image because of all the 
dependencies. From the experience here it works in 9% percent of cases, the 
number of time when there are such discrepancies is maybe 1  % or less cases 
(that's a bit impression and anecdotal one - I have no hard data) - but this is 
the feeling.
   
   > One question that just jumps into my head - mypy checks are serial 
(probably from the past) - but we are running multiple - in the new setup via 
breeze is it still required to run them serial or could we have them in 
parallel as well? Mypy is running on a single core else...
   
   Yeah. I tried several times to run it in parallel, but unfortunately it 
never worked well. Ideally we should use built-in parallelism but 
https://github.com/python/mypy/issues/933 - this issue is open since 2015 
without much progress. It's been added 2 weeks to 
https://github.com/python/mypy/issues/16472 meta issue tracker so there is some 
hope.
   
   Splitting it via built in parallelism and runnig several mypy instances in 
parallel makes however the problem from previous paragraph WAY WAY worse - you 
not only get wrong results with "differential" check but you also get different 
results when you run `--all-files` check - and pretty randomly changing from 
execution to execution. The set of files you have  "per run" will change every 
time you add a new file, and then it also depends on how manu CPUs you have. So 
using serial in this case was deliberate choice of "stability" vs. "speed". 
This time the stability won - becasue there you could see different set of 
problems in 5 different runs one-after-the-other. It was pretty much unusable. 
   
   But maybe we should try again :)  - feel free to try it and maybe you can 
make it works :)
   
   
   > I just took a look and there is a "verbose" option in pre-commit 
(https://pre-commit.com/#pre-commit-configyaml---hooks) such that still if all 
hooks pass, you could print a warning that the image is outdated - but don't 
block - in the benefit to make the issue visible in pre-commit as well.
   (Maybe an option for a follow-up PR though)
   
   Yeah. Good idea. We should do it for all "serially" run pre-comits with 
Breeze image.  Especially that it is really useful if you have error (and then 
we do not even need --verbose flag - pre-commit will automatically print the 
output when pre-commit fails. There is a bit of a problem there (and this is 
why I added `--quiet` flag in the previous PR - when you have "parallel" run 
(default), you will see the message repeated as many times as you have CPUS - 
which is rather spammy, but for mypy - absolutely we could do it. I will add it 
once I separate out mypy pre-commits in the next PR.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to