potiuk commented on code in PR #35830:
URL: https://github.com/apache/airflow/pull/35830#discussion_r1405515300
##########
scripts/ci/pre_commit/common_precommit_utils.py:
##########
@@ -71,3 +80,74 @@ def get_directory_hash(directory: Path, skip_path_regexp:
str | None = None) ->
if file.is_file() and not file.name.startswith("."):
sha.update(file.read_bytes())
return sha.hexdigest()
+
+
+def initialize_breeze_precommit(name: str, file: str):
+ if name not in ("__main__", "__mp_main__"):
+ raise SystemExit(
+ "This file is intended to be executed as an executable program.
You cannot use it as a module."
+ f"To run this script, run the ./{file} command"
+ )
+
+ if os.environ.get("SKIP_BREEZE_PRE_COMMITS"):
+ console.print("[yellow]Skipping breeze pre-commit as
SKIP_BREEZE_PRE_COMMIT is set")
+ sys.exit(1)
+ if shutil.which("breeze") is None:
+ console.print(
+ "[red]The `breeze` command is not on path.[/]\n\n"
+ "[yellow]Please install breeze with `pipx install -e ./dev/breeze`
from Airflow sources "
+ "and make sure you run `pipx ensurepath`[/]\n\n"
+ "[bright_blue]You can also set SKIP_BREEZE_PRE_COMMITS env
variable to non-empty "
+ "value to skip all breeze tests."
+ )
+ sys.exit(1)
+
+
+def run_command_via_breeze_shell(
+ cmd: list[str],
+ python_version: str = DEFAULT_PYTHON_MAJOR_MINOR_VERSION,
+ backend: str = "none",
+ executor: str = "SequentialExecutor",
+ extra_env: dict[str, str] | None = None,
+ project_name: str = "pre-commit",
+ skip_environment_initialization: bool = True,
+ **other_popen_kwargs,
+) -> subprocess.CompletedProcess:
+ extra_env = extra_env or {}
+ subprocess_cmd: list[str] = [
+ "breeze",
+ "shell",
+ "--python",
+ python_version,
+ "--backend",
+ backend,
+ "--executor",
+ executor,
+ "--quiet",
+ "--skip-image-upgrade-check",
+ "--tty",
+ "disabled",
+ ]
+ if skip_environment_initialization:
+ subprocess_cmd.append("--skip-environment-initialization")
+ if project_name:
+ subprocess_cmd.extend(["--project-name", project_name])
+ subprocess_cmd.append(" ".join([shlex.quote(arg) for arg in cmd]))
+ if os.environ.get("VERBOSE_COMMANDS"):
+ console.print(
+ f"[magenta]Running command: {' '.join([shlex.quote(item) for item
in subprocess_cmd])}[/]"
+ )
+ return subprocess.run(
Review Comment:
> We also had sometimes problems in mypy in our local CI, if you only check
selectively the files changed but others depend on a function signature, then
this might not be catched. Therefore we moved away from differential checks and
always make full mypy checks. But as Airflow is huge this would also be a
drawback.
Yes. Happens.. This is the speed vs. accuracy thing as well. I think we
sometimes have to accept some inconsistencies in this case and hope for better
support for type checking in `rypy` (or whatever the name will be for someone
to rewrite mypy in rust - or maybe add `mypy` checks to ruff. Thought the big
"issue" is that it requires to be run in CI image because of all the
dependencies. From the experience here it works in 9% percent of cases, the
number of time when there are such discrepancies is maybe 1 % or less cases
(that's a bit impression and anecdotal one - I have no hard data) - but this is
the feeling.
> One question that just jumps into my head - mypy checks are serial
(probably from the past) - but we are running multiple - in the new setup via
breeze is it still required to run them serial or could we have them in
parallel as well? Mypy is running on a single core else...
Yeah. I tried several times to run it in parallel, but unfortunately it
never worked well. Ideally we should use built-in parallelism but
https://github.com/python/mypy/issues/933 - this issue is open since 2015
without much progress. It's been added 2 weeks to
https://github.com/python/mypy/issues/16472 meta issue tracker so there is some
hope.
Splitting it via built in parallelism and runnig several mypy instances in
parallel makes however the problem from previous paragraph WAY WAY worse - you
not only get wrong results with "differential" check but you also get different
results when you run `--all-files` check - and pretty randomly changing from
execution to execution. The set of files you have "per run" will change every
time you add a new file, and then it also depends on how manu CPUs you have. So
using serial in this case was deliberate choice of "stability" vs. "speed".
This time the stability won - becasue there you could see different set of
problems in 5 different runs one-after-the-other. It was pretty much unusable.
But maybe we should try again :) - feel free to try it and maybe you can
make it works :)
> I just took a look and there is a "verbose" option in pre-commit
(https://pre-commit.com/#pre-commit-configyaml---hooks) such that still if all
hooks pass, you could print a warning that the image is outdated - but don't
block - in the benefit to make the issue visible in pre-commit as well.
(Maybe an option for a follow-up PR though)
Yeah. Good idea. We should do it for all "serially" run pre-comits with
Breeze image. Especially that it is really useful if you have error (and then
we do not even need --verbose flag - pre-commit will automatically print the
output when pre-commit fails. There is a bit of a problem there (and this is
why I added `--quiet` flag in the previous PR - when you have "parallel" run
(default), you will see the message repeated as many times as you have CPUS -
which is rather spammy, but for mypy - absolutely we could do it. I will add it
once I separate out mypy pre-commits in the next PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]