(airflow) branch main updated: Optimize Static Checks job for most regular PRs (#35461)

eladkal Tue, 07 Nov 2023 08:24:16 -0800

This is an automated email from the ASF dual-hosted git repository.

eladkal pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git



The following commit(s) were added to refs/heads/main by this push:
     new 6d3186531a Optimize Static Checks job for most regular PRs (#35461)
6d3186531a is described below

commit 6d3186531af47c90c607faf72e27cb1cf3e33fe8
Author: Jarek Potiuk <[email protected]>
AuthorDate: Tue Nov 7 17:23:58 2023 +0100

    Optimize Static Checks job for most regular PRs (#35461)
    
    By default static checks run in the CI with `--all-files` to make
    sure that there are no side-effects coming from only running the
    pre-commit checks on a subset of files. The complete `--all-files`
    static check suite takes ~9 minutes on self-hosted runners and
    ~15 minutes on public runners, so there is room for optimisation
    now that Tests are running faster than that in many cases.
    
    While we cannot run static checks on only the files changed in the
    PR (that would lead to many false positives), we can disable whole
    pre-commit checks when the incoming PR does not modify certain
    files. For example, we can skip mypy-providers
    when no provider files changed, skip mypy-core when no
    core files changed, or skip helm linting when chart files
    have not changed in the incoming PR.
    
    This PR implements selective check rules that will skip some of the
    longest running pre-commit checks in case it looks like the checks
    are not needed.
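
    A minimal sketch of the idea, not the code from this commit: each check
    is tied to a group of include and exclude regexps, and the check is
    skipped when no changed file falls into its group. The rule table and
    the helper names (`SKIP_RULES`, `matching_files`, `checks_to_skip`) are
    hypothetical, and the patterns are a simplified subset chosen for
    illustration.

```python
import re

# Hypothetical, simplified rule table: check name -> (include regexps, exclude regexps).
# The patterns are illustrative only, not the full set used in the repository.
SKIP_RULES = {
    "mypy-providers": (
        [r"^airflow/providers/.*\.py$", r"^tests/providers/.*\.py$"],
        [],
    ),
    "mypy-core": (
        [r".*\.py$"],
        [r"^airflow/providers/.*", r"^tests/providers/.*", r"^dev/.*", r"^docs/.*"],
    ),
    "lint-helm-chart": ([r"^chart/"], []),
}


def matching_files(changed_files, includes, excludes):
    """Return changed files that match any include pattern and no exclude pattern."""
    return [
        f
        for f in changed_files
        if any(re.match(p, f) for p in includes)
        and not any(re.match(p, f) for p in excludes)
    ]


def checks_to_skip(changed_files):
    """A check is skipped when no changed file falls into its file group."""
    return sorted(
        check
        for check, (includes, excludes) in SKIP_RULES.items()
        if not matching_files(changed_files, includes, excludes)
    )


print(checks_to_skip(["airflow/providers/amazon/hooks/s3.py"]))
```

    With only a provider file changed, the provider check stays enabled
    while the core and helm checks land in the skip list.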
    
    Still, all checks will be run when the "full tests needed" flag is set
    (i.e. when build scripts change, when we detect structural/package
    changes in the project, or when the "full tests needed" label is set for
    the PR), and all the static checks will also continue running in
    "canary" builds, so we will be able to catch and correct any rules
    that skip some of the static checks when they should in fact
    be run.
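
    The short-circuit described above can be sketched as follows. This is a
    simplified illustration and not the method from the patch: the function
    name `skip_list` and its parameters are hypothetical, and the selective
    part is passed in as a ready-made set instead of being computed from
    changed files.

```python
# Checks that are skipped on release (non-main) branches, per the commit's docs.
RELEASE_BRANCH_SKIPS = {
    "check-airflow-provider-compatibility",
    "check-extra-packages-references",
    "check-provider-yaml-valid",
    "lint-helm-chart",
    "mypy-providers",
}


def skip_list(default_branch, full_tests_needed, selective_skips):
    """Build the comma-separated list of checks to skip (hypothetical helper)."""
    skips = {"identity"}  # always skipped
    if default_branch != "main":
        skips |= RELEASE_BRANCH_SKIPS
    if full_tests_needed:
        # Full tests mode: ignore the selective rules, skip nothing else.
        return ",".join(sorted(skips))
    skips |= set(selective_skips)
    return ",".join(sorted(skips))


print(skip_list("main", True, {"mypy-core"}))
print(skip_list("main", False, {"mypy-core", "flynt"}))
```

    Returning early in full tests mode keeps the baseline skips but drops
    every file-based one, so structural changes are always checked fully.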
    
    Also the cache for pre-commits is renamed to be "common". Currently
    the "basic" cache is different from the "full" cache, but since the cache is
    only really uploaded by the "canary" builds, it's quite ok to
    have a common "full" cache, and regular PRs will be able to
    retrieve it faster.
---
 .github/workflows/ci.yml                           |  11 +-
 dev/breeze/SELECTIVE_CHECKS.md                     |  78 ++++++++---
 .../airflow_breeze/commands/developer_commands.py  |   5 +
 .../src/airflow_breeze/utils/selective_checks.py   | 156 ++++++++++++++++++---
 dev/breeze/tests/test_selective_checks.py          |  70 ++++++++-
 5 files changed, 269 insertions(+), 51 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 0af32baf10..01cf502daa 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -652,9 +652,9 @@ jobs:
         with:
           path: ~/.cache/pre-commit
           # yamllint disable-line rule:line-length
-          key: "pre-commit-full-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
+          key: "pre-commit-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
           restore-keys: |
-            pre-commit-full-${{steps.breeze.outputs.host-python-version}}-
+            pre-commit-${{steps.breeze.outputs.host-python-version}}-
       - name: "Static checks"
         run: breeze static-checks --all-files --show-diff-on-failure --color always --initialize-environment
         env:
@@ -700,12 +700,11 @@ jobs:
         with:
           path: ~/.cache/pre-commit
           # yamllint disable-line rule:line-length
-          key: "pre-commit-basic-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
+          key: "pre-commit-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
           restore-keys: "\
-            pre-commit-full-${{steps.breeze.outputs.host-python-version}}-\
+            pre-commit-${{steps.breeze.outputs.host-python-version}}-\
             ${{ hashFiles('.pre-commit-config.yaml') }}\n
-            pre-commit-basic-${{steps.breeze.outputs.host-python-version}}-\n
-            pre-commit-full-${{steps.breeze.outputs.host-python-version}}-"
+            pre-commit-${{steps.breeze.outputs.host-python-version}}-"
       - name: Fetch incoming commit ${{ github.sha }} with its parent
         uses: actions/checkout@v4
         with:
diff --git a/dev/breeze/SELECTIVE_CHECKS.md b/dev/breeze/SELECTIVE_CHECKS.md
index b3bff154ae..2f7d962041 100644
--- a/dev/breeze/SELECTIVE_CHECKS.md
+++ b/dev/breeze/SELECTIVE_CHECKS.md
@@ -30,14 +30,18 @@ kind of changes. The logic implemented reflects the internal architecture of Air
 and it helps to keep down both the usage of jobs in GitHub Actions and CI feedback time to
 contributors in case of simpler changes.
 
+## Groups of files that selective check make decisions on
+
 We have the following Groups of files for CI that determine which tests are run:
 
 * `Environment files` - if any of those changes, that forces 'full tests needed' mode, because changes
   there might simply change the whole environment of what is going on in CI (Container image, dependencies)
-* `Python and Javascript production files` - this area is useful in CodeQL Security scanning - if any of
-  the python or javascript files for airflow "production" changed, this means that the security scans should run
-* `API tests and codegen files` - those are OpenAPI definition files that impact Open API specification and
-  determine that we should run dedicated API tests.
+* `Python production files` and `Javascript production files` - this area is useful in CodeQL Security scanning
+  - if any of the python or javascript files for airflow "production" changed, this means that the security
+  scans should run
+* `Always test files` - Files that belong to "Always" run tests.
+* `API tests files` and `Codegen test files` - those are OpenAPI definition files that impact
+  Open API specification and determine that we should run dedicated API tests.
 * `Helm files` - change in those files impacts helm "rendering" tests - `chart` folder and `helm_tests` folder.
 * `Setup files` - change in the setup files indicates that we should run  `upgrade to newer dependencies` -
   setup.* files, pyproject.toml, generated dependencies files in `generated` folder
@@ -51,24 +55,25 @@ We have the following Groups of files for CI that determine which tests are run:
 * `All Python files` - if none of the Python file changed, that indicates that we should not run unit tests
 * `All source files` - if none of the sources change, that indicates that we should probably not build
   an image and run any image-based static checks
+* `All Airflow Python files` - files that are checked by `mypy-core` static checks
+* `All Providers Python files` - files that are checked by `mypy-providers` static checks
+* `All Dev Python files` - files that are checked by `mypy-dev` static checks
+* `All Docs Python files` - files that are checked by `mypy-docs` static checks
+* `All Provider Yaml files` - all provider yaml files
 
-We have the following unit test types that can be selectively disabled/enabled based on the
-content of the incoming PR. Usually they are limited to a sub-folder of the "tests" folder but there
-are some exceptions. You can read more about those in `TESTING.rst <TESTING.rst>`.
-
-We also have `Integration` tests that are running Integration tests with external software that is run
-via `--integration` flag in `breeze` environment.
 
-* `Integration` - tests that require external integration images running in docker-compose
+We have a number of `TEST_TYPES` that can be selectively disabled/enabled based on the
+content of the incoming PR. Usually they are limited to a sub-folder of the "tests" folder but there
+are some exceptions. You can read more about those in `TESTING.rst <TESTING.rst>`. Those types
+are determined by selective checks and are used to run `DB` and `Non-DB` tests.
 
-Even if the types are separated, In case they share the same backend version/python version, they are
-run sequentially in the same job, on the same CI machine. Each of them in a separate `docker run` command
-and with additional docker cleaning between the steps to not fall into the trap of exceeding resource
-usage in one big test run, but also not to increase the number of jobs per each Pull Request.
+The `DB` tests inside each `TEST_TYPE` are run sequentially (because they use DB as state) while `TEST_TYPES`
+are run in parallel - each within separate docker-compose project. The `Non-DB` tests are all executed
+together using `pytest-xdist` (pytest-xdist distributes the tests among parallel workers).
 
-The logic implements the following rules:
+## Selective check decision rules
 
-* `Full tests mode` is enabled when the event is PUSH, or SCHEDULE or we miss commit info or any of the
+* `Full tests` case is enabled when the event is PUSH, or SCHEDULE or we miss commit info or any of the
   important environment files (setup.py, setup.cfg, provider.yaml, Dockerfile, build scripts) changed or
   when `full tests needed` label is set.  That enables all matrix combinations of variables (representative)
   and all possible test type. No further checks are performed.
@@ -78,8 +83,8 @@ The logic implements the following rules:
   are enabled if any of the relevant files have been changed.
 * `Helm` tests are run only if relevant files have been changed and if current branch is `main`.
 * If no Source files are changed - no tests are run and no further rules below are checked.
-* `Image building` is enabled if either test are run, docs are build or kubernetes tests are run. All those
-  need `CI` or `PROD` images to be built.
+* `CI Image building` is enabled if either test are run, docs are build.
+* `PROD Image building` is enabled when kubernetes tests are run.
 * In case of `Providers` test in regular PRs, additional check is done in order to determine which
   providers are affected and the actual selection is made based on that:
   * if directly provider code is changed (either in the provider, test or system tests) then this provider
@@ -94,7 +99,7 @@ The logic implements the following rules:
 * If there are no files left in sources after matching the test types and Kubernetes files,
   then apparently some Core/Other files have been changed. This automatically adds all test
   types to execute. This is done because changes in core might impact all the other test types.
-* if `Image building` is disabled, only basic pre-commits are enabled - no 'image-depending` pre-commits
+* if `CI Image building` is disabled, only basic pre-commits are enabled - no 'image-depending` pre-commits
   are enabled.
 * If there are some setup files changed, `upgrade to newer dependencies` is enabled.
 * If docs are build, the `docs-list-as-string` will determine which docs packages to build. This is based on
@@ -103,16 +108,49 @@ The logic implements the following rules:
   changed, also providers docs are built because all providers depend on airflow docs. If any of the docs
   build python files changed or when build is "canary" type in main - all docs packages are built.
 
+## Skipping pre-commits (Static checks)
+
+Our CI always run pre-commit checks with `--all-files` flag. This is in order to avoid cases where
+different check results are run when only subset of files is used. This has an effect that the pre-commit
+tests take a long time to run when all of them are run. Selective checks allow to save a lot of time
+for those tests in regular PRs of contributors by smart detection of which pre-commits should be skipped
+when some files are not changed. Those are the rules implemented:
+
+* The `identity` check is always skipped (saves space to display all changed files in CI)
+* The provider specific checks are skipped when builds are running in v2_* branches (we do not build
+  providers from those branches. Those are the checks skipped in this case:
+  * check-airflow-provider-compatibility
+  * check-extra-packages-references
+  * check-provider-yaml-valid
+  * lint-helm-chart
+  * mypy-providers
+* If "full tests" mode is detected, no more pre-commits are skipped - we run all of them
+* The following checks are skipped if those files are not changed:
+  * if no `All Providers Python files` changed - `mypy-providers` check is skipped
+  * if no `All Airflow Python files` changed - `mypy-core` check is skipped
+  * if no `All Docs Python files` changed - `mypy-docs` check is skipped
+  * if no `All Dev Python files` changed - `mypy-dev` check is skipped
+  * if no `WWW files` changed - `ts-compile-format-lint-www` check is skipped
+  * if no `All Python files` changed - `flynt` check is skipped
+  * if no `Helm files` changed - `lint-helm-chart` check is skipped
+  * if no `All Providers Python files` and no `All Providers Yaml files` are changed -
+    `check-provider-yaml-valid` check is skipped
+
+## Suspended providers
 
 The selective checks will fail in PR if it contains changes to a suspended provider unless you set the
 label `allow suspended provider changes` in the PR. This is to prevent accidental changes to suspended
 providers.
 
+
+## Selective check outputs
+
 The selective check outputs available are described below. In case of `list-as-string` values,
 empty string means `everything`, where lack of the output means `nothing` and list elements are
 separated by spaces. This is to accommodate for the wau how outputs of this kind can be easily used by
 Github Actions to pass the list of parameters to a command to execute
 
+
 | Output                             | Meaning of the output                                                                                   | Example value                                       | List as string |
 |------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------------------------------------------|----------------|
 | affected-providers-list-as-string  | List of providers affected when they are selectively affected.                                          | airbyte http                                        | *              |
diff --git a/dev/breeze/src/airflow_breeze/commands/developer_commands.py b/dev/breeze/src/airflow_breeze/commands/developer_commands.py
index 60d3138e68..69e8a6edd7 100644
--- a/dev/breeze/src/airflow_breeze/commands/developer_commands.py
+++ b/dev/breeze/src/airflow_breeze/commands/developer_commands.py
@@ -625,6 +625,11 @@ def static_checks(
         command_to_execute.extend(file)
     if precommit_args:
         command_to_execute.extend(precommit_args)
+    skip_checks = os.environ.get("SKIP")
+    if skip_checks and skip_checks != "identity":
+        get_console().print("\nThis static check run skips those checks:\n")
+        get_console().print(skip_checks.split(","))
+        get_console().print()
     env = os.environ.copy()
     env["GITHUB_REPOSITORY"] = github_repository
     static_checks_result = run_command(
diff --git a/dev/breeze/src/airflow_breeze/utils/selective_checks.py b/dev/breeze/src/airflow_breeze/utils/selective_checks.py
index 214dbb63a9..c8b5eaf7bf 100644
--- a/dev/breeze/src/airflow_breeze/utils/selective_checks.py
+++ b/dev/breeze/src/airflow_breeze/utils/selective_checks.py
@@ -88,6 +88,11 @@ class FileGroupForCi(Enum):
     KUBERNETES_FILES = "kubernetes_files"
     ALL_PYTHON_FILES = "all_python_files"
     ALL_SOURCE_FILES = "all_sources_for_tests"
+    ALL_AIRFLOW_PYTHON_FILES = "all_airflow_python_files"
+    ALL_PROVIDERS_PYTHON_FILES = "all_provider_python_files"
+    ALL_DEV_PYTHON_FILES = "all_dev_python_files"
+    ALL_PROVIDER_YAML_FILES = "all_provider_yaml_files"
+    ALL_DOCS_PYTHON_FILES = "all_docs_python_files"
 
 
 T = TypeVar("T", FileGroupForCi, SelectiveUnitTestTypes)
@@ -165,7 +170,21 @@ CI_FILE_GROUP_MATCHES = HashableDict(
             r"^tests/system/providers/cncf/kubernetes/",
         ],
         FileGroupForCi.ALL_PYTHON_FILES: [
-            r"\.py$",
+            r".*\.py$",
+        ],
+        FileGroupForCi.ALL_AIRFLOW_PYTHON_FILES: [
+            r".*\.py$",
+        ],
+        FileGroupForCi.ALL_PROVIDERS_PYTHON_FILES: [
+            r"^airflow/providers/.*\.py$",
+            r"^tests/providers/.*\.py$",
+            r"^tests/system/providers/.*\.py$",
+        ],
+        FileGroupForCi.ALL_DOCS_PYTHON_FILES: [
+            r"^docs/.*\.py$",
+        ],
+        FileGroupForCi.ALL_DEV_PYTHON_FILES: [
+            r"^dev/.*\.py$",
         ],
         FileGroupForCi.ALL_SOURCE_FILES: [
             r"^.pre-commit-config.yaml$",
@@ -180,9 +199,27 @@ CI_FILE_GROUP_MATCHES = HashableDict(
         FileGroupForCi.ALWAYS_TESTS_FILES: [
             r"^tests/always/",
         ],
+        FileGroupForCi.ALL_PROVIDER_YAML_FILES: [
+            r".*/provider\.yaml$",
+        ],
     }
 )
 
+CI_FILE_GROUP_EXCLUDES = HashableDict(
+    {
+        FileGroupForCi.ALL_AIRFLOW_PYTHON_FILES: [
+            r"^.*/.*_vendor/.*",
+            r"^airflow/migrations/.*",
+            r"^airflow/providers/.*",
+            r"^dev/.*",
+            r"^docs/.*",
+            r"^provider_packages/.*",
+            r"^tests/providers/.*",
+            r"^tests/system/providers/.*",
+            r"^tests/dags/test_imports.py",
+        ]
+    }
+)
 
 TEST_TYPE_MATCHES = HashableDict(
     {
@@ -215,6 +252,8 @@ TEST_TYPE_MATCHES = HashableDict(
     }
 )
 
+TEST_TYPE_EXCLUDES = HashableDict({})
+
 
 def find_provider_affected(changed_file: str, include_docs: bool) -> str | None:
     file_path = AIRFLOW_SOURCES_ROOT / changed_file
@@ -372,7 +411,9 @@ class SelectiveChecks:
         if self._github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE, GithubEvents.WORKFLOW_DISPATCH]:
             get_console().print(f"[warning]Full tests needed because event is {self._github_event}[/]")
             return True
-        if self._matching_files(FileGroupForCi.ENVIRONMENT_FILES, CI_FILE_GROUP_MATCHES):
+        if self._matching_files(
+            FileGroupForCi.ENVIRONMENT_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        ):
             get_console().print("[warning]Running everything because env files changed[/]")
             return True
             return True
         if FULL_TESTS_NEEDED_LABEL in self._pr_labels:
@@ -487,16 +528,27 @@ class SelectiveChecks:
         )
         return " ".join(short_combo_titles)
 
-    def _match_files_with_regexps(self, matched_files, regexps):
+    def _match_files_with_regexps(self, matched_files, matching_regexps):
         for file in self._files:
-            if any(re.match(regexp, file) for regexp in regexps):
+            if any(re.match(regexp, file) for regexp in matching_regexps):
                 matched_files.append(file)
 
+    def _exclude_files_with_regexps(self, matched_files, exclude_regexps):
+        for file in self._files:
+            if any(re.match(regexp, file) for regexp in exclude_regexps):
+                if file in matched_files:
+                    matched_files.remove(file)
+
     @lru_cache(maxsize=None)
-    def _matching_files(self, match_group: T, match_dict: dict[T, list[str]]) -> list[str]:
+    def _matching_files(
+        self, match_group: T, match_dict: dict[T, list[str]], exclude_dict: dict[T, list[str]]
+    ) -> list[str]:
         matched_files: list[str] = []
-        regexps = match_dict[match_group]
-        self._match_files_with_regexps(matched_files, regexps)
+        match_regexps = match_dict[match_group]
+        excluded_regexps = exclude_dict.get(match_group)
+        self._match_files_with_regexps(matched_files, match_regexps)
+        if excluded_regexps:
+            self._exclude_files_with_regexps(matched_files, excluded_regexps)
         count = len(matched_files)
         if count > 0:
             get_console().print(f"[warning]{match_group} matched {count} files.[/]")
@@ -509,7 +561,7 @@ class SelectiveChecks:
         if self.full_tests_needed:
             get_console().print(f"[warning]{source_area} enabled because we are running everything[/]")
             return True
-        matched_files = self._matching_files(source_area, CI_FILE_GROUP_MATCHES)
+        matched_files = self._matching_files(source_area, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES)
         if matched_files:
             get_console().print(
                 f"[warning]{source_area} enabled because it matched {len(matched_files)} changed files[/]"
@@ -577,7 +629,7 @@ class SelectiveChecks:
     def _select_test_type_if_matching(
         self, test_types: set[str], test_type: SelectiveUnitTestTypes
     ) -> list[str]:
-        matched_files = self._matching_files(test_type, TEST_TYPE_MATCHES)
+        matched_files = self._matching_files(test_type, TEST_TYPE_MATCHES, TEST_TYPE_EXCLUDES)
         count = len(matched_files)
         if count > 0:
             test_types.add(test_type.value)
@@ -614,10 +666,18 @@ class SelectiveChecks:
             self._select_test_type_if_matching(candidate_test_types, SelectiveUnitTestTypes.API)
         )
 
-        kubernetes_files = self._matching_files(FileGroupForCi.KUBERNETES_FILES, CI_FILE_GROUP_MATCHES)
-        system_test_files = self._matching_files(FileGroupForCi.SYSTEM_TEST_FILES, CI_FILE_GROUP_MATCHES)
-        all_source_files = self._matching_files(FileGroupForCi.ALL_SOURCE_FILES, CI_FILE_GROUP_MATCHES)
-        test_always_files = self._matching_files(FileGroupForCi.ALWAYS_TESTS_FILES, CI_FILE_GROUP_MATCHES)
+        kubernetes_files = self._matching_files(
+            FileGroupForCi.KUBERNETES_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        )
+        system_test_files = self._matching_files(
+            FileGroupForCi.SYSTEM_TEST_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        )
+        all_source_files = self._matching_files(
+            FileGroupForCi.ALL_SOURCE_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        )
+        test_always_files = self._matching_files(
+            FileGroupForCi.ALWAYS_TESTS_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        )
         remaining_files = (
             set(all_source_files)
             - set(matched_files)
@@ -710,7 +770,12 @@ class SelectiveChecks:
     @cached_property
     def upgrade_to_newer_dependencies(self) -> bool:
         return (
-            len(self._matching_files(FileGroupForCi.SETUP_FILES, CI_FILE_GROUP_MATCHES)) > 0
+            len(
+                self._matching_files(
+                    FileGroupForCi.SETUP_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+                )
+            )
+            > 0
             or self._github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE]
             or UPGRADE_TO_NEWER_DEPENDENCIES_LABEL in self._pr_labels
         )
@@ -752,12 +817,63 @@ class SelectiveChecks:
 
     @cached_property
     def skip_pre_commits(self) -> str:
-        return (
-            "identity"
-            if self._default_branch == "main"
-            else "identity,check-airflow-provider-compatibility,"
-            "check-extra-packages-references,check-provider-yaml-valid"
-        )
+        pre_commits_to_skip = set()
+        pre_commits_to_skip.add("identity")
+        if self._default_branch != "main":
+            # Skip those tests on all "release" branches
+            pre_commits_to_skip.update(
+                (
+                    "check-airflow-provider-compatibility",
+                    "check-extra-packages-references",
+                    "check-provider-yaml-valid",
+                    "lint-helm-chart",
+                    "mypy-providers",
+                )
+            )
+        if self.full_tests_needed:
+            # when full tests are needed, we do not want to skip any checks and we should
+            # run all the pre-commits just to be sure everything is ok when some structural changes occurred
+            return ",".join(sorted(pre_commits_to_skip))
+        if not self._matching_files(
+            FileGroupForCi.ALL_PROVIDERS_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        ):
+            pre_commits_to_skip.add("mypy-providers")
+        if not self._matching_files(
+            FileGroupForCi.ALL_AIRFLOW_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        ):
+            pre_commits_to_skip.add("mypy-core")
+        if not self._matching_files(
+            FileGroupForCi.ALL_DOCS_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        ):
+            pre_commits_to_skip.add("mypy-docs")
+        if not self._matching_files(
+            FileGroupForCi.ALL_DEV_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        ):
+            pre_commits_to_skip.add("mypy-dev")
+        if not self._matching_files(FileGroupForCi.WWW_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES):
+            pre_commits_to_skip.add("ts-compile-format-lint-www")
+        if not self._matching_files(
+            FileGroupForCi.ALL_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+        ):
+            pre_commits_to_skip.add("flynt")
+        if not self._matching_files(
+            FileGroupForCi.HELM_FILES,
+            CI_FILE_GROUP_MATCHES,
+            CI_FILE_GROUP_EXCLUDES,
+        ):
+            pre_commits_to_skip.add("lint-helm-chart")
+        if not (
+            self._matching_files(
+                FileGroupForCi.ALL_PROVIDER_YAML_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+            )
+            or self._matching_files(
+                FileGroupForCi.ALL_PROVIDERS_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+            )
+        ):
+            # only skip provider validation if none of the provider.yaml and provider
+            # python files changed because validation also walks through all the provider python files
+            pre_commits_to_skip.add("check-provider-yaml-valid")
+        return ",".join(sorted(pre_commits_to_skip))
 
     @cached_property
     def skip_provider_tests(self) -> bool:
diff --git a/dev/breeze/tests/test_selective_checks.py b/dev/breeze/tests/test_selective_checks.py
index 7426e0a558..6857636d04 100644
--- a/dev/breeze/tests/test_selective_checks.py
+++ b/dev/breeze/tests/test_selective_checks.py
@@ -101,6 +101,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "false",
                     "run-amazon-tests": "false",
                     "docs-build": "false",
+                    "skip-pre-commits": "check-provider-yaml-valid,flynt,identity,lint-helm-chart,mypy-core,mypy-dev,"
+                    "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": None,
                 },
@@ -122,6 +124,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "false",
                     "docs-build": "true",
+                    "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+                    "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "API Always",
                 },
@@ -143,6 +147,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "false",
                     "docs-build": "true",
+                    "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+                    "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "Always Operators",
                 },
@@ -168,6 +174,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "true",
                     "docs-build": "true",
+                    "skip-pre-commits": "identity,lint-helm-chart,mypy-dev,mypy-docs,"
+                    "ts-compile-format-lint-www",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "API Always Providers[amazon] "
                     "Providers[common.sql,openlineage,pgvector,postgres] Providers[google]",
@@ -190,6 +198,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "false",
                     "docs-build": "false",
+                    "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,"
+                    "ts-compile-format-lint-www",
                     "run-kubernetes-tests": "false",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "Always Providers[apache.beam] Providers[google]",
@@ -212,6 +222,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "false",
                     "run-amazon-tests": "false",
                     "docs-build": "true",
+                    "skip-pre-commits": "check-provider-yaml-valid,flynt,identity,lint-helm-chart,mypy-core,mypy-dev,"
+                    "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                     "run-kubernetes-tests": "false",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": None,
@@ -238,6 +250,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "true",
                     "docs-build": "true",
+                    "skip-pre-commits": "identity,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                     "run-kubernetes-tests": "true",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "Always Providers[amazon] "
@@ -267,6 +280,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "true",
                     "docs-build": "true",
+                    "skip-pre-commits": "identity,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                     "run-kubernetes-tests": "true",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "Always "
@@ -295,6 +309,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, 
str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "false",
                     "docs-build": "true",
+                    "skip-pre-commits": 
"identity,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                     "run-kubernetes-tests": "true",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "Always Providers[airbyte,http]",
@@ -321,6 +336,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "needs-helm-tests": "true",
                     "run-tests": "true",
                     "docs-build": "true",
+                    "skip-pre-commits": "check-provider-yaml-valid,identity,mypy-dev,"
+                    "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                     "run-amazon-tests": "false",
                     "run-kubernetes-tests": "true",
                     "upgrade-to-newer-dependencies": "false",
@@ -345,6 +362,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "true",
                     "docs-build": "true",
+                    "full-tests-needed": "true",
+                    "skip-pre-commits": "identity",
                     "upgrade-to-newer-dependencies": "true",
                     "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
                 },
@@ -367,6 +386,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                     "run-tests": "true",
                     "run-amazon-tests": "true",
                     "docs-build": "true",
+                    "full-tests-needed": "true",
+                    "skip-pre-commits": "identity",
                     "upgrade-to-newer-dependencies": "true",
                     "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
                 },
@@ -388,6 +409,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                 "needs-helm-tests": "false",
                 "run-tests": "true",
                 "docs-build": "true",
+                "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "run-amazon-tests": "true",
@@ -411,6 +433,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                 "run-tests": "true",
                 "run-amazon-tests": "false",
                 "docs-build": "false",
+                "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "parallel-test-types-list-as-string": "Always Providers[airbyte,http]",
@@ -433,6 +456,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
                 "run-tests": "true",
                 "run-amazon-tests": "true",
                 "docs-build": "true",
+                "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "parallel-test-types-list-as-string": "Always Providers[amazon] "
@@ -501,6 +525,7 @@ def test_expected_output_pull_request_main(
                     "docs-build": "true",
                     "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                     "full-tests-needed": "true",
+                    "skip-pre-commits": "identity",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
                 },
@@ -527,6 +552,7 @@ def test_expected_output_pull_request_main(
                     "docs-build": "true",
                     "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                     "full-tests-needed": "true",
+                    "skip-pre-commits": "identity",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
                 },
@@ -551,6 +577,7 @@ def test_expected_output_pull_request_main(
                     "docs-build": "true",
                     "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                     "full-tests-needed": "true",
+                    "skip-pre-commits": "identity",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
                 },
@@ -562,7 +589,7 @@ def test_expected_output_pull_request_main(
             pytest.param(
                 ("INTHEWILD.md",),
                 ("full tests needed",),
-                "v2-3-stable",
+                "v2-7-stable",
                 {
                     "affected-providers-list-as-string": ALL_PROVIDERS_AFFECTED,
                     "all-python-versions": "['3.8', '3.9', '3.10', '3.11']",
@@ -575,13 +602,17 @@ def test_expected_output_pull_request_main(
                     "docs-build": "true",
                     "docs-list-as-string": "apache-airflow docker-stack",
                     "full-tests-needed": "true",
+                    "skip-pre-commits": "check-airflow-provider-compatibility,"
+                    "check-extra-packages-references,check-provider-yaml-valid,identity,"
+                    "lint-helm-chart,mypy-providers",
                     "skip-provider-tests": "true",
                     "upgrade-to-newer-dependencies": "false",
                     "parallel-test-types-list-as-string": "API Always BranchExternalPython "
                     "BranchPythonVenv CLI Core ExternalPython Operators Other PlainAsserts "
                     "PythonVenv Serialization WWW",
                 },
-                id="Everything should run except Providers when full tests are needed for non-main branch",
+                id="Everything should run except Providers and lint pre-commit "
+                "when full tests are needed for non-main branch",
             )
         ),
     ],
@@ -617,6 +648,9 @@ def test_expected_output_full_tests_needed(
                 "docs-build": "false",
                 "docs-list-as-string": None,
                 "full-tests-needed": "false",
+                "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+                "check-provider-yaml-valid,flynt,identity,lint-helm-chart,"
+                "mypy-core,mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "true",
                 "parallel-test-types-list-as-string": None,
@@ -641,13 +675,16 @@ def test_expected_output_full_tests_needed(
                 "docs-build": "true",
                 "docs-list-as-string": "apache-airflow docker-stack",
                 "full-tests-needed": "false",
+                "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+                "check-provider-yaml-valid,identity,lint-helm-chart,"
+                "mypy-core,mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "true",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "true",
                 "parallel-test-types-list-as-string": "Always",
             },
-            id="No Helm tests, No providers should run if only chart/providers changed in non-main "
-            "but PROD image should be built",
+            id="No Helm tests, No providers no lint charts, should run if "
+            "only chart/providers changed in non-main but PROD image should be built",
         ),
         pytest.param(
             (
@@ -669,6 +706,9 @@ def test_expected_output_full_tests_needed(
                 "docs-build": "true",
                 "docs-list-as-string": "apache-airflow docker-stack",
                 "full-tests-needed": "false",
+                "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+                "check-provider-yaml-valid,identity,lint-helm-chart,"
+                "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "true",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "true",
@@ -695,10 +735,14 @@ def test_expected_output_full_tests_needed(
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "true",
+                "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+                "check-provider-yaml-valid,identity,lint-helm-chart,"
+                "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "parallel-test-types-list-as-string": "API Always BranchExternalPython BranchPythonVenv "
                 "CLI Core ExternalPython Operators Other PlainAsserts PythonVenv Serialization WWW",
             },
-            id="All tests except Providers should run if core file changed in non-main branch",
+            id="All tests except Providers and helm lint pre-commit "
+            "should run if core file changed in non-main branch",
         ),
     ],
 )
@@ -731,6 +775,8 @@ def test_expected_output_pull_request_v2_7(
                 "docs-build": "false",
                 "docs-list-as-string": None,
                 "upgrade-to-newer-dependencies": "false",
+                "skip-pre-commits": "check-provider-yaml-valid,flynt,identity,lint-helm-chart,"
+                "mypy-core,mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "skip-provider-tests": "true",
                 "parallel-test-types-list-as-string": None,
             },
@@ -748,6 +794,8 @@ def test_expected_output_pull_request_v2_7(
                 "run-tests": "true",
                 "docs-build": "true",
                 "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+                "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,"
+                "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "true",
                 "parallel-test-types-list-as-string": "Always",
@@ -776,6 +824,7 @@ def test_expected_output_pull_request_v2_7(
                 "cncf.kubernetes common.sql facebook google hashicorp microsoft.azure "
                 "microsoft.mssql mysql openlineage oracle postgres "
                 "presto salesforce samba sftp ssh trino",
+                "skip-pre-commits": "identity,mypy-dev,mypy-docs,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "true",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "false",
@@ -804,6 +853,8 @@ def test_expected_output_pull_request_v2_7(
                 "run-tests": "true",
                 "docs-build": "true",
                 "docs-list-as-string": "apache-airflow",
+                "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+                "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "true",
@@ -823,6 +874,8 @@ def test_expected_output_pull_request_v2_7(
                 "run-tests": "true",
                 "docs-build": "true",
                 "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+                "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+                "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "false",
@@ -842,6 +895,8 @@ def test_expected_output_pull_request_v2_7(
                 "run-tests": "true",
                 "docs-build": "true",
                 "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+                "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+                "mypy-docs,mypy-providers,ts-compile-format-lint-www",
                 "run-kubernetes-tests": "false",
                 "upgrade-to-newer-dependencies": "false",
                 "skip-provider-tests": "false",
@@ -882,6 +937,7 @@ def test_expected_output_pull_request_target(
                 "run-tests": "true",
                 "docs-build": "true",
                 "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+                "skip-pre-commits": "identity",
                 "upgrade-to-newer-dependencies": "true",
                 "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
             },
@@ -900,6 +956,8 @@ def test_expected_output_pull_request_target(
                 "needs-helm-tests": "false",
                 "run-tests": "true",
                 "docs-build": "true",
+                "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+                "check-provider-yaml-valid,identity,lint-helm-chart,mypy-providers",
                 "docs-list-as-string": "apache-airflow docker-stack",
                 "upgrade-to-newer-dependencies": "true",
                 "parallel-test-types-list-as-string": "API Always BranchExternalPython BranchPythonVenv "
@@ -921,6 +979,7 @@ def test_expected_output_pull_request_target(
                 "needs-helm-tests": "true",
                 "run-tests": "true",
                 "docs-build": "true",
+                "skip-pre-commits": "identity",
                 "docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                 "upgrade-to-newer-dependencies": "true",
                 "parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
@@ -972,6 +1031,7 @@ def test_no_commit_provided_trigger_full_build_for_any_event_type(github_event):
             "needs-helm-tests": "true",
             "run-tests": "true",
             "docs-build": "true",
+            "skip-pre-commits": "identity",
             "upgrade-to-newer-dependencies": "true"
             if github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE]
             else "false",
