This is an automated email from the ASF dual-hosted git repository.
eladkal pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new 6d3186531a Optimize Static Checks job for most regular PRs (#35461)
6d3186531a is described below
commit 6d3186531af47c90c607faf72e27cb1cf3e33fe8
Author: Jarek Potiuk <[email protected]>
AuthorDate: Tue Nov 7 17:23:58 2023 +0100
Optimize Static Checks job for most regular PRs (#35461)
By default static checks run in the CI with `--all-files` to make
sure that there are no side-effects coming from only running the
pre-commit checks on a subset of files. The complete `--all-files`
static check suite takes ~9 minutes on self-hosted runners and
~15 minutes on public runners, so there is room for optimisation
now that tests often finish faster than that.
While we cannot run static checks on only the files changed in the
PR (that would lead to many false-positives), we can disable whole
pre-commit checks when the incoming PR does not modify certain
files. For example, we can skip mypy-providers when no provider
files changed, skip mypy-core when no core files changed, or skip
helm linting when chart files have not changed in the incoming PR.
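Deciding whether a whole check can be skipped starts with sorting the changed files into groups. The sketch below is a simplified model of that matching, assuming only the include/exclude regexes shown (a subset of the patterns added to `selective_checks.py` in this commit); the `matching_files` helper and the `MATCHES`/`EXCLUDES` dictionaries are hypothetical stand-ins, not the Breeze API.

```python
from __future__ import annotations

import re

# Hypothetical, simplified subset of the include patterns per file group
# (pattern strings copied from CI_FILE_GROUP_MATCHES in this commit).
MATCHES = {
    "all_airflow_python_files": [r".*\.py$"],
    "all_provider_python_files": [
        r"^airflow/providers/.*\.py$",
        r"^tests/providers/.*\.py$",
        r"^tests/system/providers/.*\.py$",
    ],
}

# Hypothetical subset of CI_FILE_GROUP_EXCLUDES: files matched here are
# dropped from the group even if an include pattern matched them.
EXCLUDES = {
    "all_airflow_python_files": [
        r"^airflow/providers/.*",
        r"^dev/.*",
        r"^docs/.*",
        r"^tests/providers/.*",
        r"^tests/system/providers/.*",
    ],
}


def matching_files(group: str, changed_files: list[str]) -> list[str]:
    """Return changed files matching an include pattern and no exclude pattern."""
    include = MATCHES[group]
    exclude = EXCLUDES.get(group, [])
    return [
        path
        for path in changed_files
        if any(re.match(p, path) for p in include)
        and not any(re.match(p, path) for p in exclude)
    ]


changed = ["airflow/providers/amazon/hooks/s3.py", "dev/breeze/setup.py"]
print(matching_files("all_airflow_python_files", changed))  # []
print(matching_files("all_provider_python_files", changed))  # ['airflow/providers/amazon/hooks/s3.py']
```

Because the "Airflow Python files" group comes back empty for this change set, a rule such as "skip `mypy-core` when no Airflow Python files changed" would apply, while `mypy-providers` would still run.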
This PR implements selective check rules that skip some of the
longest-running pre-commit checks when the changed files indicate
that they are not needed.
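A rough sketch of how such rules could be turned into the comma-separated list of hook ids that pre-commit reads from its `SKIP` environment variable. The check names come from this commit; the `RULES` table, its patterns and the `checks_to_skip` function are hypothetical simplifications (for example, `mypy-core`, `lint-helm-chart` and the release-branch rules are left out).

```python
from __future__ import annotations

import re

# Hypothetical mapping: check name -> patterns of files that make the check necessary.
RULES = {
    "mypy-providers": [
        r"^airflow/providers/.*\.py$",
        r"^tests/providers/.*\.py$",
        r"^tests/system/providers/.*\.py$",
    ],
    "mypy-docs": [r"^docs/.*\.py$"],
    "mypy-dev": [r"^dev/.*\.py$"],
    "flynt": [r".*\.py$"],
}


def checks_to_skip(changed_files: list[str], full_tests_needed: bool) -> str:
    """Build a sorted, comma-separated list of pre-commit hook ids to skip."""
    skip = {"identity"}  # always skipped
    if full_tests_needed:
        # Structural changes: run every other check.
        return ",".join(sorted(skip))
    for check, patterns in RULES.items():
        needed = any(re.match(p, path) for path in changed_files for p in patterns)
        if not needed:
            skip.add(check)
    return ",".join(sorted(skip))


print(checks_to_skip(["dev/breeze/src/airflow_breeze/utils/selective_checks.py"], False))
# identity,mypy-docs,mypy-providers
```

The resulting string can be passed to pre-commit, e.g. `SKIP=identity,mypy-docs,mypy-providers pre-commit run --all-files`, which still runs every remaining hook on all files.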
Still, all tests and static checks will be run when the "full tests
needed" flag is set (i.e. when build scripts change, when we detect
structural/package changes in the project, or when the "full tests
needed" label is set for the PR). All the static checks will also
continue running in "canary" builds, so we will be able to catch and
correct any rule that skips a static check that should in fact run.
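The "full tests needed" conditions above can be summarised as a small predicate. This is a hedged sketch: the event names are written as plain lowercase strings (an assumption; the real code uses the `GithubEvents` enum), `environment_files_changed` is a hypothetical boolean input, and the real check in `SelectiveChecks` considers more inputs (for example, missing commit info).

```python
from __future__ import annotations

# Hypothetical event names; the label text matches FULL_TESTS_NEEDED_LABEL usage in the tests.
FULL_TEST_EVENTS = {"push", "schedule", "workflow_dispatch"}
FULL_TESTS_NEEDED_LABEL = "full tests needed"


def full_tests_needed(event: str, environment_files_changed: bool, pr_labels: set[str]) -> bool:
    """Return True when every test and every static check should run."""
    if event in FULL_TEST_EVENTS:
        return True
    if environment_files_changed:
        # e.g. build scripts or other files that change the CI environment
        return True
    return FULL_TESTS_NEEDED_LABEL in pr_labels


print(full_tests_needed("pull_request", False, set()))  # False
print(full_tests_needed("pull_request", False, {"full tests needed"}))  # True
```

When this returns True, only `identity` is skipped on `main`, which is what the updated test expectations (`"skip-pre-commits": "identity"`) check.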
The pre-commit cache is also renamed to a "common" one. Currently
the "basic" cache is different from the "full" cache, but since the
cache is only really uploaded by the "canary" builds, it is fine to
have a single common "full" cache, and regular PRs will be able to
retrieve it faster.
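GitHub's `actions/cache` first looks for an exact `key` match and then tries each `restore-keys` prefix in order, returning the most recently created cache when several match. The simplified model below (hypothetical helpers `cache_key` and `restore`, with a plain SHA-256 digest standing in for `hashFiles`) illustrates why a single `pre-commit-<python>-<hash>` key lets every job hit the same cache; it only models lookup order, not how caches are stored or transferred.

```python
from __future__ import annotations

import hashlib


def cache_key(python_version: str, config: bytes) -> str:
    # Stand-in for hashFiles('.pre-commit-config.yaml'): a plain SHA-256 hex digest.
    # GitHub's real hashFiles() computes its digest differently, so values will not match.
    return f"pre-commit-{python_version}-{hashlib.sha256(config).hexdigest()}"


def restore(available: dict[str, int], key: str, restore_keys: list[str]) -> str | None:
    """Model of the lookup order: exact key first, then restore-keys prefixes in order.

    `available` maps existing cache keys to a creation timestamp; among several
    prefix matches the most recently created cache wins.
    """
    if key in available:
        return key
    for prefix in restore_keys:
        matches = [k for k in available if k.startswith(prefix)]
        if matches:
            return max(matches, key=lambda k: available[k])
    return None


config = b"repos: []\n"
# A canary build saved its cache under the common key.
caches = {cache_key("3.8", config): 100}
# A regular PR with the same config computes the same key: exact hit.
print(restore(caches, cache_key("3.8", config), ["pre-commit-3.8-"]) == cache_key("3.8", config))  # True
# After the config changes, the prefix still finds the older cache.
print(restore(caches, cache_key("3.8", b"repos: [x]\n"), ["pre-commit-3.8-"]) is not None)  # True
```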
---
.github/workflows/ci.yml | 11 +-
dev/breeze/SELECTIVE_CHECKS.md | 78 ++++++++---
.../airflow_breeze/commands/developer_commands.py | 5 +
.../src/airflow_breeze/utils/selective_checks.py | 156 ++++++++++++++++++---
dev/breeze/tests/test_selective_checks.py | 70 ++++++++-
5 files changed, 269 insertions(+), 51 deletions(-)
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 0af32baf10..01cf502daa 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -652,9 +652,9 @@ jobs:
with:
path: ~/.cache/pre-commit
# yamllint disable-line rule:line-length
- key: "pre-commit-full-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
+ key: "pre-commit-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
restore-keys: |
- pre-commit-full-${{steps.breeze.outputs.host-python-version}}-
+ pre-commit-${{steps.breeze.outputs.host-python-version}}-
- name: "Static checks"
run: breeze static-checks --all-files --show-diff-on-failure --color always --initialize-environment
env:
@@ -700,12 +700,11 @@ jobs:
with:
path: ~/.cache/pre-commit
# yamllint disable-line rule:line-length
- key: "pre-commit-basic-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
+ key: "pre-commit-${{steps.breeze.outputs.host-python-version}}-${{ hashFiles('.pre-commit-config.yaml') }}"
restore-keys: "\
- pre-commit-full-${{steps.breeze.outputs.host-python-version}}-\
+ pre-commit-${{steps.breeze.outputs.host-python-version}}-\
${{ hashFiles('.pre-commit-config.yaml') }}\n
- pre-commit-basic-${{steps.breeze.outputs.host-python-version}}-\n
- pre-commit-full-${{steps.breeze.outputs.host-python-version}}-"
+ pre-commit-${{steps.breeze.outputs.host-python-version}}-"
- name: Fetch incoming commit ${{ github.sha }} with its parent
uses: actions/checkout@v4
with:
diff --git a/dev/breeze/SELECTIVE_CHECKS.md b/dev/breeze/SELECTIVE_CHECKS.md
index b3bff154ae..2f7d962041 100644
--- a/dev/breeze/SELECTIVE_CHECKS.md
+++ b/dev/breeze/SELECTIVE_CHECKS.md
@@ -30,14 +30,18 @@ kind of changes. The logic implemented reflects the internal architecture of Air
and it helps to keep down both the usage of jobs in GitHub Actions and CI feedback time to
contributors in case of simpler changes.
+## Groups of files that selective check make decisions on
+
We have the following Groups of files for CI that determine which tests are run:
* `Environment files` - if any of those changes, that forces 'full tests needed' mode, because changes
there might simply change the whole environment of what is going on in CI (Container image, dependencies)
-* `Python and Javascript production files` - this area is useful in CodeQL Security scanning - if any of
- the python or javascript files for airflow "production" changed, this means that the security scans should run
-* `API tests and codegen files` - those are OpenAPI definition files that impact Open API specification and
- determine that we should run dedicated API tests.
+* `Python production files` and `Javascript production files` - this area is useful in CodeQL Security scanning
+ - if any of the python or javascript files for airflow "production" changed, this means that the security
+ scans should run
+* `Always test files` - Files that belong to "Always" run tests.
+* `API tests files` and `Codegen test files` - those are OpenAPI definition files that impact
+ Open API specification and determine that we should run dedicated API tests.
* `Helm files` - change in those files impacts helm "rendering" tests - `chart` folder and `helm_tests` folder.
* `Setup files` - change in the setup files indicates that we should run `upgrade to newer dependencies` -
setup.* files, pyproject.toml, generated dependencies files in `generated` folder
@@ -51,24 +55,25 @@ We have the following Groups of files for CI that determine which tests are run:
* `All Python files` - if none of the Python file changed, that indicates that we should not run unit tests
* `All source files` - if none of the sources change, that indicates that we should probably not build
an image and run any image-based static checks
+* `All Airflow Python files` - files that are checked by `mypy-core` static checks
+* `All Providers Python files` - files that are checked by `mypy-providers` static checks
+* `All Dev Python files` - files that are checked by `mypy-dev` static checks
+* `All Docs Python files` - files that are checked by `mypy-docs` static checks
+* `All Provider Yaml files` - all provider yaml files
-We have the following unit test types that can be selectively disabled/enabled based on the
-content of the incoming PR. Usually they are limited to a sub-folder of the "tests" folder but there
-are some exceptions. You can read more about those in `TESTING.rst <TESTING.rst>`.
-
-We also have `Integration` tests that are running Integration tests with external software that is run
-via `--integration` flag in `breeze` environment.
-* `Integration` - tests that require external integration images running in docker-compose
+We have a number of `TEST_TYPES` that can be selectively disabled/enabled based on the
+content of the incoming PR. Usually they are limited to a sub-folder of the "tests" folder but there
+are some exceptions. You can read more about those in `TESTING.rst <TESTING.rst>`. Those types
+are determined by selective checks and are used to run `DB` and `Non-DB` tests.
-Even if the types are separated, In case they share the same backend version/python version, they are
-run sequentially in the same job, on the same CI machine. Each of them in a separate `docker run` command
-and with additional docker cleaning between the steps to not fall into the trap of exceeding resource
-usage in one big test run, but also not to increase the number of jobs per each Pull Request.
+The `DB` tests inside each `TEST_TYPE` are run sequentially (because they use DB as state) while `TEST_TYPES`
+are run in parallel - each within separate docker-compose project. The `Non-DB` tests are all executed
+together using `pytest-xdist` (pytest-xdist distributes the tests among parallel workers).
-The logic implements the following rules:
+## Selective check decision rules
-* `Full tests mode` is enabled when the event is PUSH, or SCHEDULE or we miss commit info or any of the
+* `Full tests` case is enabled when the event is PUSH, or SCHEDULE or we miss commit info or any of the
important environment files (setup.py, setup.cfg, provider.yaml, Dockerfile, build scripts) changed or
when `full tests needed` label is set. That enables all matrix combinations of variables (representative)
and all possible test type. No further checks are performed.
@@ -78,8 +83,8 @@ The logic implements the following rules:
are enabled if any of the relevant files have been changed.
* `Helm` tests are run only if relevant files have been changed and if current branch is `main`.
* If no Source files are changed - no tests are run and no further rules below are checked.
-* `Image building` is enabled if either test are run, docs are build or kubernetes tests are run. All those
- need `CI` or `PROD` images to be built.
+* `CI Image building` is enabled if either test are run, docs are build.
+* `PROD Image building` is enabled when kubernetes tests are run.
* In case of `Providers` test in regular PRs, additional check is done in order to determine which
providers are affected and the actual selection is made based on that:
* if directly provider code is changed (either in the provider, test or system tests) then this provider
@@ -94,7 +99,7 @@ The logic implements the following rules:
* If there are no files left in sources after matching the test types and Kubernetes files,
then apparently some Core/Other files have been changed. This automatically adds all test
types to execute. This is done because changes in core might impact all the other test types.
-* if `Image building` is disabled, only basic pre-commits are enabled - no 'image-depending` pre-commits
+* if `CI Image building` is disabled, only basic pre-commits are enabled - no 'image-depending` pre-commits
are enabled.
* If there are some setup files changed, `upgrade to newer dependencies` is enabled.
* If docs are build, the `docs-list-as-string` will determine which docs packages to build. This is based on
@@ -103,16 +108,49 @@ The logic implements the following rules:
changed, also providers docs are built because all providers depend on airflow docs. If any of the docs
build python files changed or when build is "canary" type in main - all docs packages are built.
+## Skipping pre-commits (Static checks)
+
+Our CI always run pre-commit checks with `--all-files` flag. This is in order to avoid cases where
+different check results are run when only subset of files is used. This has an effect that the pre-commit
+tests take a long time to run when all of them are run. Selective checks allow to save a lot of time
+for those tests in regular PRs of contributors by smart detection of which pre-commits should be skipped
+when some files are not changed. Those are the rules implemented:
+
+* The `identity` check is always skipped (saves space to display all changed files in CI)
+* The provider specific checks are skipped when builds are running in v2_* branches (we do not build
+ providers from those branches. Those are the checks skipped in this case:
+ * check-airflow-provider-compatibility
+ * check-extra-packages-references
+ * check-provider-yaml-valid
+ * lint-helm-chart
+ * mypy-providers
+* If "full tests" mode is detected, no more pre-commits are skipped - we run all of them
+* The following checks are skipped if those files are not changed:
+ * if no `All Providers Python files` changed - `mypy-providers` check is skipped
+ * if no `All Airflow Python files` changed - `mypy-core` check is skipped
+ * if no `All Docs Python files` changed - `mypy-docs` check is skipped
+ * if no `All Dev Python files` changed - `mypy-dev` check is skipped
+ * if no `WWW files` changed - `ts-compile-format-lint-www` check is skipped
+ * if no `All Python files` changed - `flynt` check is skipped
+ * if no `Helm files` changed - `lint-helm-chart` check is skipped
+ * if no `All Providers Python files` and no `All Providers Yaml files` are changed -
+ `check-provider-yaml-valid` check is skipped
+
+## Suspended providers
The selective checks will fail in PR if it contains changes to a suspended provider unless you set the
label `allow suspended provider changes` in the PR. This is to prevent accidental changes to suspended
providers.
+
+## Selective check outputs
+
The selective check outputs available are described below. In case of `list-as-string` values,
empty string means `everything`, where lack of the output means `nothing` and list elements are
separated by spaces. This is to accommodate for the wau how outputs of this kind can be easily used by
Github Actions to pass the list of parameters to a command to execute
+
| Output | Meaning of the output | Example value | List as string |
|------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------------------------------------------|----------------|
| affected-providers-list-as-string | List of providers affected when they are selectively affected. | airbyte http | * |
diff --git a/dev/breeze/src/airflow_breeze/commands/developer_commands.py b/dev/breeze/src/airflow_breeze/commands/developer_commands.py
index 60d3138e68..69e8a6edd7 100644
--- a/dev/breeze/src/airflow_breeze/commands/developer_commands.py
+++ b/dev/breeze/src/airflow_breeze/commands/developer_commands.py
@@ -625,6 +625,11 @@ def static_checks(
command_to_execute.extend(file)
if precommit_args:
command_to_execute.extend(precommit_args)
+ skip_checks = os.environ.get("SKIP")
+ if skip_checks and skip_checks != "identity":
+ get_console().print("\nThis static check run skips those checks:\n")
+ get_console().print(skip_checks.split(","))
+ get_console().print()
env = os.environ.copy()
env["GITHUB_REPOSITORY"] = github_repository
static_checks_result = run_command(
diff --git a/dev/breeze/src/airflow_breeze/utils/selective_checks.py b/dev/breeze/src/airflow_breeze/utils/selective_checks.py
index 214dbb63a9..c8b5eaf7bf 100644
--- a/dev/breeze/src/airflow_breeze/utils/selective_checks.py
+++ b/dev/breeze/src/airflow_breeze/utils/selective_checks.py
@@ -88,6 +88,11 @@ class FileGroupForCi(Enum):
KUBERNETES_FILES = "kubernetes_files"
ALL_PYTHON_FILES = "all_python_files"
ALL_SOURCE_FILES = "all_sources_for_tests"
+ ALL_AIRFLOW_PYTHON_FILES = "all_airflow_python_files"
+ ALL_PROVIDERS_PYTHON_FILES = "all_provider_python_files"
+ ALL_DEV_PYTHON_FILES = "all_dev_python_files"
+ ALL_PROVIDER_YAML_FILES = "all_provider_yaml_files"
+ ALL_DOCS_PYTHON_FILES = "all_docs_python_files"
T = TypeVar("T", FileGroupForCi, SelectiveUnitTestTypes)
@@ -165,7 +170,21 @@ CI_FILE_GROUP_MATCHES = HashableDict(
r"^tests/system/providers/cncf/kubernetes/",
],
FileGroupForCi.ALL_PYTHON_FILES: [
- r"\.py$",
+ r".*\.py$",
+ ],
+ FileGroupForCi.ALL_AIRFLOW_PYTHON_FILES: [
+ r".*\.py$",
+ ],
+ FileGroupForCi.ALL_PROVIDERS_PYTHON_FILES: [
+ r"^airflow/providers/.*\.py$",
+ r"^tests/providers/.*\.py$",
+ r"^tests/system/providers/.*\.py$",
+ ],
+ FileGroupForCi.ALL_DOCS_PYTHON_FILES: [
+ r"^docs/.*\.py$",
+ ],
+ FileGroupForCi.ALL_DEV_PYTHON_FILES: [
+ r"^dev/.*\.py$",
],
FileGroupForCi.ALL_SOURCE_FILES: [
r"^.pre-commit-config.yaml$",
@@ -180,9 +199,27 @@ CI_FILE_GROUP_MATCHES = HashableDict(
FileGroupForCi.ALWAYS_TESTS_FILES: [
r"^tests/always/",
],
+ FileGroupForCi.ALL_PROVIDER_YAML_FILES: [
+ r".*/provider\.yaml$",
+ ],
}
)
+CI_FILE_GROUP_EXCLUDES = HashableDict(
+ {
+ FileGroupForCi.ALL_AIRFLOW_PYTHON_FILES: [
+ r"^.*/.*_vendor/.*",
+ r"^airflow/migrations/.*",
+ r"^airflow/providers/.*",
+ r"^dev/.*",
+ r"^docs/.*",
+ r"^provider_packages/.*",
+ r"^tests/providers/.*",
+ r"^tests/system/providers/.*",
+ r"^tests/dags/test_imports.py",
+ ]
+ }
+)
TEST_TYPE_MATCHES = HashableDict(
{
@@ -215,6 +252,8 @@ TEST_TYPE_MATCHES = HashableDict(
}
)
+TEST_TYPE_EXCLUDES = HashableDict({})
+
def find_provider_affected(changed_file: str, include_docs: bool) -> str | None:
file_path = AIRFLOW_SOURCES_ROOT / changed_file
@@ -372,7 +411,9 @@ class SelectiveChecks:
if self._github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE, GithubEvents.WORKFLOW_DISPATCH]:
get_console().print(f"[warning]Full tests needed because event is {self._github_event}[/]")
return True
- if self._matching_files(FileGroupForCi.ENVIRONMENT_FILES, CI_FILE_GROUP_MATCHES):
+ if self._matching_files(
+ FileGroupForCi.ENVIRONMENT_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ ):
get_console().print("[warning]Running everything because env files changed[/]")
return True
if FULL_TESTS_NEEDED_LABEL in self._pr_labels:
@@ -487,16 +528,27 @@ class SelectiveChecks:
)
return " ".join(short_combo_titles)
- def _match_files_with_regexps(self, matched_files, regexps):
+ def _match_files_with_regexps(self, matched_files, matching_regexps):
for file in self._files:
- if any(re.match(regexp, file) for regexp in regexps):
+ if any(re.match(regexp, file) for regexp in matching_regexps):
matched_files.append(file)
+ def _exclude_files_with_regexps(self, matched_files, exclude_regexps):
+ for file in self._files:
+ if any(re.match(regexp, file) for regexp in exclude_regexps):
+ if file in matched_files:
+ matched_files.remove(file)
+
@lru_cache(maxsize=None)
- def _matching_files(self, match_group: T, match_dict: dict[T, list[str]]) -> list[str]:
+ def _matching_files(
+ self, match_group: T, match_dict: dict[T, list[str]], exclude_dict: dict[T, list[str]]
+ ) -> list[str]:
matched_files: list[str] = []
- regexps = match_dict[match_group]
- self._match_files_with_regexps(matched_files, regexps)
+ match_regexps = match_dict[match_group]
+ excluded_regexps = exclude_dict.get(match_group)
+ self._match_files_with_regexps(matched_files, match_regexps)
+ if excluded_regexps:
+ self._exclude_files_with_regexps(matched_files, excluded_regexps)
count = len(matched_files)
if count > 0:
get_console().print(f"[warning]{match_group} matched {count} files.[/]")
@@ -509,7 +561,7 @@ class SelectiveChecks:
if self.full_tests_needed:
get_console().print(f"[warning]{source_area} enabled because we are running everything[/]")
return True
- matched_files = self._matching_files(source_area, CI_FILE_GROUP_MATCHES)
+ matched_files = self._matching_files(source_area, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES)
if matched_files:
get_console().print(
f"[warning]{source_area} enabled because it matched {len(matched_files)} changed files[/]"
@@ -577,7 +629,7 @@ class SelectiveChecks:
def _select_test_type_if_matching(
self, test_types: set[str], test_type: SelectiveUnitTestTypes
) -> list[str]:
- matched_files = self._matching_files(test_type, TEST_TYPE_MATCHES)
+ matched_files = self._matching_files(test_type, TEST_TYPE_MATCHES, TEST_TYPE_EXCLUDES)
count = len(matched_files)
if count > 0:
test_types.add(test_type.value)
@@ -614,10 +666,18 @@ class SelectiveChecks:
self._select_test_type_if_matching(candidate_test_types, SelectiveUnitTestTypes.API)
)
- kubernetes_files = self._matching_files(FileGroupForCi.KUBERNETES_FILES, CI_FILE_GROUP_MATCHES)
- system_test_files = self._matching_files(FileGroupForCi.SYSTEM_TEST_FILES, CI_FILE_GROUP_MATCHES)
- all_source_files = self._matching_files(FileGroupForCi.ALL_SOURCE_FILES, CI_FILE_GROUP_MATCHES)
- test_always_files = self._matching_files(FileGroupForCi.ALWAYS_TESTS_FILES, CI_FILE_GROUP_MATCHES)
+ kubernetes_files = self._matching_files(
+ FileGroupForCi.KUBERNETES_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
+ system_test_files = self._matching_files(
+ FileGroupForCi.SYSTEM_TEST_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
+ all_source_files = self._matching_files(
+ FileGroupForCi.ALL_SOURCE_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
+ test_always_files = self._matching_files(
+ FileGroupForCi.ALWAYS_TESTS_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
remaining_files = (
set(all_source_files)
- set(matched_files)
@@ -710,7 +770,12 @@ class SelectiveChecks:
@cached_property
def upgrade_to_newer_dependencies(self) -> bool:
return (
- len(self._matching_files(FileGroupForCi.SETUP_FILES, CI_FILE_GROUP_MATCHES)) > 0
+ len(
+ self._matching_files(
+ FileGroupForCi.SETUP_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
+ )
+ > 0
or self._github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE]
or UPGRADE_TO_NEWER_DEPENDENCIES_LABEL in self._pr_labels
)
@@ -752,12 +817,63 @@ class SelectiveChecks:
@cached_property
def skip_pre_commits(self) -> str:
- return (
- "identity"
- if self._default_branch == "main"
- else "identity,check-airflow-provider-compatibility,"
- "check-extra-packages-references,check-provider-yaml-valid"
- )
+ pre_commits_to_skip = set()
+ pre_commits_to_skip.add("identity")
+ if self._default_branch != "main":
+ # Skip those tests on all "release" branches
+ pre_commits_to_skip.update(
+ (
+ "check-airflow-provider-compatibility",
+ "check-extra-packages-references",
+ "check-provider-yaml-valid",
+ "lint-helm-chart",
+ "mypy-providers",
+ )
+ )
+ if self.full_tests_needed:
+ # when full tests are needed, we do not want to skip any checks and we should
+ # run all the pre-commits just to be sure everything is ok when some structural changes occurred
+ return ",".join(sorted(pre_commits_to_skip))
+ if not self._matching_files(
+ FileGroupForCi.ALL_PROVIDERS_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ ):
+ pre_commits_to_skip.add("mypy-providers")
+ if not self._matching_files(
+ FileGroupForCi.ALL_AIRFLOW_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ ):
+ pre_commits_to_skip.add("mypy-core")
+ if not self._matching_files(
+ FileGroupForCi.ALL_DOCS_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ ):
+ pre_commits_to_skip.add("mypy-docs")
+ if not self._matching_files(
+ FileGroupForCi.ALL_DEV_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ ):
+ pre_commits_to_skip.add("mypy-dev")
+ if not self._matching_files(FileGroupForCi.WWW_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES):
+ pre_commits_to_skip.add("ts-compile-format-lint-www")
+ if not self._matching_files(
+ FileGroupForCi.ALL_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ ):
+ pre_commits_to_skip.add("flynt")
+ if not self._matching_files(
+ FileGroupForCi.HELM_FILES,
+ CI_FILE_GROUP_MATCHES,
+ CI_FILE_GROUP_EXCLUDES,
+ ):
+ pre_commits_to_skip.add("lint-helm-chart")
+ if not (
+ self._matching_files(
+ FileGroupForCi.ALL_PROVIDER_YAML_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
+ or self._matching_files(
+ FileGroupForCi.ALL_PROVIDERS_PYTHON_FILES, CI_FILE_GROUP_MATCHES, CI_FILE_GROUP_EXCLUDES
+ )
+ ):
+ # only skip provider validation if none of the provider.yaml and provider
+ # python files changed because validation also walks through all the provider python files
+ pre_commits_to_skip.add("check-provider-yaml-valid")
+ return ",".join(sorted(pre_commits_to_skip))
@cached_property
def skip_provider_tests(self) -> bool:
diff --git a/dev/breeze/tests/test_selective_checks.py b/dev/breeze/tests/test_selective_checks.py
index 7426e0a558..6857636d04 100644
--- a/dev/breeze/tests/test_selective_checks.py
+++ b/dev/breeze/tests/test_selective_checks.py
@@ -101,6 +101,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "false",
"run-amazon-tests": "false",
"docs-build": "false",
+ "skip-pre-commits": "check-provider-yaml-valid,flynt,identity,lint-helm-chart,mypy-core,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": None,
},
@@ -122,6 +124,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "false",
"docs-build": "true",
+ "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "API Always",
},
@@ -143,6 +147,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "false",
"docs-build": "true",
+ "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always Operators",
},
@@ -168,6 +174,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity,lint-helm-chart,mypy-dev,mypy-docs,"
+ "ts-compile-format-lint-www",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "API Always Providers[amazon] "
"Providers[common.sql,openlineage,pgvector,postgres] Providers[google]",
@@ -190,6 +198,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "false",
"docs-build": "false",
+ "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,"
+ "ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always Providers[apache.beam] Providers[google]",
@@ -212,6 +222,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "false",
"run-amazon-tests": "false",
"docs-build": "true",
+ "skip-pre-commits": "check-provider-yaml-valid,flynt,identity,lint-helm-chart,mypy-core,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": None,
@@ -238,6 +250,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always Providers[amazon] "
@@ -267,6 +280,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always "
@@ -295,6 +309,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "false",
"docs-build": "true",
+ "skip-pre-commits": "identity,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always Providers[airbyte,http]",
@@ -321,6 +336,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"needs-helm-tests": "true",
"run-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "check-provider-yaml-valid,identity,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-amazon-tests": "false",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
@@ -345,6 +362,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "true",
"docs-build": "true",
+ "full-tests-needed": "true",
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "true",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
},
@@ -367,6 +386,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "true",
"docs-build": "true",
+ "full-tests-needed": "true",
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "true",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
},
@@ -388,6 +409,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"needs-helm-tests": "false",
"run-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"run-amazon-tests": "true",
@@ -411,6 +433,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "false",
"docs-build": "false",
+ "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always Providers[airbyte,http]",
@@ -433,6 +456,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
"run-tests": "true",
"run-amazon-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity,lint-helm-chart,mypy-core,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "Always Providers[amazon] "
@@ -501,6 +525,7 @@ def test_expected_output_pull_request_main(
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
"full-tests-needed": "true",
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
},
@@ -527,6 +552,7 @@ def test_expected_output_pull_request_main(
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
"full-tests-needed": "true",
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
},
@@ -551,6 +577,7 @@ def test_expected_output_pull_request_main(
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
"full-tests-needed": "true",
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
},
@@ -562,7 +589,7 @@ def test_expected_output_pull_request_main(
pytest.param(
("INTHEWILD.md",),
("full tests needed",),
- "v2-3-stable",
+ "v2-7-stable",
{
"affected-providers-list-as-string": ALL_PROVIDERS_AFFECTED,
"all-python-versions": "['3.8', '3.9', '3.10', '3.11']",
@@ -575,13 +602,17 @@ def test_expected_output_pull_request_main(
"docs-build": "true",
"docs-list-as-string": "apache-airflow docker-stack",
"full-tests-needed": "true",
+ "skip-pre-commits": "check-airflow-provider-compatibility,"
+ "check-extra-packages-references,check-provider-yaml-valid,identity,"
+ "lint-helm-chart,mypy-providers",
"skip-provider-tests": "true",
"upgrade-to-newer-dependencies": "false",
"parallel-test-types-list-as-string": "API Always BranchExternalPython "
"BranchPythonVenv CLI Core ExternalPython Operators Other PlainAsserts "
"PythonVenv Serialization WWW",
},
- id="Everything should run except Providers when full tests are needed for non-main branch",
+ id="Everything should run except Providers and lint pre-commit "
+ "when full tests are needed for non-main branch",
)
),
],
@@ -617,6 +648,9 @@ def test_expected_output_full_tests_needed(
"docs-build": "false",
"docs-list-as-string": None,
"full-tests-needed": "false",
+ "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+ "check-provider-yaml-valid,flynt,identity,lint-helm-chart,"
+ "mypy-core,mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "true",
"parallel-test-types-list-as-string": None,
@@ -641,13 +675,16 @@ def test_expected_output_full_tests_needed(
"docs-build": "true",
"docs-list-as-string": "apache-airflow docker-stack",
"full-tests-needed": "false",
+ "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+ "check-provider-yaml-valid,identity,lint-helm-chart,"
+ "mypy-core,mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "true",
"parallel-test-types-list-as-string": "Always",
},
- id="No Helm tests, No providers should run if only chart/providers changed in non-main "
- "but PROD image should be built",
+ id="No Helm tests, No providers no lint charts, should run if "
+ "only chart/providers changed in non-main but PROD image should be built",
),
pytest.param(
(
@@ -669,6 +706,9 @@ def test_expected_output_full_tests_needed(
"docs-build": "true",
"docs-list-as-string": "apache-airflow docker-stack",
"full-tests-needed": "false",
+ "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+ "check-provider-yaml-valid,identity,lint-helm-chart,"
+ "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "true",
@@ -695,10 +735,14 @@ def test_expected_output_full_tests_needed(
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "true",
+ "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+ "check-provider-yaml-valid,identity,lint-helm-chart,"
+ "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
"parallel-test-types-list-as-string": "API Always BranchExternalPython BranchPythonVenv "
"CLI Core ExternalPython Operators Other PlainAsserts PythonVenv Serialization WWW",
},
- id="All tests except Providers should run if core file changed in non-main branch",
+ id="All tests except Providers and helm lint pre-commit "
+ "should run if core file changed in non-main branch",
),
],
)
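Each `skip-pre-commits` value in these expectations is a comma-separated, alphabetically sorted list of pre-commit hook ids. pre-commit itself honours a `SKIP` environment variable holding comma-separated hook ids, so a minimal sketch of handling such a value might look like the following. It assumes the output is forwarded to `SKIP` (this diff only shows expected test values, not the workflow wiring), and the helper names `parse_skip_list`, `build_skip_value` and `pre_commit_env` are hypothetical, not part of Airflow's breeze code.

```python
from __future__ import annotations

import os
from collections.abc import Iterable, Mapping

# Expected "skip-pre-commits" value for a core-only change on a non-main branch,
# copied from the test case above.
CORE_ONLY_SKIP = (
    "check-airflow-provider-compatibility,check-extra-packages-references,"
    "check-provider-yaml-valid,identity,lint-helm-chart,"
    "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www"
)


def parse_skip_list(value: str) -> list[str]:
    """Hypothetical helper: split a comma-separated value into hook ids, dropping empties."""
    return [hook.strip() for hook in value.split(",") if hook.strip()]


def build_skip_value(hooks: Iterable[str]) -> str:
    """Hypothetical helper: de-duplicate and sort hook ids, then join them with commas."""
    return ",".join(sorted(set(hooks)))


def pre_commit_env(skip_value: str, base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Hypothetical helper: copy an environment and set pre-commit's SKIP variable."""
    env = dict(os.environ if base is None else base)
    env["SKIP"] = skip_value
    return env


if __name__ == "__main__":
    hooks = parse_skip_list(CORE_ONLY_SKIP)
    # mypy-core is absent from the list because core files changed in this scenario.
    print("mypy-core" in hooks)  # False
    # Sorting makes the value independent of the order in which hooks were collected.
    print(build_skip_value(reversed(hooks)) == CORE_ONLY_SKIP)  # True
    print(pre_commit_env(CORE_ONLY_SKIP, base={})["SKIP"] == CORE_ONLY_SKIP)  # True
```

Sorting and de-duplicating keeps the emitted string deterministic, which is what lets the tests compare it as a plain string. Run locally, the same value could be used as, for example, `SKIP=identity pre-commit run --all-files`.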
@@ -731,6 +775,8 @@ def test_expected_output_pull_request_v2_7(
"docs-build": "false",
"docs-list-as-string": None,
"upgrade-to-newer-dependencies": "false",
+ "skip-pre-commits": "check-provider-yaml-valid,flynt,identity,lint-helm-chart,"
+ "mypy-core,mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
"skip-provider-tests": "true",
"parallel-test-types-list-as-string": None,
},
@@ -748,6 +794,8 @@ def test_expected_output_pull_request_v2_7(
"run-tests": "true",
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+ "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,"
+ "mypy-dev,mypy-docs,mypy-providers,ts-compile-format-lint-www",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "true",
"parallel-test-types-list-as-string": "Always",
@@ -776,6 +824,7 @@ def test_expected_output_pull_request_v2_7(
"cncf.kubernetes common.sql facebook google hashicorp microsoft.azure "
"microsoft.mssql mysql openlineage oracle postgres "
"presto salesforce samba sftp ssh trino",
+ "skip-pre-commits": "identity,mypy-dev,mypy-docs,ts-compile-format-lint-www",
"run-kubernetes-tests": "true",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "false",
@@ -804,6 +853,8 @@ def test_expected_output_pull_request_v2_7(
"run-tests": "true",
"docs-build": "true",
"docs-list-as-string": "apache-airflow",
+ "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "true",
@@ -823,6 +874,8 @@ def test_expected_output_pull_request_v2_7(
"run-tests": "true",
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+ "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "false",
@@ -842,6 +895,8 @@ def test_expected_output_pull_request_v2_7(
"run-tests": "true",
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+ "skip-pre-commits": "check-provider-yaml-valid,identity,lint-helm-chart,mypy-dev,"
+ "mypy-docs,mypy-providers,ts-compile-format-lint-www",
"run-kubernetes-tests": "false",
"upgrade-to-newer-dependencies": "false",
"skip-provider-tests": "false",
@@ -882,6 +937,7 @@ def test_expected_output_pull_request_target(
"run-tests": "true",
"docs-build": "true",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "true",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
},
@@ -900,6 +956,8 @@ def test_expected_output_pull_request_target(
"needs-helm-tests": "false",
"run-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "check-airflow-provider-compatibility,check-extra-packages-references,"
+ "check-provider-yaml-valid,identity,lint-helm-chart,mypy-providers",
"docs-list-as-string": "apache-airflow docker-stack",
"upgrade-to-newer-dependencies": "true",
"parallel-test-types-list-as-string": "API Always BranchExternalPython BranchPythonVenv "
@@ -921,6 +979,7 @@ def test_expected_output_pull_request_target(
"needs-helm-tests": "true",
"run-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity",
"docs-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
"upgrade-to-newer-dependencies": "true",
"parallel-test-types-list-as-string": ALL_CI_SELECTIVE_TEST_TYPES,
@@ -972,6 +1031,7 @@ def test_no_commit_provided_trigger_full_build_for_any_event_type(github_event):
"needs-helm-tests": "true",
"run-tests": "true",
"docs-build": "true",
+ "skip-pre-commits": "identity",
"upgrade-to-newer-dependencies": "true"
if github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE]
else "false",