amoghrajesh commented on code in PR #45266:
URL: https://github.com/apache/airflow/pull/45266#discussion_r1899294041

##########
dev/breeze/src/airflow_breeze/utils/platforms.py:
##########
@@ -21,12 +21,12 @@
 from pathlib import Path

-def get_real_platform(single_platform: str) -> str:
+def get_normalized_platform(single_platform: str) -> str:

Review Comment:
   Much better name

##########
dev/breeze/doc/ci/02_images.md:
##########
@@ -329,22 +335,14 @@
 new version of base Python is released. However, occasionally, you
 might need to rebuild images locally and push them directly to the
 registries to refresh them.

-Every developer can also pull and run images being result of a specific
+Every contributor can also pull and run images being result of a specific
 CI run in GitHub Actions. This is a powerful tool that allows to
 reproduce CI failures locally, enter the images and fix them much
-faster. It is enough to pass `--image-tag` and the registry and Breeze
-will download and execute commands using the same image that was used
-during the CI tests.
+faster. It is enough to download and uncompress the artifact that stores the
+image and run ``breeze ci-image load -i <path-to-image.tar>`` to load the
+image and mark the image as refreshed in the local cache.

Review Comment:
   Nice!

##########
dev/breeze/doc/ci/01_ci_environment.md:
##########
@@ -23,16 +23,18 @@
 - [CI Environment](#ci-environment)
   - [GitHub Actions workflows](#github-actions-workflows)
-  - [Container Registry used as cache](#container-registry-used-as-cache)
+  - [GitHub Registry used as cache](#github-registry-used-as-cache)
   - [Authentication in GitHub Registry](#authentication-in-github-registry)
+  - [GitHub Artifacts used to store built images](#github-artifacts-used-to-store-built-images)

 <!-- END doctoc generated TOC please keep comment here to allow auto update -->

 # CI Environment

 Continuous Integration is an important component of making Apache
 Airflow robust and stable. We run a lot of tests for every pull request,
-for main and v2-\*-test branches and regularly as scheduled jobs.
+for `canary` runs (from `main` and `v*-\*-test` branches and
+regularly as scheduled jobs.

Review Comment:
   nit:
   ```
   for `canary` runs (from `main` and `v*-\*-test` branches) regularly as scheduled jobs.
   ```

##########
dev/breeze/doc/ci/01_ci_environment.md:
##########
@@ -60,69 +62,48 @@
 To run the tests, we need to ensure that the images are built using the
 latest sources and that the build process is efficient. A full rebuild
 of such an image from scratch might take approximately 15 minutes.
 Therefore, we've implemented optimization techniques that efficiently
-use the cache from the GitHub Docker registry. In most cases, this
-reduces the time needed to rebuild the image to about 4 minutes.
-However, when dependencies change, it can take around 6-7 minutes, and
-if the base image of Python releases a new patch-level, it can take
-approximately 12 minutes.
-
-## Container Registry used as cache
-
-We are using GitHub Container Registry to store the results of the
-`Build Images` workflow which is used in the `Tests` workflow.
-
-Currently in main version of Airflow we run tests in all versions of
-Python supported, which means that we have to build multiple images (one
-CI and one PROD for each Python version). Yet we run many jobs (\>15) -
-for each of the CI images. That is a lot of time to just build the
-environment to run. Therefore we are utilising the `pull_request_target`
-feature of GitHub Actions.
-
-This feature allows us to run a separate, independent workflow, when the
-main workflow is run -this separate workflow is different than the main
-one, because by default it runs using `main` version of the sources but
-also - and most of all - that it has WRITE access to the GitHub
-Container Image registry.
-
-This is especially important in our case where Pull Requests to Airflow
-might come from any repository, and it would be a huge security issue if
-anyone from outside could utilise the WRITE access to the Container
-Image Registry via external Pull Request.
-
-Thanks to the WRITE access and fact that the `pull_request_target` workflow named
-`Build Imaages` which - by default - uses the `main` version of the sources.
-There we can safely run some code there as it has been reviewed and merged.
-The workflow checks-out the incoming Pull Request, builds
-the container image from the sources from the incoming PR (which happens in an
-isolated Docker build step for security) and pushes such image to the
-GitHub Docker Registry - so that this image can be built only once and
-used by all the jobs running tests. The image is tagged with unique
-`COMMIT_SHA` of the incoming Pull Request and the tests run in the `pull` workflow
-can simply pull such image rather than build it from the scratch.
-Pulling such image takes ~ 1 minute, thanks to that we are saving a
-lot of precious time for jobs.
-
-We use [GitHub Container Registry](https://docs.github.com/en/packages/guides/about-github-container-registry).
-A `GITHUB_TOKEN` is needed to push to the registry. We configured
-scopes of the tokens in our jobs to be able to write to the registry,
-but only for the jobs that need it.
-
-The latest cache is kept as `:cache-linux-amd64` and `:cache-linux-arm64`
-tagged cache of our CI images (suitable for `--cache-from` directive of
-buildx). It contains metadata and cache for all segments in the image,
-and cache is kept separately for different platform.
+use the cache from Github Actions Artifacts.
+
+## GitHub Registry used as cache
+
+We are using GitHub Registry to store the last image built in canary run
+to build images in CI and local docker container.
+This is done to speed up the build process and to ensure that the
+first - time-consuming-to-build layers of the image are
+reused between the builds. The cache is stored in the GitHub Registry
+by the `canary` runs and then used in the subsequent runs.
+
+The latest GitHub registry cache is kept as `:cache-linux-amd64` and
+`:cache-linux-arm64` tagged cache of our CI images (suitable for
+`--cache-from` directive of buildx). It contains
+metadata and cache for all segments in the image,
+and cache is kept separately for different platforms.

 The `latest` images of CI and PROD are `amd64` only images for CI,
 because there is no easy way to push multiplatform images without
 merging the manifests, and it is not really needed nor used for cache.

 ## Authentication in GitHub Registry

-We are using GitHub Container Registry as cache for our images.
-Authentication uses GITHUB_TOKEN mechanism. Authentication is needed for
-pushing the images (WRITE) only in `push`, `pull_request_target`
-workflows. When you are running the CI jobs in GitHub Actions,
-GITHUB_TOKEN is set automatically by the actions.
+Authentication to GitHub Registry in CI uses GITHUB_TOKEN mechanism.
+The Authentication is needed for pushing the images (WRITE) in the `canary` runs.
+When you are running the CI jobs in GitHub Actions, vGITHUB_TOKEN is set automatically

Review Comment:
   nit: `GITHUB_TOKEN`
