amoghrajesh commented on code in PR #33144: URL: https://github.com/apache/airflow/pull/33144#discussion_r1286568037
########## BREEZE.rst: ########## @@ -2106,13 +2106,30 @@ Those are all available flags of ``generate-constraints`` command: In case someone modifies setup.py, the scheduled CI Tests automatically upgrades and pushes changes to the constraint files, however you can also perform test run of this locally using -the procedure described in `Refreshing CI Cache <dev/REFRESHING_CI_CACHE.md#manually-generating-constraint-files>`_ +the procedure described in the +`Manually generating image cache and constraints <dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md>`_ which utilises multiple processors on your local machine to generate such constraints faster. This bumps the constraint files to latest versions and stores hash of setup.py. The generated constraint and setup.py hash files are stored in the ``files`` folder and while generating the constraints diff of changes vs the previous constraint files is printed. +Updating constraints +"""""""""""""""""""" + +Sometimes (very rarely) we might want to update individual packages in constraints that we generated and +tagged already in the past. This can be done using ``breeze release-management update-constraints`` command. + +These are all available flags of ``update-constraints`` command: + +.. image:: ./images/breeze/output_release-management_update-constraints.svg + :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_update-constraints.svg + :width: 100% + :alt: Breeze update-constraints + +You can read more details about what happens when you update constraints in the +`Manually generating image cache and constraints <dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md>`_ Review Comment: Instead of an image, do we want to add these options with a short explanation like we do for other breeze commands? 
For example `breeze release-management add-back-references` ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint 
files](#how-to-generate-constraint-files) + - [Is it safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - [How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + - [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update +our CI and constraints manually when the automation of our CI is not enough. There are some edge cases +and events that might trigger the need of refreshing the information stored in our GitHub Repository. + +We are storing two things we are storing in our GitHub Registry that are needed for both - our contributors +and users: + +* `CI and PROD image cache` - used by our CI jobs to speed up building of images while CI jobs are running +* `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular + PRs) and used by our users to reproducibly install released airflow versions. + +Normally, both are updated and refreshed automatically vi [CI system](../CI.rst). However, there are some +cases where we need to update them manually. This document describes how to do it. + +# Automated image cache and constraints refreshing in CI + +Our [CI system](../CI.rst) is build in the way that it self-maintains. 
Regular scheduled builds and +merges to `main` branch have separate maintenance step that take care about refreshing the cache that is +used to speed up our builds and to speed up rebuilding of [Breeze](../BREEZE.rst) images for development +purpose. This is all happening automatically, usually: + +* The latest [constraints](../CONTRIBUTING.rst#pinned-constraint-files) are pushed to appropriate branch + after all tests succeeded in `main` merge or in `scheduled` build Review Comment: super nit: `succeed` ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint files](#how-to-generate-constraint-files) + - [Is it safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - [How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + - [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update +our CI and constraints manually 
when the automation of our CI is not enough. There are some edge cases +and events that might trigger the need of refreshing the information stored in our GitHub Repository. + +We are storing two things we are storing in our GitHub Registry that are needed for both - our contributors +and users: + +* `CI and PROD image cache` - used by our CI jobs to speed up building of images while CI jobs are running +* `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular + PRs) and used by our users to reproducibly install released airflow versions. + +Normally, both are updated and refreshed automatically vi [CI system](../CI.rst). However, there are some +cases where we need to update them manually. This document describes how to do it. + +# Automated image cache and constraints refreshing in CI + +Our [CI system](../CI.rst) is build in the way that it self-maintains. Regular scheduled builds and +merges to `main` branch have separate maintenance step that take care about refreshing the cache that is +used to speed up our builds and to speed up rebuilding of [Breeze](../BREEZE.rst) images for development +purpose. This is all happening automatically, usually: + +* The latest [constraints](../CONTRIBUTING.rst#pinned-constraint-files) are pushed to appropriate branch + after all tests succeeded in `main` merge or in `scheduled` build + +* The [images](../IMAGES.rst) in `ghcr.io` registry are refreshed after every successful merge to `main` + or `scheduled` build and after pushing the constraints, this means that the latest image cache uses + also the latest tested constraints + + +# Manually refreshing the image cache + +## Why we need to update image cache manually + +Sometimes, when we have a problem with our CI running and flakiness of GitHub Actions runners or our +tests, the refresh might not be triggered. 
This has been mitigated by "Push Early Image Cache" job added in +our CI, but there are other reasons you might want to refresh the cache. Sometimes we want to refresh the +image cache in vX_Y_test branch before we attempt to push a change there. There are no PRs happening in Review Comment: `vX_Y_test` -> does this indicate a random test branch? ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint files](#how-to-generate-constraint-files) + - [Is it safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - [How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + - [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update +our CI and constraints manually 
when the automation of our CI is not enough. There are some edge cases +and events that might trigger the need of refreshing the information stored in our GitHub Repository. + +We are storing two things we are storing in our GitHub Registry that are needed for both - our contributors +and users: + +* `CI and PROD image cache` - used by our CI jobs to speed up building of images while CI jobs are running +* `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular + PRs) and used by our users to reproducibly install released airflow versions. + +Normally, both are updated and refreshed automatically vi [CI system](../CI.rst). However, there are some +cases where we need to update them manually. This document describes how to do it. + +# Automated image cache and constraints refreshing in CI + +Our [CI system](../CI.rst) is build in the way that it self-maintains. Regular scheduled builds and +merges to `main` branch have separate maintenance step that take care about refreshing the cache that is +used to speed up our builds and to speed up rebuilding of [Breeze](../BREEZE.rst) images for development +purpose. This is all happening automatically, usually: + +* The latest [constraints](../CONTRIBUTING.rst#pinned-constraint-files) are pushed to appropriate branch + after all tests succeeded in `main` merge or in `scheduled` build Review Comment: Actually, this sentence is a little hard to understand: "after all tests succeeded in `main` merge or in `scheduled` build". It needs to be reworded. ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. 
The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint files](#how-to-generate-constraint-files) + - [Is it safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - 
[How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + - [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update +our CI and constraints manually when the automation of our CI is not enough. There are some edge cases +and events that might trigger the need of refreshing the information stored in our GitHub Repository. + +We are storing two things we are storing in our GitHub Registry that are needed for both - our contributors +and users: + +* `CI and PROD image cache` - used by our CI jobs to speed up building of images while CI jobs are running +* `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular + PRs) and used by our users to reproducibly install released airflow versions. + +Normally, both are updated and refreshed automatically vi [CI system](../CI.rst). 
However, there are some Review Comment: super nit: `via` instead of `vi` ########## dev/breeze/src/airflow_breeze/commands/release_management_commands.py: ########## @@ -1289,3 +1289,160 @@ def generate_providers_metadata(refresh_constraints: bool, python: str | None): import json PROVIDER_METADATA_JSON_FILE_PATH.write_text(json.dumps(metadata_dict, indent=4, sort_keys=True)) + + +def fetch_remote(constraints_repo: Path, remote_name: str) -> None: + run_command(["git", "fetch", remote_name], cwd=constraints_repo) + + +def checkout_constraint_tag_and_reset_branch(constraints_repo: Path, airflow_version: str) -> None: + run_command( + ["git", "reset", "--hard"], + cwd=constraints_repo, + ) + # Switch to tag + run_command( + ["git", "checkout", f"constraints-{airflow_version}"], + cwd=constraints_repo, + ) + # Create or reset branch to point + run_command( + ["git", "checkout", "-B", f"constraints-{airflow_version}-fix"], + cwd=constraints_repo, + ) + get_console().print( + f"[info]Checked out constraints tag: constraints-{airflow_version} and " + f"reset branch constraints-{airflow_version}-fix to it.[/]" + ) + result = run_command( + ["git", "show", "-s", "--format=%H"], + cwd=constraints_repo, + text=True, + capture_output=True, + ) + get_console().print(f"[info]The hash commit of the tag:[/] {result.stdout}") + + +def modify_single_file_constraints(constraints_file: Path, updated_constraints: tuple[str]) -> bool: + constraint_content = constraints_file.read_text() + original_content = constraint_content + for constraint in updated_constraints: + package, version = constraint.split("==") + constraint_content = re.sub( + rf"^{package}==.*$", f"{package}=={version}", constraint_content, flags=re.MULTILINE + ) + if constraint_content != original_content: + if not get_dry_run(): + constraints_file.write_text(constraint_content) + get_console().print("[success]Updated.[/]") + return True + else: + get_console().print("[warning]The file has not been modified.[/]") + 
return False + + +def modify_all_constraint_files(constraints_repo: Path, updated_constraint: tuple[str]) -> bool: + get_console().print("[info]Updating constraints files:[/]") + modified = False + for constraints_file in constraints_repo.glob("constraints-*.txt"): + get_console().print(f"[info]Updating {constraints_file.name}") + if modify_single_file_constraints(constraints_file, updated_constraint): + modified = True + return modified + + +def confirm_modifications(constraints_repo: Path) -> bool: + run_command(["git", "diff"], cwd=constraints_repo, env={"PAGER": ""}) + confirm = user_confirm("Do you want to continue?") + if confirm == Answer.YES: + return True + elif confirm == Answer.NO: + return False + else: + sys.exit(1) + + +def commit_constraints_and_tag(constraints_repo: Path, airflow_version: str, commit_message: str) -> None: + run_command( + ["git", "commit", "-a", "--no-verify", "-m", commit_message], + cwd=constraints_repo, + ) + run_command( + ["git", "tag", f"constraints-{airflow_version}", "--force", "-s", "-m", commit_message, "HEAD"], + cwd=constraints_repo, + ) + + +def push_constraints_and_tag(constraints_repo: Path, remote_name: str, airflow_version: str) -> None: + run_command( + ["git", "push", remote_name, f"constraints-{airflow_version}-fix"], + cwd=constraints_repo, + ) + run_command( + ["git", "push", remote_name, f"constraints-{airflow_version}", "--force"], + cwd=constraints_repo, + ) + + +@release_management.command( + name="update-constraints", help="Update released constraints with manual changes." 
+) +@click.option( + "--constraints-repo", + type=click.Path(file_okay=False, dir_okay=True, path_type=Path, exists=True), + required=True, + envvar="CONSTRAINTS_REPO", + help="Path where airflow repository is checked out, with ``constraints-main`` branch checked out.", +) +@click.option( + "--remote-name", + type=str, + default="apache", + envvar="REMOTE_NAME", + help="Name of the remote to push the changes to.", +) +@click.option( + "--airflow-versions", + type=str, + required=True, + envvar="AIRFLOW_VERSIONS", + help="Comma separated list of Airflow versions to update constraints for.", +) +@click.option( + "--commit-message", + type=str, + required=True, + envvar="COMMIT_MESSAGE", + help="Commit message to use for the constraints update.", +) +@click.option( + "--updated-constraint", + required=True, + envvar="UPDATED_CONSTRAINT", + multiple=True, + help="Constraints to be set - in the form of `package==version`. Can be repeated", +) +@option_verbose +@option_dry_run +@option_answer +def update_constraints( + constraints_repo: Path, + remote_name: str, + airflow_versions: str, + commit_message: str, + updated_constraint: tuple[str], +) -> None: + airflow_versions_array = airflow_versions.split(",") Review Comment: We might want to assert for 0 size array here. If for example, someone just provides `--airflow-versions` and forgets the value, it would be nice to early exit ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint files](#how-to-generate-constraint-files) + - [Is it safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - [How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + 
- [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update +our CI and constraints manually when the automation of our CI is not enough. There are some edge cases +and events that might trigger the need of refreshing the information stored in our GitHub Repository. + +We are storing two things we are storing in our GitHub Registry that are needed for both - our contributors +and users: + +* `CI and PROD image cache` - used by our CI jobs to speed up building of images while CI jobs are running +* `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular + PRs) and used by our users to reproducibly install released airflow versions. + +Normally, both are updated and refreshed automatically vi [CI system](../CI.rst). However, there are some +cases where we need to update them manually. This document describes how to do it. + +# Automated image cache and constraints refreshing in CI + +Our [CI system](../CI.rst) is build in the way that it self-maintains. Regular scheduled builds and +merges to `main` branch have separate maintenance step that take care about refreshing the cache that is +used to speed up our builds and to speed up rebuilding of [Breeze](../BREEZE.rst) images for development +purpose. 
This is all happening automatically, usually: + +* The latest [constraints](../CONTRIBUTING.rst#pinned-constraint-files) are pushed to appropriate branch + after all tests succeeded in `main` merge or in `scheduled` build + +* The [images](../IMAGES.rst) in `ghcr.io` registry are refreshed after every successful merge to `main` + or `scheduled` build and after pushing the constraints, this means that the latest image cache uses + also the latest tested constraints + + +# Manually refreshing the image cache + +## Why we need to update image cache manually + +Sometimes, when we have a problem with our CI running and flakiness of GitHub Actions runners or our +tests, the refresh might not be triggered. This has been mitigated by "Push Early Image Cache" job added in +our CI, but there are other reasons you might want to refresh the cache. Sometimes we want to refresh the +image cache in vX_Y_test branch before we attempt to push a change there. There are no PRs happening in +this branch, so manual refresh before we make a PR might speed up the PR build. +Or sometimes we just refreshed the constraints (see below) and we want the cache to include those. + +## Prerequisites + +Note that in order to refresh images you have to not only have `buildx` command installed for docker, +but you should also make sure that you have the buildkit builder configured and set. Since we also build +multi-platform images (for both AMD and ARM), you need to have support for qemu or hardware ARM/AMD builders +configured. The chapters below explain both options. 
+ +### Setting up cache refreshing with emulation + +According to the [official installation instructions](https://docs.docker.com/buildx/working-with-buildx/#build-multi-platform-images) +this can be achieved via: + +```shell +docker run --privileged --rm tonistiigi/binfmt --install all +``` + +More information can be found [here](https://docs.docker.com/engine/reference/commandline/buildx_create/) + +However, emulation is very slow - more than 10x slower than hardware-backed builds. + +### Setting up cache refreshing with hardware ARM/AMD support + +If you plan to build a number of images, probably better solution is to set up a hardware remote builder +for your ARM or AMD builds (depending which platform you build images on - the "other" platform should be +remote. + +This can be achieved by settings build as described in +[this guideline](https://www.docker.com/blog/speed-up-building-with-docker-buildx-and-graviton2-ec2/) and +adding it to docker buildx `airflow_cache` builder. + +This usually can be done with those two commands: + +```bash +docker buildx create --name airflow_cache # your local builder +docker buildx create --name airflow_cache --append HOST:PORT # your remote builder +``` + +One of the ways to have HOST:PORT is to login to the remote machine via SSH and forward the port to +the docker engine running on the remote machine. + +When everything is fine you should see both local and remote builder configured and reporting status: + +```bash +docker buildx ls + + airflow_cache docker-container + airflow_cache0 unix:///var/run/docker.sock + airflow_cache1 tcp://127.0.0.1:2375 +``` + +## How to refresh the image cache + +The images can be rebuilt and refreshed after the constraints are pushed. Refreshing image for all +python version is a simple as running the [refresh_images.sh](refresh_images.sh) script which will +rebuild all the images in parallel and push them to the registry. 
+ +Note that you need to run `docker login ghcr.io` before you run the script and you need to be +a committer in order to be able to push the cache to the registry. + +```bash + +```bash +./dev/refresh_images.sh +``` Review Comment: Double `bash` intended? ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint files](#how-to-generate-constraint-files) + - [Is it safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - [How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + - [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update Review Comment: super nit: 
`document` ########## dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md: ########## @@ -0,0 +1,328 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +<!-- START doctoc generated TOC please keep comment here to allow auto update --> +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Purpose of the document](#purpose-of-the-document) +- [Automated image cache and constraints refreshing in CI](#automated-image-cache-and-constraints-refreshing-in-ci) +- [Manually refreshing the image cache](#manually-refreshing-the-image-cache) + - [Why we need to update image cache manually](#why-we-need-to-update-image-cache-manually) + - [Prerequisites](#prerequisites) + - [How to refresh the image cache](#how-to-refresh-the-image-cache) + - [Is it safe to refresh the image cache?](#is-it-safe-to-refresh-the-image-cache) + - [What the command does](#what-the-command-does) +- [Manually generating constraint files](#manually-generating-constraint-files) + - [Why we need to generate constraint files manually](#why-we-need-to-generate-constraint-files-manually) + - [How to generate constraint files](#how-to-generate-constraint-files) + - [Is it 
safe to generate constraints manually?](#is-it-safe-to-generate-constraints-manually) +- [Manually updating already tagged constraint files](#manually-updating-already-tagged-constraint-files) + - [Why we need to update constraint files manually (very rarely)](#why-we-need-to-update-constraint-files-manually-very-rarely) + - [How to update the constraints](#how-to-update-the-constraints) + - [Is it safe to update constraints manually?](#is-it-safe-to-update-constraints-manually) + - [How the command works under-the-hood ?](#how-the-command-works-under-the-hood-) + - [Examples of running the command](#examples-of-running-the-command) + +<!-- END doctoc generated TOC please keep comment here to allow auto update --> + +# Purpose of the document + +This documents contains explanation of a few manual procedures we might use at certain times, to update +our CI and constraints manually when the automation of our CI is not enough. There are some edge cases +and events that might trigger the need of refreshing the information stored in our GitHub Repository. + +We are storing two things we are storing in our GitHub Registry that are needed for both - our contributors +and users: + +* `CI and PROD image cache` - used by our CI jobs to speed up building of images while CI jobs are running +* `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular + PRs) and used by our users to reproducibly install released airflow versions. + +Normally, both are updated and refreshed automatically vi [CI system](../CI.rst). However, there are some +cases where we need to update them manually. This document describes how to do it. + +# Automated image cache and constraints refreshing in CI + +Our [CI system](../CI.rst) is build in the way that it self-maintains. 
Regular scheduled builds and +merges to `main` branch have separate maintenance step that take care about refreshing the cache that is +used to speed up our builds and to speed up rebuilding of [Breeze](../BREEZE.rst) images for development +purpose. This is all happening automatically, usually: + +* The latest [constraints](../CONTRIBUTING.rst#pinned-constraint-files) are pushed to appropriate branch + after all tests succeeded in `main` merge or in `scheduled` build + +* The [images](../IMAGES.rst) in `ghcr.io` registry are refreshed after every successful merge to `main` + or `scheduled` build and after pushing the constraints, this means that the latest image cache uses + also the latest tested constraints + + +# Manually refreshing the image cache + +## Why we need to update image cache manually + +Sometimes, when we have a problem with our CI running and flakiness of GitHub Actions runners or our +tests, the refresh might not be triggered. This has been mitigated by "Push Early Image Cache" job added in +our CI, but there are other reasons you might want to refresh the cache. Sometimes we want to refresh the +image cache in vX_Y_test branch before we attempt to push a change there. There are no PRs happening in +this branch, so manual refresh before we make a PR might speed up the PR build. +Or sometimes we just refreshed the constraints (see below) and we want the cache to include those. + +## Prerequisites + +Note that in order to refresh images you have to not only have `buildx` command installed for docker, +but you should also make sure that you have the buildkit builder configured and set. Since we also build +multi-platform images (for both AMD and ARM), you need to have support for qemu or hardware ARM/AMD builders +configured. The chapters below explain both options. 
+ +### Setting up cache refreshing with emulation + +According to the [official installation instructions](https://docs.docker.com/buildx/working-with-buildx/#build-multi-platform-images) +this can be achieved via: + +```shell +docker run --privileged --rm tonistiigi/binfmt --install all +``` + +More information can be found [here](https://docs.docker.com/engine/reference/commandline/buildx_create/) + +However, emulation is very slow - more than 10x slower than hardware-backed builds. + +### Setting up cache refreshing with hardware ARM/AMD support + +If you plan to build a number of images, probably better solution is to set up a hardware remote builder +for your ARM or AMD builds (depending which platform you build images on - the "other" platform should be +remote. + +This can be achieved by settings build as described in +[this guideline](https://www.docker.com/blog/speed-up-building-with-docker-buildx-and-graviton2-ec2/) and +adding it to docker buildx `airflow_cache` builder. + +This usually can be done with those two commands: + +```bash +docker buildx create --name airflow_cache # your local builder +docker buildx create --name airflow_cache --append HOST:PORT # your remote builder +``` + +One of the ways to have HOST:PORT is to login to the remote machine via SSH and forward the port to +the docker engine running on the remote machine. + +When everything is fine you should see both local and remote builder configured and reporting status: + +```bash +docker buildx ls + + airflow_cache docker-container + airflow_cache0 unix:///var/run/docker.sock + airflow_cache1 tcp://127.0.0.1:2375 +``` + +## How to refresh the image cache + +The images can be rebuilt and refreshed after the constraints are pushed. Refreshing image for all +python version is a simple as running the [refresh_images.sh](refresh_images.sh) script which will +rebuild all the images in parallel and push them to the registry. 
+ +Note that you need to run `docker login ghcr.io` before you run the script and you need to be +a committer in order to be able to push the cache to the registry. + +```bash + +```bash +./dev/refresh_images.sh +``` + +## Is it safe to refresh the image cache? + +Yes. Image cache is only used to speed up the build process in CI. The worst thing that can happen if +the image cache is broken is that the PR builds of our will run slower - usually, for regular PRs building +the images from scratch takes about 15 minutes. With the image cache it takes about 1 minute if there are no +dependency changes. So if the image cache is broken, the worst thing that will happen is that the PR builds +will run longer "Wait for CI Image" step and "Wait for PROD image" will simply wait a bit longer. + +Eventually the cache will heal itself. When the `main` build succeeds with all the tests, the cache is +automatically updated. Actually it's even faster in new CI process of ours, the cache is refreshed +very quickly after there is a merge of a new PR to the main ("Push Early Image Cache" jobs), so +cache refreshing and self-healing should be generally rather quick. + +## What the command does + +The command does the following: + +* builds the CI image using the builders configured using buildx and pushes the cache + to the `apache/airflow` registry (`--prepare-buildx-cache` flag). It builds all images in parallel for + both AMD and ARM architectures. +* prepares packages and airflow packages in `dist` folder using the latest sources +* moves the packages to the `docker-context-files` folder so that they are available when building the + PROD images +* builds the PROD image using the builders configured and packages prepared using buildx and pushes the cache + to the `apache/airflow` registry (`--prepare-buildx-cache` flag). It builds all images in parallel for + both AMD and ARM architectures. 
+ +# Manually generating constraint files + +## Why we need to generate constraint files manually + +Sometimes we want to generate constraint files if - for whatever reason - we cannot or do not want to wait +until `main` or `vY_Z_test` branch tests succeed. The constraints are only refreshed by CI when all the tests +pass, and this is a good thing, however there are some cases where we cannot solve some intermittent problem +with tests, but we KNOW that the tip of the branch is good and we want to release a new airflow version or +we want to move the PRs of contributors to start using the new constraints. This should be done with caution +and you need to be sure what you are doing, but you can always do it manually if you want. + +## How to generate constraint files + +```bash +breeze ci-image build --run-in-parallel --upgrade-to-newer-dependencies --answer yes +breeze release-management generate-constraints --airflow-constraints-mode constraints --run-in-parallel --answer yes +breeze release-management generate-constraints --airflow-constraints-mode constraints-source-providers --run-in-parallel --answer yes +breeze release-management generate-constraints --airflow-constraints-mode constraints-no-providers --run-in-parallel --answer yes + +AIRFLOW_SOURCES=$(pwd) Review Comment: Looks good!
