Thanks Jarek! Awesome to see this change, amazing work :)

On Wed, 1 Jan 2025, 12:35 Jarek Potiuk, <ja...@potiuk.com> wrote:
>
> *Update on the workflow refactor for our CI*
>
> *TL;DR; We should have some good speed-ups, stability and simplifications in our CI. Read on if you are interested in details*
>
> Some first passes on the new workflows have been done and small "teething" problems addressed - so far so good.
>
> Once again kudos to Jacob (CC:) who developed and published the action in https://github.com/apache/infrastructure-actions/tree/main/stash - it's been extremely useful as a better replacement for the GitHub built-in "cache" action - with no size limitations, automated retention of cache and better support for cross-fork caching.
>
> We also worked - mostly with Pavan (kudos for all the PRs, help, reviews, and watchful eyes there!) - on some follow-ups, and there are some good things to report if you are interested in what's been improved.
>
> *Caching improvements for venvs and pre-commits and adding more `uv`*
>
> We doubled down on the "stash" action and now have quite comprehensive caching implemented in various stages of our builds. We used the "quiet" period around Xmas/New Year and tried to optimize various areas where build time could be improved.
>
> There are some interesting findings - it turned out that `uv` is SO FAST that in several cases using a cache is SLOWER (installing breeze, creating k8s test environments). The overhead of running the action, then downloading and extracting the cache from storage, is simply bigger than the time uv needs to resolve, pull and install the packages. So we do not use caching there, and appropriate comments are added.
>
> We reviewed all the places where `pip` was still used, applied "use-uv" wherever it was not yet used, and added caching where it made sense.
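The caching trade-off described above comes down to a simple comparison: restoring a cache only pays off when the action overhead plus download and extraction take less time than a fresh `uv` install. Below is a minimal Python sketch of that check. The function name and all timings are hypothetical illustrations, not measurements from the Airflow CI.

```python
def cache_is_worth_it(
    action_overhead_s: float,
    download_s: float,
    extract_s: float,
    fresh_install_s: float,
) -> bool:
    """Return True when restoring from cache beats a fresh install."""
    restore_total_s = action_overhead_s + download_s + extract_s
    return restore_total_s < fresh_install_s


# Hypothetical timings (seconds), chosen only to illustrate both outcomes.
# A fast uv install: restoring (3 + 6 + 4 = 13s) is slower than installing (10s).
fast_install_case = cache_is_worth_it(3, 6, 4, fresh_install_s=10)
# A slow install: restoring (13s) is much faster than installing (95s).
slow_install_case = cache_is_worth_it(3, 6, 4, fresh_install_s=95)

print(fast_install_case, slow_install_case)
```

In the first case skipping the cache is the better choice, which matches the situation reported for installing breeze and creating the k8s test environments.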
>
> This is mostly in https://github.com/apache/airflow/pull/45289
>
> Some numbers:
>
> * installing pre-commits in most cases (when pre-commits are not modified) is now down from 1m 35s -> 30s
> * installing the k8s env (which is multiplied by the number of k8s tests we run in some PRs) is down from 1m -> 10s - this is particularly visible in "canary runs", where we now run 32x variants of those tests :)
> * uv applied in a few missing places decreased installation of some venvs from about 1 minute to a few seconds.
>
> *Cache mount CI improvements*
>
> We also managed to come up with an "eat cake and have it too" solution where we had conflicting CI and local dev needs. We are now using "--mount-cache" to keep the uv cache and speed up local Breeze CI builds, but that made CI builds generally slower - 5 minutes 45s to build the image - because uv had to reinstall airflow + 700+ dependencies from scratch.
>
> But with https://github.com/apache/airflow/pull/45314 we also used the "stash" action to cache the `uv` cache in canary builds (and we can re-use the cache in fork PRs). This way we are down to a bit more than 3 minutes to build the image. While this is not as "fast" as the previous approach using GitHub Registry and pre-caching installation from "main", the fact that we have a single workflow, no "pull_request_target" and no "wait for images", and a generally simpler approach makes up for it altogether.
>
> So we got: 5m 45s -> 3m 10s
>
> The "local" experience with building "breeze" images should be much better overall - after the first local build of an image, subsequent rebuilds, even with a lot of changes in dependencies, should be way faster - generally rebuilding the breeze image once it has been built locally once should take seconds rather than minutes (this might take longer when a new Python patchlevel is released - but that happens only once every few weeks).
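To put the quoted timings side by side, here is a small Python sketch that converts the before/after figures from this message into relative savings. The helper names are made up for illustration. Only the timings come from the message.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


def saving_percent(before_s: int, after_s: int) -> float:
    """Relative time saved, as a percentage rounded to one decimal."""
    return round(100 * (before_s - after_s) / before_s, 1)


# (before, after) pairs as quoted in the message.
timings = {
    "pre-commit install": (to_seconds(1, 35), to_seconds(0, 30)),
    "k8s env install": (to_seconds(1), to_seconds(0, 10)),
    "CI image build": (to_seconds(5, 45), to_seconds(3, 10)),
}

savings = {name: saving_percent(b, a) for name, (b, a) in timings.items()}
for name, pct in savings.items():
    print(f"{name}: {pct}% less time")
```

For the image build this works out to 155 seconds saved out of 345, roughly 45%.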
>
> Here is the build broken down into: downloading the uv cache, importing it, and building the image using it.
>
> [image: Screenshot 2025-01-01 at 10.07.04.png]
>
> If we had just the "build image" step without caching - it would be ~ 5m 45s - so we save almost 50% of CI image build time per Python version per run. That's also about 3 minutes shorter feedback time - less waiting.
>
> *Reproducing CI failures locally*
>
> Thanks to Pavan's https://github.com/apache/airflow/pull/45287 (further improved by https://github.com/apache/airflow/pull/45296 and https://github.com/apache/airflow/pull/45324) we now have an even easier way to reproduce the CI builds (for now only if you have an AMD machine - that will change in the future when we have ARC enabled and ARM builds in CI).
>
> Whenever you want to locally reproduce a failure in CI, you should be able to specify the PR# that you want to "reproduce failure of":
>
> breeze ci-image load --from-pr 12345 --python 3.9 --github-token TOKEN
>
> Similarly, when you want to reproduce a specific run failure:
>
> breeze ci-image load --from-run 12538475388 --python 3.9 --github-token TOKEN
>
> After that, your local image will be exactly the same as the one in CI. With the following command you can drop into a breeze shell and do some testing there, without switching to the branch of the PR:
>
> breeze shell --mount-sources skip [OTHER OPTIONS]
>
> If you check out the branch of the PR that was used, regular ``breeze`` commands will also reproduce the CI environment without having to rebuild the image - for example when dependencies changed or when new dependencies were released and used in the CI job - and you will be able to edit source files locally as usual.
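The two `breeze ci-image load` invocations above differ only in whether the image comes from a PR or from a specific run. The Python sketch below assembles either command line from its inputs. It only builds and prints the command, using the flags quoted in this message. The function name is hypothetical, and `TOKEN` stays a placeholder as in the original.

```python
import shlex
from typing import List, Optional


def build_load_command(
    python_version: str,
    pr: Optional[int] = None,
    run_id: Optional[int] = None,
    token: str = "TOKEN",  # placeholder, as in the message
) -> List[str]:
    """Build the `breeze ci-image load` argument list for a PR or a run."""
    if (pr is None) == (run_id is None):
        raise ValueError("Specify exactly one of pr or run_id")
    if pr is not None:
        source = ["--from-pr", str(pr)]
    else:
        source = ["--from-run", str(run_id)]
    return [
        "breeze", "ci-image", "load", *source,
        "--python", python_version,
        "--github-token", token,
    ]


from_pr = build_load_command("3.9", pr=12345)
from_run = build_load_command("3.9", run_id=12538475388)
print(shlex.join(from_pr))
print(shlex.join(from_run))
```

Building the argument list separately from running it makes it easy to check the command before handing it to a shell or to `subprocess.run`.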
>
> This is described in a few relevant places in our docs - mainly here: https://github.com/apache/airflow/blob/main/dev/breeze/doc/ci/07_running_ci_locally.md - I updated and refreshed the docs, so they should be easier to understand and follow.
>
> Jarek and Pavan
>
> On Mon, Dec 30, 2024 at 7:17 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>
>> Thanks Jarek for simplifying the workflows and thanks for the announcement too, or contributors would probably be pretty lost if something strange happened.
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>> On Mon, Dec 30, 2024 at 11:26 AM Vishnu Chilukoori <vish.chiluko...@gmail.com> wrote:
>>
>> > Thanks Jarek, great work on simplifying and securing CI workflows!
>> >
>> > --
>> > Regards,
>> > Vishnu Chilukoori
>> >
>> > On Sun, Dec 29, 2024 at 2:57 PM Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
>> >
>> > > Woohooo Thanks Jarek Great work :)
>> > >
>> > > Regards,
>> > > Pavan
>> > >
>> > > On Sun, Dec 29, 2024 at 10:15 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> > > >
>> > > > Hello here,
>> > > >
>> > > > TL;DR; I just merged https://github.com/apache/airflow/pull/45266 - which implemented a much simplified and nicer workflow for our CI.
>> > > >
>> > > > Rebase to the latest `main` and you should be good to go.
>> > > >
>> > > > It (finally) switches from a workflow we had for years (using the `pull_request_target` workflow, which is pretty dangerous from the security point of view) to using Artifacts for sharing images in the workflow. This was possible thanks to the new "artifacts" actions and switching to uv.
>> > > >
>> > > > The benefit is that it is way safer - no more "dangerous workflows" - and simpler - we have a much simpler Dockerfile.ci and caching mechanism implemented.
>> > > > We worked this out by discussing with other ASF projects and even reusing an action developed by a fellow Apache Arrow committer and PMC member - Jacob Wujciak.
>> > > >
>> > > > The things everyone should do:
>> > > >
>> > > > * rebase your PR to the latest main so that your PRs are rebuilt using the new workflow
>> > > > * run `breeze ci-image build` if you are using breeze locally
>> > > >
>> > > > I expect some teething problems, so do not hesitate to raise them in the #internal-airflow-ci-cd channel for CI or the #airflow-breeze channel if you see breeze problems.
>> > > >
>> > > > Your regular workflows should continue working as usual; you should just see one workflow in CI running builds and tests instead of two.
>> > > >
>> > > > J.