Thanks Jarek! Awesome to see this change, amazing work :)

On Wed, 1 Jan 2025, 12:35 Jarek Potiuk, <ja...@potiuk.com> wrote:

>
> *Update on the workflow refactor for our CI*
>
> *TL;DR: We should see good speed-ups, better stability, and simplifications
> in our CI. Read on if you are interested in the details.*
>
> Some first passes on the new workflows have been done and small "teething"
> problems addressed - so far so good.
>
> Once again kudos to Jacob (CC:) who developed and published the action in
> https://github.com/apache/infrastructure-actions/tree/main/stash - it's
> been extremely useful as a better replacement for GitHub's built-in "cache"
> action - with no size limitations, automated cache retention, and better
> support for cross-fork caches.
>
> We also worked - mostly with Pavan (kudos for all the PRs, help, reviews,
> and watchful eyes there!) - on some follow-ups. The improvements are
> described below, if you are interested in the details.
>
> *Caching improvements for venvs and pre-commits and adding more `uv` *
>
> We doubled down on the "stash" action and now have quite comprehensive
> caching implemented in various stages of our builds. We used the "quiet"
> period around Xmas/New Year to optimize various areas where build time
> could be improved.
>
> There are some interesting findings: it turned out that `uv` is SO FAST
> that in several cases using a cache is SLOWER (installing breeze, creating
> k8s test environments). The overhead of running the action, then
> downloading and extracting the cache from storage, is simply bigger than
> the time uv needs to resolve, pull and install the packages. So we do not
> use caching there, and appropriate comments have been added.
> We reviewed all the places where `pip` was still used, applied "use-uv"
> wherever it was missing, and added caching where it made
> sense.
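The cache-or-not decision described above comes down to comparing two durations. Here is a minimal sketch of that comparison; the step names and timings are hypothetical placeholders for illustration, not measurements from the Airflow CI:

```python
def cache_is_worth_it(restore_overhead_s: float, fresh_install_s: float) -> bool:
    """Return True when restoring a cache is faster than a fresh install."""
    return restore_overhead_s < fresh_install_s


# Hypothetical timings in seconds (illustration only, not measured values).
steps = {
    "fast uv install": {"restore_overhead_s": 15.0, "fresh_install_s": 10.0},
    "slow install": {"restore_overhead_s": 15.0, "fresh_install_s": 95.0},
}

decisions = {name: cache_is_worth_it(**timing) for name, timing in steps.items()}
for name, use_cache in decisions.items():
    print(f"{name}: {'use cache' if use_cache else 'skip cache'}")
```

The restore overhead here stands for everything the cache step costs: starting the action, downloading the archive, and extracting it.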
>
> This is mostly in https://github.com/apache/airflow/pull/45289
> Some numbers:
>
> * installing pre-commits in most cases (when pre-commits are not
> modified) is now down from 1m 35s -> 30s
> * installing the k8s env (which is multiplied by the number of k8s tests we
> run in some PRs) is down from 1m -> 10s - this is particularly visible in
> "canary runs", where we now run 32 variants of those tests :)
> * applying uv in a few missing places decreased installation of some venvs
> from about 1 minute to a few seconds.
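As a quick check of what those numbers add up to, here is a small sketch using only the figures quoted in the list above. The assumption that every one of the 32 canary variants saves the full 50 seconds is mine, and the total is aggregate runner time rather than wall-clock time, since the jobs may run in parallel:

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


# Pre-commit installation: 1m 35s -> 30s
precommit_saved = to_seconds(1, 35) - to_seconds(0, 30)

# k8s environment installation: 1m -> 10s, repeated for each test variant
k8s_saved_per_job = to_seconds(1) - to_seconds(0, 10)
k8s_variants = 32  # number of variants in a canary run
k8s_saved_total = k8s_saved_per_job * k8s_variants

print(f"pre-commit: {precommit_saved}s saved per install")
print(f"k8s env: {k8s_saved_per_job}s saved per job")
minutes, seconds = divmod(k8s_saved_total, 60)
print(f"k8s env across {k8s_variants} variants: {minutes}m {seconds}s of runner time")
```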
>
> *Cache mount CI improvements*
>
> We also managed to "have our cake and eat it too" in a place where we
> had conflicting CI and local dev needs. We are now using "--mount-cache"
> to keep the uv cache and speed up local Breeze CI builds, but that made CI
> builds generally slower (5 minutes 45s to build the image) because
> uv had to reinstall airflow + 700+ dependencies from scratch.
>
> But with https://github.com/apache/airflow/pull/45314 we also used the
> "stash" action to cache the `uv` cache in canary builds (and we can re-use
> the cache in fork PRs). This way we are down to a bit more than 3 minutes
> to build the image. While this is not as "fast" as the previous approach
> using GitHub Registry and pre-caching installation from "main", the fact
> that we have a single workflow, no "pull_request_target", no "wait
> for images", and a generally simpler approach makes up for it altogether.
>
> So we went from 5m 45s -> 3m 10s.
>
> The "local" experience with building "breeze" images should be much better
> overall - after the first local build of an image, subsequent rebuilds,
> even with a lot of changes in dependencies, should be way faster. Generally,
> rebuilding the breeze image once it has been built locally should
> take seconds rather than minutes (this might be longer when a new
> Python patch level is released - but that happens once every few weeks).
> Here is the build broken down to: downloading the uv cache, importing it,
> and building the image using it.
>
> [image: Screenshot 2025-01-01 at 10.07.04.png]
>
> If we had just the "build image" step without caching, it would take ~5m
> 45s, so we save about 45% of CI image build time per Python
> version per run. That's also about 2.5 minutes shorter feedback time - less
> waiting.
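Using the 5m 45s and 3m 10s figures quoted above, the saving can be worked out directly. This is only a sketch of the arithmetic; it assumes both figures cover the same scope (one image build for one Python version):

```python
before_s = 5 * 60 + 45  # image build without the stashed uv cache: 5m 45s
after_s = 3 * 60 + 10   # download cache + import cache + build: 3m 10s

saved_s = before_s - after_s
saved_pct = round(100 * saved_s / before_s, 1)

minutes, seconds = divmod(saved_s, 60)
print(f"saved {minutes}m {seconds}s per image build ({saved_pct}% of the original time)")
```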
>
> *Reproducing CI failures locally*
>
> Thanks to Pavan's https://github.com/apache/airflow/pull/45287 (further
> improved by https://github.com/apache/airflow/pull/45296 and
> https://github.com/apache/airflow/pull/45324 ) we have an even easier way
> now to reproduce the CI builds (if you have an AMD machine for now - that
> will change in the future when we have ARC enabled and ARM builds in CI).
>
> Whenever you want to locally reproduce a failure in CI, you should be able
> to specify the PR number whose failure you want to reproduce:
>
> breeze ci-image load --from-pr 12345 --python 3.9 --github-token TOKEN
>
> Similarly when you want to reproduce a specific run failure:
>
> breeze ci-image load --from-run 12538475388 --python 3.9 --github-token TOKEN
>
> After that, your local image will be exactly the same as the one in CI.
> With the following command you can drop into a breeze shell and do some
> testing there, without switching to the branch of the PR:
>
> breeze shell --mount-sources skip [OTHER OPTIONS]
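If you reproduce CI failures often, the commands above can be assembled by a small helper. The sketch below only builds and prints the command lines; it uses just the flags shown in this email, and the helper name `load_ci_image_cmd` is hypothetical, not part of breeze:

```python
import shlex
from typing import List, Optional


def load_ci_image_cmd(
    python: str,
    github_token: str,
    pr: Optional[int] = None,
    run_id: Optional[int] = None,
) -> List[str]:
    """Build the `breeze ci-image load` argument list for a PR or a run."""
    if (pr is None) == (run_id is None):
        raise ValueError("pass exactly one of pr or run_id")
    if pr is not None:
        source = ["--from-pr", str(pr)]
    else:
        source = ["--from-run", str(run_id)]
    return ["breeze", "ci-image", "load", *source,
            "--python", python, "--github-token", github_token]


# After loading the image, enter the shell without mounting local sources.
SHELL_CMD = ["breeze", "shell", "--mount-sources", "skip"]

print(shlex.join(load_ci_image_cmd("3.9", "TOKEN", pr=12345)))
print(shlex.join(load_ci_image_cmd("3.9", "TOKEN", run_id=12538475388)))
print(shlex.join(SHELL_CMD))
```

To actually run a command, the resulting list could be passed to `subprocess.run(cmd, check=True)`; that step is left out here so the sketch has no side effects.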
>
> If you check-out the branch of the PR that was used, regular ``breeze``
> commands will also reproduce the CI environment without having to rebuild
> the image - for example when dependencies changed or when new dependencies
> were released and used in the CI job - and you will be able to edit source
> files locally as usual.
> This is described in a few relevant places in our docs - mainly here:
> https://github.com/apache/airflow/blob/main/dev/breeze/doc/ci/07_running_ci_locally.md
> - I updated and refreshed the docs, so they should be easier to
> understand and follow.
>
> Jarek and Pavan
>
>
>
> On Mon, Dec 30, 2024 at 7:17 AM Amogh Desai <amoghdesai....@gmail.com>
> wrote:
>
>> Thanks Jarek for simplifying the workflows and thanks for the announcement
>> too, or contributors would probably be pretty lost
>> if something strange happened.
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>>
>> On Mon, Dec 30, 2024 at 11:26 AM Vishnu Chilukoori <
>> vish.chiluko...@gmail.com> wrote:
>>
>> > Thanks Jarek, great work on simplifying and securing CI workflows!
>> >
>> >
>> > --
>> > Regards,
>> > Vishnu Chilukoori
>> >
>> > On Sun, Dec 29, 2024 at 2:57 PM Pavankumar Gopidesu <
>> > gopidesupa...@gmail.com>
>> > wrote:
>> >
>> > > Woohooo Thanks Jarek Great work :)
>> > >
>> > > Regards,
>> > > Pavan
>> > >
>> > > On Sun, Dec 29, 2024 at 10:15 PM Jarek Potiuk <ja...@potiuk.com>
>> wrote:
>> > > >
>> > > > Hello here,
>> > > >
>> > > > TL;DR; I just merged https://github.com/apache/airflow/pull/45266 -
>> > > > which implemented a much simplified and nicer workflow for our CI.
>> > > >
>> > > > Rebase to the latest `main` and you should be good to go.
>> > > >
>> > > > It (finally) switches from a workflow we had for years (using the
>> > > > `pull_request_target` workflow, which is pretty dangerous from a
>> > > > security point of view) to using Artifacts for sharing images in the
>> > > > workflow. This was possible thanks to the new "artifacts" actions and
>> > > > switching to UV.
>> > > >
>> > > > The benefit is that it is way safer (no more "dangerous workflows")
>> > > > and simpler: we have a much simpler Dockerfile.ci and caching
>> > > > mechanism implemented. We worked this out by discussing with other
>> > > > ASF projects and even reusing an action developed by a fellow Apache
>> > > > Arrow committer and PMC member - Jacob Wujciak.
>> > > >
>> > > > The things everyone should do:
>> > > >
>> > > > * rebase your PR onto the latest main so that it is rebuilt using
>> > > > the new workflow
>> > > > * run `breeze ci-image build` if you are using breeze locally
>> > > >
>> > > > I expect some teething problems, so do not hesitate to raise them
>> > > > in the #internal-airflow-ci-cd channel for CI, or in the
>> > > > #airflow-breeze channel if you see breeze problems.
>> > > >
>> > > > Your regular workflows should continue working as usual, you should
>> > > > see just one workflow in CI running builds and tests instead of two.
>> > > >
>> > > > J.
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > > For additional commands, e-mail: dev-h...@airflow.apache.org
>> > >
>> > >
>> >
>>
>
