potiuk opened a new pull request #22492:
URL: https://github.com/apache/airflow/pull/22492
This change is one of the biggest optimizations to the Dockerfiles.
It was a goal from the very beginning, but it only became possible
after switching to BuildKit and the recent release of support for
the 1.4 Dockerfile syntax. This syntax introduced two features:
* heredocs
* links for COPY commands
Together, these features solve multiple problems:
* COPY for build scripts suffers from permission problems. Depending
on the umask setting of the host, the scripts could have different
group permissions and invalidate the docker cache. Inlining the
scripts (automatically by pre-commit) gets rid of the problem
completely.
* COPY --link allows us to optimize and parallelize builds for
the source code embedded in Dockerfile.ci. This should speed up
building the images locally, and it should also allow the CI
builds to use the cache more efficiently: when the source code
has not changed, the builds reuse pre-cached layers from
the cache (and in parallel).
* The PROD Dockerfile is now completely standalone. You do not
need to have any folders or files to build the Airflow image. At
the same time, the versatility and support for multiple ways
of building the image (as described in
https://airflow.apache.org/docs/docker-stack/build.html) is
maintained. This was a goal from the very beginning of the
PROD Dockerfile, but it was not easily achievable. Heredocs
allow us to inline the scripts that are used for the build, and the
pre-commits make sure that there is one source of truth
and nicely editable scripts for both the PROD and CI Dockerfiles.
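To illustrate the two features, here is a minimal sketch of how an
inlined script and a linked COPY can look. It is not taken from the
actual Airflow Dockerfiles: the stage names, the script name
`install_deps.sh`, its contents, the base image and the image tag in
the comment are all assumptions made for the example.

```dockerfile
# syntax=docker/dockerfile:1.4

# Hypothetical stage that holds an inlined build script. The script
# body lives in the Dockerfile itself, so host file permissions
# (and the host umask) no longer influence the layer cache.
FROM scratch AS scripts
COPY <<"EOF" /install_deps.sh
#!/usr/bin/env bash
set -euo pipefail
echo "Installing dependencies"
EOF

# Base image chosen only for illustration.
FROM python:3.9-slim-bullseye AS main

# --link creates the layer independently of the layers before it,
# so BuildKit can build it in parallel and reuse it from the cache
# even when earlier layers change.
COPY --link --from=scripts /install_deps.sh /scripts/install_deps.sh
RUN bash /scripts/install_deps.sh

# Possible build command, assuming BuildKit is available
# (the tag "my-image" is hypothetical):
#   DOCKER_BUILDKIT=1 docker build -f Dockerfile -t my-image .
```

The heredoc delimiter is quoted ("EOF") so that variables inside the
script are not expanded when the Dockerfile is parsed.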
The last point is really cool, because it allows our users to
build custom Dockerfiles without checking out the code of
Airflow. It is enough to download the latest released
Dockerfile to build the image.
Overall - this change will vastly optimize build speed for
both PROD and CI images in multiple scenarios.