potiuk commented on PR #35026: URL: https://github.com/apache/airflow/pull/35026#issuecomment-1772444407
> Refactoring the Dockerfile to COPY only the requirements first and the frequently changed sources later is another good optimization technique. But I haven't looked at the Dockerfile in detail to see if that is already done.

There were likely tens of iterations of getting the right sequence of COPY/RUN there, and in the main stage the number of layers is also minimised. But maybe some of it could still be improved - it's open to contributions :).

The whole Dockerfile is now designed with "incremental rebuild" in mind: it selectively COPYs only those things that are going to be used in the next step. I think what `--mount-cache` solves is the case where you really iterate over requirements.txt only. Similarly, an apt cache could be used (and is only useful) when someone experiments and iterates over apt dependencies.

Actually, the final `main` segment is really small and simple because it is optimized for production use, i.e. size. So what the final segment does (apart from ARG/ENV setting) is literally three steps:

* installing OS dependencies (runtime only - no build-essentials or dev libraries needed for compilation)
* installing DB clients (parameterized with ARGs - only those that are needed)
* copying installed airflow from the `build` segment (we never install in the main segment - we always copy the installation from the build segment; this saves ~200 MB)

I believe we could add `--mount-cache` for all those steps, provided that we account for those cache validity problems.
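
To make the layout discussed above concrete, here is a minimal sketch of a two-stage Dockerfile. It is not the actual Airflow Dockerfile. The base image, the paths, the package names and the `INSTALL_POSTGRES_CLIENT` ARG are all hypothetical, and it assumes that `--mount-cache` refers to BuildKit's `RUN --mount=type=cache` syntax.

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative sketch only; names, paths and packages are assumptions.

FROM python:3.11-slim AS build

# Build-time OS dependencies (compilers, dev headers) live only in this stage.
# The apt cache mounts help only while iterating over the apt package list.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    rm -f /etc/apt/apt.conf.d/docker-clean \
    && apt-get update \
    && apt-get install -y --no-install-recommends build-essential

RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"

# COPY only the requirements first, so that source edits do not invalidate
# the dependency layer. The pip cache mount speeds up rebuilds when
# requirements.txt itself changes.
COPY requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r /tmp/requirements.txt

# Frequently changed sources come last (hypothetical installable project).
COPY . /opt/src
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --no-deps /opt/src

FROM python:3.11-slim AS main

# Hypothetical ARG that selects which DB client gets installed.
ARG INSTALL_POSTGRES_CLIENT="true"

# Step 1: runtime-only OS dependencies (no compilers or dev libraries).
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Step 2: DB clients, parameterized with ARGs.
RUN if [ "${INSTALL_POSTGRES_CLIENT}" = "true" ]; then \
        apt-get update \
        && apt-get install -y --no-install-recommends postgresql-client \
        && rm -rf /var/lib/apt/lists/*; \
    fi

# Step 3: never install here; copy the finished installation from `build`.
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
```

The contents of a cache mount persist across builds and sit outside the normal layer cache, which is presumably where the cache validity concerns come from: a stale mount can feed old downloads into a layer that Docker considers freshly built.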
