potiuk commented on PR #35026:
URL: https://github.com/apache/airflow/pull/35026#issuecomment-1772444407

   > Refactoring the dockerfile to only COPY requirements first and frequently 
changed sources later is another good optimization technique. But i haven't 
looked at the dockerfile in detail to see if that is already done.
   
   There were likely tens of iterations of getting the sequence of 
COPY/RUN right there, and in the main stage the number of layers is minimised. 
But maybe some of it could still be improved - it's open to contributions :).
   
   The whole Dockerfile is now designed with "incremental rebuild" in mind: 
it selectively COPYs only those things that are going to be used in 
the next step. I think what `--mount-cache` really helps with is the case where 
you iterate over requirements.txt only. Similarly, an apt cache could be used 
(and is only useful) when someone experiments and iterates over apt dependencies.
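
   As a rough illustration of the two ideas above (copy the rarely changing inputs first, and keep package caches between rebuilds), here is a minimal sketch. It is not taken from the Airflow Dockerfile. It assumes that `--mount-cache` refers to BuildKit's `RUN --mount=type=cache`, and the base image, the `libpq5` package, and the `requirements.txt` and `src/` paths are placeholders chosen for the example.

   ```dockerfile
   # syntax=docker/dockerfile:1
   # Hypothetical sketch, NOT the Airflow Dockerfile.
   FROM python:3.11-slim

   # Debian-based images delete downloaded .deb files after each install;
   # drop that hook so the apt cache mount below actually keeps something.
   RUN rm -f /etc/apt/apt.conf.d/docker-clean

   # apt cache mounts: mainly useful while iterating over apt dependencies.
   RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
       --mount=type=cache,target=/var/lib/apt,sharing=locked \
       apt-get update \
       && apt-get install -y --no-install-recommends libpq5

   # Copy only the requirements file first, so that editing application
   # sources does not invalidate the layer that installs dependencies.
   COPY requirements.txt /tmp/requirements.txt

   # pip cache mount: when requirements.txt changes, this layer is rebuilt,
   # but wheels downloaded by earlier builds are reused from the cache.
   RUN --mount=type=cache,target=/root/.cache/pip \
       pip install -r /tmp/requirements.txt

   # Frequently changing sources go last.
   COPY src/ /app/src/
   ```

   The cache mounts live outside the image layers, so they do not add to the image size, but their contents can go stale, which is the cache validity concern raised in this thread.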
   
   Actually the final `main` segment is really small and simple because it is 
optimized for production use, i.e. size. So what the final segment does 
(apart from ARG/ENV setting) is literally three steps:
   
   * installing OS dependencies (runtime only - no build-essentials and dev 
libraries needed for compilation)
   * installing DB clients (parameterized with ARGs - only those that are 
needed)
   * copying installed airflow from the `build` segment (we never install in the 
main segment - we always copy the installation from the build segment; this 
saves ~200 MB)
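
   The three steps listed above could look roughly like the following multi-stage sketch. It only illustrates the pattern and is not the actual Airflow Dockerfile: the base image, the package names, the `/opt/venv` location, and the `INSTALL_POSTGRES_CLIENT` argument are assumptions made for the example.

   ```dockerfile
   # syntax=docker/dockerfile:1
   # Hypothetical sketch, NOT the Airflow Dockerfile.

   # Build segment: compilers and dev libraries live only here.
   FROM python:3.11-slim AS build
   RUN apt-get update \
       && apt-get install -y --no-install-recommends build-essential libpq-dev \
       && rm -rf /var/lib/apt/lists/*
   RUN python -m venv /opt/venv
   COPY requirements.txt /tmp/requirements.txt
   RUN /opt/venv/bin/pip install --no-cache-dir -r /tmp/requirements.txt

   # Main segment: optimized for size.
   FROM python:3.11-slim AS main

   # Illustrative build argument; the name is an assumption for this sketch.
   ARG INSTALL_POSTGRES_CLIENT="true"

   # Step 1: runtime-only OS dependencies (no build-essential, no -dev packages).
   RUN apt-get update \
       && apt-get install -y --no-install-recommends libpq5 \
       && rm -rf /var/lib/apt/lists/*

   # Step 2: DB clients, installed only when requested via ARG.
   RUN if [ "${INSTALL_POSTGRES_CLIENT}" = "true" ]; then \
           apt-get update \
           && apt-get install -y --no-install-recommends postgresql-client \
           && rm -rf /var/lib/apt/lists/*; \
       fi

   # Step 3: copy the finished installation from the build segment
   # instead of installing anything in the main segment.
   COPY --from=build /opt/venv /opt/venv
   ENV PATH="/opt/venv/bin:${PATH}"
   ```

   Because nothing is compiled or installed with pip in the main segment, the build toolchain never ends up in the final image.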
   
   I believe we could add `--mount-cache` for all those steps, provided 
that we account for those cache validity problems.


