potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1009731442


   I am looking for some reviews. As part of the optimization I also reviewed the 
image with Dive (cc: @malthe @mik-laj ) and made sure that some of the remaining 
remnants that were "bloating" the image were removed:
   
   * we had an (unnecessary) PIP install in the final image - this caused a (small) 
number of .pyc files to be embedded in the image
   * we also had a lastlog file produced during apt install which took 15MB - I made 
sure it is removed as the last step of the RUN instruction that created it 
(thanks @malthe for pointing that out!).
   * I also reviewed and improved the instructions which copied the .local 
folder and set its permissions - one of the problems noted in #20776 was that 
there was no "group write" permission for the home directory of Airflow (which 
could be problematic in some OpenShift cases). It had to be done carefully - 
the permissions have to be changed in the right place, because changing 
the permissions after the files are stored as a layer effectively duplicates the 
layer (the new layer with the changed permissions effectively creates a copy of 
all the files) 😱 
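
   To illustrate the last point, here is a toy model in Python (a rough sketch, 
not part of the PR). It assumes that every layer stores a full copy of each file 
it adds or modifies, including files whose only change is their permissions. The 
file names and sizes are made up for the example.

```python
# Toy model of a layered image: a layer is a dict of {path: size in MB}
# holding every file that the corresponding instruction added or modified.

def image_size(layers):
    """Total stored size: the sum of the file sizes across all layers."""
    return sum(size for layer in layers for size in layer.values())


def copy_layer(files):
    """Layer produced by copying the files into the image."""
    return dict(files)


def chmod_layer(files):
    """Layer produced by changing permissions in a later instruction.

    Only metadata changes, but each touched file is stored again in full.
    """
    return dict(files)


# Hypothetical content of the .local folder (sizes in MB).
local_files = {
    "/home/airflow/.local/lib/a.py": 40,
    "/home/airflow/.local/lib/b.so": 60,
}

# Permissions changed in a separate, later instruction: content stored twice.
separate = [copy_layer(local_files), chmod_layer(local_files)]

# Permissions set in the same step that stores the files: content stored once.
combined = [copy_layer(local_files)]

print(image_size(separate))  # 200
print(image_size(combined))  # 100
```

   In this model the "separate" variant doubles the stored size, which is why the 
permissions have to be set in the same step that creates the files.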
   
   As a result, the efficiency score of our image jumped from 97% to 99%:
   
   ![Screenshot 2022-01-11 02 08 
24](https://user-images.githubusercontent.com/595491/148911786-e520beb2-2dec-44d0-91b1-89f2acd377a2.png)
   
   I am thinking about adding some more automated tests for the presence of 
unwanted files and automating the tests for the image "efficiency" in our CI, 
but I would like to do it after this one and #20258, as switching to buildx 
significantly improves the experience of iterating over the images and building 
them in small increments. 
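
   As a rough idea of what a check for unwanted files could look like, here is a 
minimal Python sketch. It is only an illustration, not the test planned for CI: 
the function name `find_unwanted`, the pattern list, and the assumption that the 
image filesystem has been exported to a local directory are all hypothetical.

```python
import fnmatch
import os

# Hypothetical patterns, based on the files mentioned above (.pyc files, lastlog).
UNWANTED_PATTERNS = ("*.pyc", "lastlog")


def find_unwanted(root, patterns=UNWANTED_PATTERNS):
    """Return sorted paths (relative to root) of files matching any pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)
```

   A CI step could then fail whenever `find_unwanted(exported_root)` returns a 
non-empty list, where `exported_root` is wherever the image content was unpacked.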
   
   Looking forward to reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


