Re: [PR] feat(dockerfile): Add pip caching for faster build [airflow]

via GitHub Thu, 19 Oct 2023 05:31:58 -0700


potiuk commented on PR #35026:
URL: https://github.com/apache/airflow/pull/35026#issuecomment-1770882181


   Looking at the discussion @V0lantis @hussein-awala and thinking a bit: 
yeah, I think it's actually quite fine to use the local cache (even for pip 
installation) as long as it does not interfere with any of our caching 
mechanisms or with the ability to build different images (especially for 
multiple Python versions).
   
   As long as the remote cache works as it did (it should), adding those local 
caches should not impact the build, and it could possibly speed up some local 
build cases.
   
   I have one worry though, knowing that "cache invalidation is the most 
difficult thing". I am not 100% sure how it will work: how does the local 
cache get allocated when you pass different arguments? Do you know how the 
local cache is determined in case we change parameters/arguments and use the 
same Dockerfile?
   
   To be honest, I am quite worried about the case where the base Python 
changes (see my previous comment), that is, this case:
   
   ```
   docker build --build-arg PYTHON_BASE_IMAGE=python:3.11-slim-bullseye ... Dockerfile
   ```
   
   Then (note Python version change):
   
   ```
   docker build --build-arg PYTHON_BASE_IMAGE=python:3.10-slim-bullseye ... Dockerfile
   ```
   
   And about the case where the `pip` version changes: will the `pip` cache be 
usable in this case? Will they interfere? Do we need a mechanism to invalidate 
the cache in this case?
   
   I am asking all those questions because if any problem like that happens, 
whoever hits it will inevitably open an issue in our repo and will 
(rightfully) complain and ask us what to do.
   
   I do recall cases in the past where `pip` installation broke when the same 
cache had been used for different Python versions, including different Python 
patch-level versions. I recall a CI failure after a Python upgrade from 3.8.x 
to 3.8.(x+1). That was of course an extreme case, but this might happen. I 
also recall there were some issues when new releases of `pip` introduced 
breaking changes or caused irrecoverable errors when the cache got broken - 
example case here: https://github.com/pypa/pip/issues/11985 . Even very 
recently `pip` changed the format of its cache storage - see the 23.3 release 
notes (https://pip.pypa.io/en/stable/news/). This new format is kept in a 
different directory, so it should be safe to upgrade, but it still makes me a 
little worried.
   
   The main reason why we have `--no-cache` there is that I did not want to 
worry about all those cases and scenarios.
   
   Curious about your thoughts on whether this is something we should worry 
about? Maybe the cache location could depend on the Python version and 
PIP_VERSION to mitigate those simply? Maybe we should have a way to clear the 
cache when things get broken (other than nuking the whole Docker 
installation). And finally, we should possibly document somewhere what to do 
when it gets broken.
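   
   To make the "cache location dependent on Python version and PIP_VERSION" 
idea a bit more concrete, here is a rough sketch (not a tested proposal). The 
helper name `pip_cache_id` and the `pip-` prefix are made up for illustration, 
and it assumes the resulting id would be used as the `id=` of a BuildKit cache 
mount (`RUN --mount=type=cache,id=...,target=...`) for the `pip` cache 
directory:
   
   ```shell
   # Hypothetical helper: derive a cache id that changes whenever the Python
   # base image or the pip version changes, so each combination would get
   # its own, separate pip cache.
   pip_cache_id() {
       # $1: Python base image, $2: pip version
       # Replace anything outside [A-Za-z0-9._-] (for example ':') with '_'.
       sanitized_image=$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')
       printf 'pip-%s-%s\n' "$sanitized_image" "$2"
   }
   
   pip_cache_id "python:3.11-slim-bullseye" "23.3"
   # prints: pip-python_3.11-slim-bullseye-23.3
   pip_cache_id "python:3.10-slim-bullseye" "23.3"
   # prints: pip-python_3.10-slim-bullseye-23.3
   ```
   
   With something like this, the two `docker build` invocations above would 
never share a `pip` cache, and bumping `pip` would start from an empty cache 
as well. For clearing things when they get broken, I believe 
`docker builder prune --filter type=exec.cachemount` removes only the cache 
mounts rather than everything, but that would need to be verified before we 
document it.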
   
   WDYT?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

