tvalentyn commented on pull request #13399:
URL: https://github.com/apache/beam/pull/13399#issuecomment-736915744


   > > Thanks. It seems that caching may improve the startup time and be 
useful for users who frequently launch the same pipeline. However, I think 
caching may result in a difference in behavior. Questions:
   > > 
   > > 1. Is it possible that caching will result in a stale image that users 
will perceive as undesirable, and that this behavior will be difficult for 
users or support folks to debug? For example, suppose a user pipeline depends 
on the latest version of a dependency X in PyPI, perhaps a dependency they 
control. They have a pipeline with a setup.py that has an open install_requires 
bound `dep>=1.0.0,<2`. They run the pipeline, then push a new version of the 
dependency to PyPI and run the pipeline again, expecting a change in behavior. 
Kaniko will not rebuild the image in this case, right? What are your thoughts 
on that?
   > 
   > I think the Kaniko cache works the same way as the Docker layer cache. That 
is to say, if the locally downloaded artifacts change (or requirements.txt or 
setup.py changes), the COPY step in the prebuilding workflow changes too. 
There will be no valid cached layer from the artifacts COPY step onward, so the 
image will be rebuilt. (I also verified this through my own experiment of 
changing requirements.txt.)
   
   Thanks for checking. Sounds similar to the Docker build cache: 
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache.
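To make the cache-invalidation argument concrete, here is a minimal Python sketch of a simplified layer-cache model. It assumes each build step's cache key is a hash of the previous step's key, the instruction text, and the contents of any files the step copies; real Kaniko and Docker cache keys include more inputs. The three Dockerfile steps, the `sdk-base-image` name, the artifact file names, and the helpers `layer_keys`, `rebuilt_steps` and `build` are all hypothetical and are not the actual Beam prebuilding Dockerfile.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# A build step: the instruction text plus the files it copies (None if it copies nothing).
Step = Tuple[str, Optional[Dict[str, bytes]]]


def layer_keys(steps: List[Step]) -> List[str]:
    """Compute a chained cache key for every step (simplified model)."""
    keys: List[str] = []
    parent = ""
    for instruction, files in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        # Copied file names and contents feed into the key, so changing any
        # copied artifact changes this step's key and every key after it.
        for name in sorted(files or {}):
            h.update(name.encode())
            h.update(hashlib.sha256(files[name]).digest())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(cached_keys: List[str], new_keys: List[str]) -> List[int]:
    """Indices of steps that have no cached layer and must be rebuilt."""
    cached = set(cached_keys)
    return [i for i, key in enumerate(new_keys) if key not in cached]


def build(artifacts: Dict[str, bytes]) -> List[Step]:
    # Hypothetical three-step prebuild: base image, copy staged artifacts, install.
    return [
        ("FROM sdk-base-image", None),
        ("COPY artifacts /opt/artifacts", artifacts),
        ("RUN pip install -r /opt/artifacts/requirements.txt", None),
    ]


first = layer_keys(build({"requirements.txt": b"dep>=1.0.0,<2\n", "dep-1.0.0.tar.gz": b"v1.0.0"}))
same = layer_keys(build({"requirements.txt": b"dep>=1.0.0,<2\n", "dep-1.0.0.tar.gz": b"v1.0.0"}))
updated = layer_keys(build({"requirements.txt": b"dep>=1.0.0,<2\n", "dep-1.1.0.tar.gz": b"v1.1.0"}))

print(rebuilt_steps(first, same))     # [] : every layer is reused
print(rebuilt_steps(first, updated))  # [1, 2] : the COPY step and everything after it
```

In this model the open bound in requirements.txt does not change when a new release appears on PyPI; the rebuild is triggered because the locally downloaded artifact changes, which matches the behavior described in the reply above.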
   
   Transitive dependencies not present in requirements.txt may not be updated, 
but it would be better to list them in requirements.txt anyway to avoid 
pickling mismatches on the worker, as mentioned in 
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/.
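A minimal sketch of the pinning suggestion, assuming the fully resolved set of direct and transitive versions is already known (for example, read off a working environment with `pip freeze`). The helper `pinned_requirements` and the package names `dep` and `subdep` are hypothetical. Because every version is spelled out, upgrading a transitive dependency changes the text of requirements.txt, which in turn changes what the prebuild COPY step sees; an open bound such as `dep>=1.0.0,<2` never changes when only a transitive release is published.

```python
from typing import Dict


def pinned_requirements(resolved: Dict[str, str]) -> str:
    """Render resolved versions (direct and transitive) as requirements.txt lines."""
    # Sort case-insensitively so the output is stable across runs.
    names = sorted(resolved, key=str.lower)
    return "".join(f"{name}=={resolved[name]}\n" for name in names)


before = pinned_requirements({"dep": "1.0.0", "subdep": "2.0.0"})
after = pinned_requirements({"dep": "1.0.0", "subdep": "2.1.0"})

print(before, end="")   # dep==1.0.0 and subdep==2.0.0 on separate lines
print(before != after)  # True: the pinned file changes with the transitive upgrade
```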
   
   > 
   > > 1. When the prebuilding workflow is enabled, how visible is it to the 
user at runtime that cached layers are reused rather than rebuilt?
   > 
   > There will be log entries "No cached layer found for cmd ..." in the Cloud 
Build log.
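One way to check cache reuse after a build is to scan the Cloud Build log for the cache-miss entries mentioned above. A minimal sketch, assuming each miss appears on a single line containing the literal text `No cached layer found for cmd` followed by the command; the sample log lines are made up for illustration, and `cache_misses` is a hypothetical helper, not part of Beam or Kaniko.

```python
from typing import Iterable, List

# Marker quoted in this thread; anything after it is treated as the command.
CACHE_MISS_MARKER = "No cached layer found for cmd"


def cache_misses(log_lines: Iterable[str]) -> List[str]:
    """Return the commands that the log reports as cache misses."""
    misses: List[str] = []
    for line in log_lines:
        idx = line.find(CACHE_MISS_MARKER)
        if idx != -1:
            misses.append(line[idx + len(CACHE_MISS_MARKER):].strip())
    return misses


# Made-up log excerpt for illustration only.
sample_log = [
    "Step #0: building image",
    "Step #0: No cached layer found for cmd RUN pip install -r requirements.txt",
    "Step #0: pushing image",
]
print(cache_misses(sample_log))  # ['RUN pip install -r requirements.txt']
```

Under the assumption above, an empty result would mean that no step reported a miss, that is, every layer was served from the cache.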
   
   Would it be possible to mention that caching is used when it is, rather than 
only when it is not? Or perhaps add a generic info message along the lines of: 
`Staging pipeline dependencies into a prebuilt container image. To optimize 
build time, build steps will be cached.`
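A minimal sketch of the suggested message using Python's standard `logging` module. The function name `announce_prebuild` and its `use_cache` parameter are hypothetical; the idea is to log the caching note only when caching is actually enabled for the build.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def announce_prebuild(use_cache: bool) -> str:
    """Log an info message describing the prebuild step, and return it."""
    message = "Staging pipeline dependencies into a prebuilt container image."
    if use_cache:
        # Only mention caching when the build will actually use it.
        message += " To optimize build time, build steps will be cached."
    _LOGGER.info(message)
    return message


logging.basicConfig(level=logging.INFO)
announce_prebuild(use_cache=True)
```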
   
   Also, do you know if a user can tell Kaniko to clean the cache manually?
   
   > 
   > > 1. I think we should document the prebuilding feature in Beam docs and 
describe the caching behavior and the associated TTLs. What is the plan for that?
   > 
   > I believe Emily will be working on documenting this as part of the 
custom container documentation next quarter, and I can also help.
   > 
   > > 1. Would customizing the TTL or adding a no-cache option make sense? We 
are using the default two-week TTL, right? See: 
[cloud.google.com/cloud-build/docs/kaniko-cache#configuring_the_cache_expiration_time](https://cloud.google.com/cloud-build/docs/kaniko-cache#configuring_the_cache_expiration_time).
   > 
   > I think the default value makes sense. I didn't want to provide too many 
knobs to users, since that may be confusing and the knobs may be rarely used, 
but we can always add flags later for more advanced users to control it.
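If more advanced users do need these knobs later, they could map onto Kaniko's existing `--cache` and `--cache-ttl` flags, which the linked Cloud Build page uses. A minimal sketch, assuming two hypothetical options (`use_cache`, `cache_ttl_hours`) and a hypothetical helper `kaniko_cache_args`; the 336-hour default is the two-week TTL discussed above, and `gcr.io/my-project/prebuilt` is a placeholder destination. Kaniko caching is opt-in, so leaving out `--cache=true` would give a no-cache option.

```python
from typing import List

# Two weeks, the default TTL discussed in this thread.
DEFAULT_CACHE_TTL_HOURS = 14 * 24


def kaniko_cache_args(destination: str, use_cache: bool = True,
                      cache_ttl_hours: int = DEFAULT_CACHE_TTL_HOURS) -> List[str]:
    """Build Kaniko executor arguments for a prebuild with optional caching."""
    if cache_ttl_hours <= 0:
        raise ValueError("cache_ttl_hours must be positive")
    args = [f"--destination={destination}"]
    if use_cache:
        # --cache opts into layer caching; --cache-ttl bounds how long a cached
        # layer may be reused before it is rebuilt.
        args += ["--cache=true", f"--cache-ttl={cache_ttl_hours}h"]
    return args


print(kaniko_cache_args("gcr.io/my-project/prebuilt"))
# ['--destination=gcr.io/my-project/prebuilt', '--cache=true', '--cache-ttl=336h']
print(kaniko_cache_args("gcr.io/my-project/prebuilt", use_cache=False))
# ['--destination=gcr.io/my-project/prebuilt']
```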
   
   

