y1chi edited a comment on pull request #13399:
URL: https://github.com/apache/beam/pull/13399#issuecomment-736154645


   
   
   > Thanks. It seems that caching may improve the startup time and be useful 
for users who frequently launch the same pipeline. However, I think caching may 
result in a difference in behavior. Questions:
   > 
   > 1. Is it possible that caching will result in a stale image that users 
will perceive as undesirable, with behavior that is difficult for users or 
support folks to debug? For example, suppose a user pipeline depends on the 
latest version of a dependency X on PyPI, perhaps a dependency they control. 
They have a pipeline with a setup.py that has an open install_requires bound, 
dep>=1.0.0,<2. They run the pipeline, then push a new version of the dependency 
to PyPI and run the pipeline again, expecting a change in behavior. Kaniko will 
not rebuild the image in this case, right? What are your thoughts on that?
   
   I think the kaniko cache works the same way as the Docker layer cache. That 
is, if the locally downloaded artifacts change (or requirements.txt or setup.py 
changes), the COPY step in the prebuilding workflow changes as well. There will 
be no valid cached layer from the artifact COPY step onward, so a new image 
will be built. (I also verified this through my own experiment with changing 
requirements.txt.)
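
   As a rough illustration of that reasoning (not kaniko's actual algorithm), 
the sketch below chains a digest over each build step and the contents of the 
files it copies. The function names, the base image name, and the step list are 
all made up for the example.

```python
import hashlib
from typing import Dict, List, Tuple

Step = Tuple[str, Dict[str, bytes]]  # (command text, {path: file contents})


def layer_cache_key(parent_key: str, command: str, files: Dict[str, bytes]) -> str:
    """Digest of one step: parent key + command + copied file contents.

    Simplified model for illustration only; real layer caches differ in detail.
    """
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(command.encode())
    for path in sorted(files):
        h.update(path.encode())
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def build_keys(steps: List[Step]) -> List[str]:
    """Chain keys so that a change in one step invalidates every later step."""
    keys = []
    parent = ""
    for command, files in steps:
        parent = layer_cache_key(parent, command, files)
        keys.append(parent)
    return keys


def make_steps(requirements: bytes) -> List[Step]:
    # Hypothetical build steps; "example-base-image" is a placeholder name.
    return [
        ("FROM example-base-image", {}),
        ("COPY requirements.txt /tmp/", {"requirements.txt": requirements}),
        ("RUN pip install -r /tmp/requirements.txt", {}),
    ]


old_keys = build_keys(make_steps(b"dep>=1.0.0,<2\n"))
new_keys = build_keys(make_steps(b"dep>=1.1.0,<2\n"))
same_keys = build_keys(make_steps(b"dep>=1.0.0,<2\n"))

# Per-step comparison: base layer is reusable, COPY and everything after are not.
print([a == b for a, b in zip(old_keys, new_keys)])  # → [True, False, False]
```

   In this model, identical inputs produce identical keys, so a rebuild only 
happens when the copied artifacts actually differ.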
   
   > 2. During runtime with prebuilding workflow enabled, how visible is it to 
the user that the cached layers are reused and not rebuilt?
   
   There will be log entries such as "No cached layer found for cmd ..." in the 
Cloud Build log.
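
   If someone wants to check this from a downloaded build log, a small filter 
like the one below would do. It is only a sketch: the helper name is 
hypothetical, and the sample log text is invented for the example apart from 
the quoted marker string.

```python
from typing import List

# Marker text quoted above; everything else in the sample log is invented.
CACHE_MISS_MARKER = "No cached layer found for cmd"


def cache_miss_lines(log_text: str) -> List[str]:
    """Return the log lines that report a cache miss for a build command."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if CACHE_MISS_MARKER in line
    ]


# Made-up sample, not real Cloud Build output.
sample_log = """\
step one started
No cached layer found for cmd COPY requirements.txt /tmp/
No cached layer found for cmd RUN pip install -r /tmp/requirements.txt
step one finished
"""

misses = cache_miss_lines(sample_log)
print(len(misses))  # → 2
```

   An empty result would suggest that no step reported a cache miss in that log.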
   
   > 3. I think we should document the prebuilding feature in Beam docs, and 
reflect the caching behavior and associated TTLs. What is a plan for that?
   
   I believe Emily will be documenting this as part of the custom container 
work next quarter, and I can also help.
   
   > 4. Would customizing the TTL or adding a no-cache option make sense? We 
are using the default two-week TTL, right? See: 
https://cloud.google.com/cloud-build/docs/kaniko-cache#configuring_the_cache_expiration_time.
   
   I think the default value makes sense. I didn't want to provide too many 
knobs to users, since they can be confusing or rarely used, but we can always 
add flags later for more advanced users to control it.
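
   For reference, if such flags were added later, they could map onto the kaniko 
executor arguments roughly as sketched below. The helper and its parameters are 
hypothetical and are not part of this PR; only the `--destination`, `--cache`, 
and `--cache-ttl` executor flags are assumed to exist as documented by kaniko.

```python
from typing import List, Optional


def kaniko_args(
    destination: str,
    use_cache: bool = True,
    cache_ttl_hours: Optional[int] = None,
) -> List[str]:
    """Build the argument list for a kaniko executor step.

    Hypothetical helper: `use_cache` and `cache_ttl_hours` stand in for
    pipeline options that do not exist today.
    """
    args = [f"--destination={destination}"]
    if use_cache:
        args.append("--cache=true")
        # Leaving the TTL unset keeps kaniko's default expiration.
        if cache_ttl_hours is not None:
            args.append(f"--cache-ttl={cache_ttl_hours}h")
    return args


# Placeholder image path, for illustration only.
default_args = kaniko_args("gcr.io/example-project/example-image")
short_ttl_args = kaniko_args("gcr.io/example-project/example-image", cache_ttl_hours=6)
no_cache_args = kaniko_args("gcr.io/example-project/example-image", use_cache=False)
```

   Keeping the TTL optional means the default behavior stays unchanged unless a 
user opts in.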
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

