lostluck commented on issue #28605:
URL: https://github.com/apache/beam/issues/28605#issuecomment-1730455864

   This seems like a Python-specific issue, since the Python SDK runs multiple 
processes or multiple containers on a worker VM. The other SDKs (Go and Java at 
least) only have a single boot cycle to download from the artifact repo. It's 
hard to prevent race conditions across separate processes that can't 
meaningfully communicate.
   
   It's also hard to know whether that semi-persist directory is actually 
shared with other workers.
   
   In more actionable commentary, that "don't download if the SHA matches" idea 
is probably a good one: we can check for an existing file, and if it exists 
with a matching SHA, then it's the expected file.
   If not, it might be either wrong or still in the process of being downloaded.
   
   Whoever works on this needs to take that into consideration.
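
   As a rough illustration of the SHA check described above, here is a minimal 
Python sketch. It is not the actual boot code: `file_sha256`, `ensure_artifact`, 
and the `fetch` callable are hypothetical names, and SHA-256 is an assumption 
about the digest in use. It skips the download when a file with the expected 
digest already exists, and otherwise writes to a temporary file in the same 
directory and renames it into place, so a concurrent reader should see either 
no file or a complete one, never a partial download.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_artifact(path, expected_sha256, fetch):
    """Download to path unless a file with the expected SHA is already there.

    fetch is a hypothetical callable that writes the artifact bytes to the
    binary file object it is given. Returns True if a download happened and
    False if it was skipped.
    """
    if os.path.isfile(path) and file_sha256(path) == expected_sha256:
        return False  # Existing file matches, so it is the expected file.

    # Missing or mismatched: the file may be wrong or mid-download elsewhere.
    # Write to a private temp file so we never expose a partial artifact.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            fetch(tmp)
        if file_sha256(tmp_path) != expected_sha256:
            raise ValueError("downloaded artifact does not match expected SHA")
        # Atomic on POSIX when source and destination share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

   Note that this does not stop two processes from downloading the same 
artifact at the same time; it only makes the duplicate work harmless, since 
whichever rename lands last replaces the file with identical content.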
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]