lostluck commented on issue #28605: URL: https://github.com/apache/beam/issues/28605#issuecomment-1730455864
This seems like a Python-specific issue, since the Python SDK runs multiple processes or multiple containers on a worker VM. The other SDKs (Go and Java at least) have only a single boot cycle that downloads from the artifact repo. It's hard to prevent race conditions across separate processes that can't meaningfully communicate. It's also hard to know whether that semi-persist directory is actually shared with other workers. In more actionable commentary, "don't download if the sha matches" is probably a good idea: we can check for an existing file, and if it exists with a matching SHA, then it's the expected file. If the SHA doesn't match, the file might be wrong, or it might still be in the process of being downloaded. Whoever works on this needs to take that into consideration.
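To make the "don't download if the sha matches" idea concrete, here is a minimal sketch in Python. It is not Beam's actual boot code: `ensure_artifact`, `sha256_of`, and the `fetch` callback are hypothetical names, and the use of SHA-256 is an assumption. A file that exists with a matching hash is reused. Anything else is treated as wrong or partially written and is downloaded again into a temporary file that is renamed into place only once its hash checks out, so another process never sees a half-written file at the final path.

```python
import hashlib
import os
import tempfile
from typing import Callable


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_artifact(
    dest: str,
    expected_sha256: str,
    fetch: Callable[[str], None],
) -> bool:
    """Make sure `dest` holds the expected artifact.

    `fetch` is a hypothetical callback that writes the artifact to the
    path it is given. Returns True if a download happened, False if an
    existing file with a matching hash was reused.
    """
    # Existing file with a matching hash: it is the expected file.
    if os.path.isfile(dest) and sha256_of(dest) == expected_sha256:
        return False

    # Otherwise the file is missing, wrong, or possibly mid-download by
    # another process. Download to a private temp file in the same
    # directory so the final rename stays on one filesystem.
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    os.close(fd)
    try:
        fetch(tmp_path)
        actual = sha256_of(tmp_path)
        if actual != expected_sha256:
            raise ValueError(
                f"hash mismatch for {dest}: expected {expected_sha256}, got {actual}"
            )
        # Publish the verified file in a single rename step.
        os.replace(tmp_path, dest)
    finally:
        # Clean up the temp file if the download or verification failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True
```

This does not stop two processes from downloading the same artifact at the same time. It only ensures that whichever one finishes publishes a complete, verified file, and that later boots can skip the download. Avoiding the duplicate work itself would need some form of cross-process coordination, which is the hard part noted above.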
