phoerious commented on a change in pull request #16658:
URL: https://github.com/apache/beam/pull/16658#discussion_r795762322
##########
File path: sdks/python/container/boot.go
##########
@@ -172,12 +186,12 @@ func main() {
}
}
- workerPoolId := os.Getenv(workerPoolIdEnv)
if workerPoolId != "" {
- multiProcessExactlyOnce(materializeArtifactsFunc, "beam.install.complete."+workerPoolId)
+ defer multiProcessExactlyOnce(materializeArtifactsFunc, "beam.install.complete." + workerPoolId)()
} else {
materializeArtifactsFunc()
}
+ defer os.RemoveAll(venvDir)
Review comment:
Cleanup is deferred until all sibling processes have finished, but I
don't know how that fits into the life cycle of the whole job/pipeline. This
could break if multiple independent worker processes are running in
parallel.
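
To make the ordering in this hunk easier to reason about, here is a minimal sketch of how the two `defer` statements interact. It is not the PR's code: `acquire` is a hypothetical stand-in for `multiProcessExactlyOnce`, whose body is not shown in this excerpt, and `trace` exists only to record the order of steps.

```go
package main

import "fmt"

// trace records the order in which steps run; it exists only for this sketch.
var trace []string

// acquire is a hypothetical stand-in for multiProcessExactlyOnce: it does its
// work immediately and returns a release function for the caller to defer.
func acquire(name string) func() {
	trace = append(trace, "acquire "+name)
	return func() { trace = append(trace, "release "+name) }
}

// run mirrors the shape of the hunk: a defer inside an if block, followed by
// a second defer at function level.
func run(pooled bool) {
	if pooled {
		// acquire(...) is evaluated right here; only the returned func is
		// deferred, and it is deferred to the end of run, not of the if block.
		defer acquire("install-lock")()
	}
	// Registered last, so it runs first when run returns (defers are LIFO).
	defer func() { trace = append(trace, "remove venv") }()
	trace = append(trace, "worker body")
}

func main() {
	run(true)
	for _, step := range trace {
		fmt.Println(step)
	}
}
```

In this sketch the "remove venv" step runs before the release step, because deferred calls run in reverse order of registration. Whether that ordering matters for the PR depends on what the function returned by `multiProcessExactlyOnce` actually does. Note also that deferred calls do not run at all if the process ends via `os.Exit`.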
##########
File path: sdks/python/container/boot.go
##########
@@ -145,7 +145,21 @@ func main() {
// Guard from concurrent artifact retrieval and installation,
// when called by child processes in a worker pool.
+ workerPoolId := os.Getenv(workerPoolIdEnv)
+ var venvDir string
+ if workerPoolId != "" {
+ venvDir = filepath.Join(*semiPersistDir, "beam-venv", "beam-pool-" + workerPoolId)
+ } else {
+ venvDir = filepath.Join(*semiPersistDir, "beam-venv", "beam-worker-" + *id)
+ }
Review comment:
As mentioned above, keying the venv directory on the worker pool ID or the
worker ID is very strange. I would much prefer to use a job ID here.
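
For illustration only, the naming scheme in the hunk and a job-keyed alternative could be sketched as below. `venvDirFor` restates the branch from the diff as a function. `venvDirForJob`, its `jobID` parameter, and the `beam-job-` prefix are hypothetical: this excerpt does not show whether a job ID is available to the boot process.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// venvDirFor mirrors the branch in the hunk: one directory per worker pool
// when a pool ID is set, otherwise one directory per worker.
func venvDirFor(semiPersistDir, workerPoolID, workerID string) string {
	if workerPoolID != "" {
		return filepath.Join(semiPersistDir, "beam-venv", "beam-pool-"+workerPoolID)
	}
	return filepath.Join(semiPersistDir, "beam-venv", "beam-worker-"+workerID)
}

// venvDirForJob is a hypothetical alternative keyed on a job ID. Both the
// jobID parameter and the "beam-job-" prefix are assumptions of this sketch.
func venvDirForJob(semiPersistDir, jobID string) string {
	return filepath.Join(semiPersistDir, "beam-venv", "beam-job-"+jobID)
}

func main() {
	fmt.Println(venvDirFor("/tmp", "pool-1", "worker-7"))
	fmt.Println(venvDirFor("/tmp", "", "worker-7"))
	fmt.Println(venvDirForJob("/tmp", "job-42"))
}
```

With a job-keyed directory, every worker of the same job would resolve to the same path, so removing it would have to be coordinated across those workers.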