tvalentyn commented on a change in pull request #16658:
URL: https://github.com/apache/beam/pull/16658#discussion_r810426468
##########
File path: sdks/python/container/boot.go
##########
@@ -145,7 +145,21 @@ func main() {
// Guard from concurrent artifact retrieval and installation,
// when called by child processes in a worker pool.
+ workerPoolId := os.Getenv(workerPoolIdEnv)
+ var venvDir string
+ if workerPoolId != "" {
+ venvDir = filepath.Join(*semiPersistDir, "beam-venv", "beam-pool-" + workerPoolId)
+ } else {
+ venvDir = filepath.Join(*semiPersistDir, "beam-venv", "beam-worker-" + *id)
Review comment:
> It's not clear why there wouldn't be a workerPoolId. Maybe add a comment.
I think the difference here is how the boot.go code is executed in various
execution modes that were added for various runners, for example on
PortableRunner+Flink Cluster vs Dataflow.
The worker pool logic was initially added for the portable runner
(https://github.com/apache/beam/pull/9371), and a more recent change here (that
has not gained significant usage yet) is
https://github.com/apache/beam/pull/15642. Different environment variables /
params may be available in different execution modes, and we will need to make
sure this works cleanly for all scenarios.
I am familiar with only one of the (at least) 3 branches, so I am catching up now.