tvalentyn commented on issue #29663: URL: https://github.com/apache/beam/issues/29663#issuecomment-1845980486
`semiPersistDir` should be set by the runner when the runner launches the SDK harness container. It might already be configurable, based on some references in the codebase: https://github.com/apache/beam/blob/90dd93f5241284da2e49c818af03e98b5132d30a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L2746

Would setting `semiPersistDir` to `spark.local.dir` work for all Spark runner users, so that users don't have to worry about this knob?

Given that a venv in the semi-persist dir doesn't work for Dataflow, we could treat Dataflow as a special case: detect when it is used and, if so, set RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT. The extra venv wrap doesn't make much sense for Dataflow.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
