[ https://issues.apache.org/jira/browse/BEAM-12792?focusedWorklogId=729952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729952 ]

ASF GitHub Bot logged work on BEAM-12792:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Feb/22 01:20
            Start Date: 19/Feb/22 01:20
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on a change in pull request #16658:
URL: https://github.com/apache/beam/pull/16658#discussion_r810426468



##########
File path: sdks/python/container/boot.go
##########
@@ -145,7 +145,21 @@ func main() {
        // Guard from concurrent artifact retrieval and installation,
        // when called by child processes in a worker pool.
 
+       workerPoolId := os.Getenv(workerPoolIdEnv)
+       var venvDir string
+       if workerPoolId != "" {
+               venvDir = filepath.Join(*semiPersistDir, "beam-venv", "beam-pool-"+workerPoolId)
+       } else {
+               venvDir = filepath.Join(*semiPersistDir, "beam-venv", "beam-worker-"+*id)

Review comment:
       > It's not clear why there wouldn't be a workerPoolId. Maybe add a 
comment.
   
   I think the difference here is how the boot.go code is executed in the 
various execution modes that were added for various runners, for example on 
PortableRunner+Flink Cluster vs Dataflow.
   
   The worker pool logic was initially added for the portable runner 
(https://github.com/apache/beam/pull/9371), and a more recent change here (that 
has not gained significant usage yet) is 
https://github.com/apache/beam/pull/15642. Different environment variables / 
params may be available in different execution modes, and we will need to make 
sure this works cleanly for all scenarios.
   
   I am familiar with only one of the at least 3 branches, so catching up now.
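The venv-directory selection in the diff above can be sketched as a standalone function. This is a sketch, not the actual boot.go: the function name `venvDirFor` and its parameters are illustrative, and the real code reads `*semiPersistDir`, `*id`, and the worker pool id env var via flags and `os.Getenv`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// venvDirFor mirrors the branch in the diff: workers in a pool share one
// "beam-pool-<id>" venv directory, while standalone workers each get their
// own "beam-worker-<id>" directory. An empty workerPoolId signals that the
// worker is not running inside a pool (e.g. on Dataflow rather than a
// PortableRunner worker pool).
func venvDirFor(semiPersistDir, workerPoolId, workerId string) string {
	if workerPoolId != "" {
		return filepath.Join(semiPersistDir, "beam-venv", "beam-pool-"+workerPoolId)
	}
	return filepath.Join(semiPersistDir, "beam-venv", "beam-worker-"+workerId)
}

func main() {
	// Pool case: shared directory keyed by the pool id.
	fmt.Println(venvDirFor("/tmp", "pool1", "w0"))
	// Non-pool case: per-worker directory keyed by the worker id.
	fmt.Println(venvDirFor("/tmp", "", "w0"))
}
```

With this split, concurrent workers in the same pool contend for one shared venv (hence the guard against concurrent artifact installation mentioned in the diff context), while isolated workers never collide.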
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 729952)
    Time Spent: 4h  (was: 3h 50m)

> Multiple jobs running on Flink session cluster reuse the persistent Python 
> environment.
> ---------------------------------------------------------------------------------------
>
>                 Key: BEAM-12792
>                 URL: https://issues.apache.org/jira/browse/BEAM-12792
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>    Affects Versions: 2.27.0, 2.28.0, 2.29.0, 2.30.0, 2.31.0
>         Environment: Kubernetes 1.20 on Ubuntu 18.04.
>            Reporter: Jens Wiren
>            Priority: P1
>              Labels: FlinkRunner, beam
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> I'm running TFX pipelines on a Flink cluster using Beam in k8s. However, 
> extra python packages passed to the Flink runner (or rather beam worker 
> side-car) are only installed once per deployment cycle. Example:
>  # Flink is deployed and is up and running
>  # A TFX pipeline starts, submits a job to Flink along with a python whl of 
> custom code and beam ops.
>  # The beam worker installs the package and the pipeline finishes successfully.
>  # A new TFX pipeline is built where a new beam fn is introduced, the pipeline 
> is started and the new whl is submitted as in step 2).
>  # This time, the new package is not being installed in the beam worker 
> causing the job to fail due to a reference which does not exist in the beam 
> worker, since it didn't install the new package.
>  
> I started using Flink with Beam version 2.27 and this has been an issue the 
> whole time.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
