[ 
https://issues.apache.org/jira/browse/BEAM-13812?focusedWorklogId=727940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-727940
 ]

ASF GitHub Bot logged work on BEAM-13812:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Feb/22 00:10
            Start Date: 16/Feb/22 00:10
    Worklog Time Spent: 10m 
      Work Description: VictorPlusC commented on a change in pull request #16741:
URL: https://github.com/apache/beam/pull/16741#discussion_r807423696



##########
File path: 
sdks/python/apache_beam/runners/interactive/dataproc/dataproc_cluster_manager.py
##########
@@ -124,10 +131,23 @@ def create_flink_cluster(self) -> None:
         'config': {
             'software_config': {
                 'optional_components': ['DOCKER', 'FLINK']
+            },
+            'gce_cluster_config': {
+                'metadata': {
+                    'flink-start-yarn-session': 'true'
+                },
+                'service_account_scopes': [
+                    'https://www.googleapis.com/auth/cloud-platform'
+                ]
+            },
+            'endpoint_config': {
+                'enable_http_port_access': True
             }
         }
     }
     self.create_cluster(cluster)
+    self.master_url = self.get_master_url(
+        self.master_url_identifier, default=False)

Review comment:
       Quite a few things have changed in this PR, so here's the updated 
answer:
   
   The master_url and the identifier (which we now call 'cluster_metadata') 
cover the following use cases:
   - At pipeline run time with the FlinkRunner, we detect some information 
(the default cluster name from the Interactive Environment, and the project and 
region from the pipeline options). We use it to build the identifier and check, 
via the bidirectional mapping, whether it corresponds to an existing 
master_url. If it does, we skip the cluster creation stage and start using that 
cluster.
   - We now instantiate DataprocClusterManager with the identifier. If the 
cluster_metadata corresponds to an existing master_url, we assign that value to 
the DataprocClusterManager during instantiation, without having to search for 
it again. We do this because we want each pipeline mapped to a separate 
instance of DataprocClusterManager.
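
   To make the lookup concrete, here is a minimal sketch of the bidirectional 
mapping and the "skip creation if the cluster is already known" check. This is 
an illustration, not the code in the PR: ClusterMetadata, 
BidirectionalClusterMap, get_or_create_master_url and the create_cluster 
callback are hypothetical names, and the real identifier may carry more fields 
than project, region and cluster name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ClusterMetadata:
    """Hypothetical identifier built from project, region and cluster name."""
    project_id: str
    region: str
    cluster_name: str = 'interactive-beam-cluster'


class BidirectionalClusterMap:
    """Keeps metadata -> master_url and master_url -> metadata in sync."""

    def __init__(self) -> None:
        self._url_by_metadata: Dict[ClusterMetadata, str] = {}
        self._metadata_by_url: Dict[str, ClusterMetadata] = {}

    def register(self, metadata: ClusterMetadata, master_url: str) -> None:
        # Drop stale pairs first so both directions always agree.
        old_url = self._url_by_metadata.pop(metadata, None)
        if old_url is not None:
            self._metadata_by_url.pop(old_url, None)
        old_metadata = self._metadata_by_url.pop(master_url, None)
        if old_metadata is not None:
            self._url_by_metadata.pop(old_metadata, None)
        self._url_by_metadata[metadata] = master_url
        self._metadata_by_url[master_url] = metadata

    def master_url_for(self, metadata: ClusterMetadata) -> Optional[str]:
        return self._url_by_metadata.get(metadata)

    def metadata_for(self, master_url: str) -> Optional[ClusterMetadata]:
        return self._metadata_by_url.get(master_url)


def get_or_create_master_url(
    metadata: ClusterMetadata,
    known: BidirectionalClusterMap,
    create_cluster: Callable[[ClusterMetadata], str],
) -> str:
    """Reuses a known master_url, or creates a cluster and records it."""
    existing = known.master_url_for(metadata)
    if existing is not None:
        # The cluster is already known, so the creation stage is skipped.
        return existing
    master_url = create_cluster(metadata)
    known.register(metadata, master_url)
    return master_url
```

   With something like this, a manager constructed with a known identifier 
could pick up its master_url at instantiation by calling master_url_for, and 
the reverse lookup (metadata_for) would serve queries that start from a 
master_url.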




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 727940)
    Time Spent: 11h 40m  (was: 11.5h)

> Integrate DataprocClusterManager into Interactive Environment
> -------------------------------------------------------------
>
>                 Key: BEAM-13812
>                 URL: https://issues.apache.org/jira/browse/BEAM-13812
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-py-interactive
>            Reporter: Victor Chen
>            Assignee: Victor Chen
>            Priority: P2
>          Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> * Adds code to automatically create a Dataproc cluster configured with Docker 
> and Flink if a user instantiates an InteractiveRunner using 
> underlying_runner='FlinkRunner'
>  * By default, the created Dataproc cluster will be called 
> 'interactive-beam-cluster' and will be cleaned up when the Interactive 
> Environment is interrupted
>  * Users will be able to toggle automatic deletion of the cluster with the 
> default name, 'interactive-beam-cluster', when the kernel is interrupted
>  * Introduces the Clusters class in interactive_beam.py, which acts as a user 
> interface that provides the user with the following capabilities:
>  ** Changing the default name of the clusters created by Interactive Beam
>  ** Cleaning up a cluster by providing a corresponding pipeline; an option to 
> force the deletion is provided for the case where more than one pipeline uses 
> the same cluster
>  ** Getting all pipelines that use the provided master_url
>  ** Receiving a description of either the cluster corresponding to a 
> specific pipeline, or descriptions of all pipelines



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
