Ning created BEAM-14332:
---------------------------

             Summary: Improve the workflow of cluster management for Flink on 
Dataproc
                 Key: BEAM-14332
                 URL: https://issues.apache.org/jira/browse/BEAM-14332
             Project: Beam
          Issue Type: Improvement
          Components: runner-py-interactive
            Reporter: Ning
            Assignee: Ning


Improve the workflow of cluster management.

There is an option to configure a default [cluster 
name|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_beam.py#L366].
 The existing user flows are:
 # Use the default cluster name to create a new cluster if none is in use;
 # Reuse an existing cluster that has the default cluster name;
 # If the default cluster name is changed to a new value, re-apply 1 and 2.

 A better workflow would be to:
 # Create a new cluster implicitly if none exists, or explicitly if the user 
wants one with specific provisioning;
 # Always default to using the most recently created cluster.
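The proposed workflow could be sketched roughly as follows; `ClusterManager`, `Cluster`, and the generated names below are hypothetical illustrations of the idea, not existing Beam APIs:

```python
import uuid


class Cluster:
    """Hypothetical handle for a provisioned cluster (not a Beam API)."""

    def __init__(self, name, num_workers=3):
        self.name = name
        self.num_workers = num_workers


class ClusterManager:
    """Sketch of the proposed workflow: every cluster gets a distinct
    generated name, and the most recently created cluster is the
    implicit default."""

    def __init__(self):
        self._clusters = []

    def create(self, num_workers=3):
        # Explicit creation with specific provisioning; the name is
        # generated, so it is always unique and never typed by the user.
        cluster = Cluster('cluster-%s' % uuid.uuid4().hex[:8], num_workers)
        self._clusters.append(cluster)
        return cluster

    def default(self):
        # Implicitly create a cluster if none exists; otherwise always
        # default to the last created one.
        if not self._clusters:
            return self.create()
        return self._clusters[-1]
```

With this shape, two notebook runtimes can never collide on a shared default name, because names are generated rather than configured.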

 The reasons are:
 * A cluster name is meaningless to the user when the cluster is just a medium 
to run OSS runners (as applications) such as Flink or Spark. The cluster could 
also run anywhere (on GCP), such as Dataproc, k8s, or even Dataflow itself.
 * Clusters should be uniquely identified and thus should always have distinct 
names. Clusters are managed (created/reused/deleted) behind the scenes by the 
notebook runtime when the user doesn't explicitly do so (the capability to 
explicitly manage clusters remains available). Reusing the same default 
cluster name is risky: a cluster may be deleted by one notebook runtime while 
another cluster with the same name is created by a different notebook runtime.

 * The user should also have the capability to explicitly provision a cluster.

The current implementation provisions each cluster in the location specified 
by GoogleCloudOptions with 3 worker nodes. There is no explicit API to 
configure the number or shape of the workers.

We could use WorkerOptions to allow customers to explicitly provision a 
cluster, and expose an explicit API (with UX in the notebook extension) for 
customers to resize a cluster connected to their notebook (until we have an 
autoscaling solution with Dataproc for Flink).
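As a rough sketch of that mapping: the parameter names `num_workers` and `machine_type` mirror Beam's WorkerOptions flags, and the `worker_config` fields mirror the Dataproc cluster config, but the `provision_request` helper itself is a hypothetical illustration, not an existing API:

```python
def provision_request(num_workers=3, machine_type=None):
    """Map WorkerOptions-style settings onto a Dataproc-style worker
    config dict (hypothetical helper for illustration only)."""
    config = {'worker_config': {'num_instances': num_workers}}
    if machine_type:
        # Dataproc identifies the worker shape by machine type URI.
        config['worker_config']['machine_type_uri'] = machine_type
    return config
```

Resizing a connected cluster would then amount to re-issuing such a request with a new `num_instances` value.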



--
This message was sent by Atlassian Jira
(v8.20.7#820007)