Ning created BEAM-14332:
---------------------------

             Summary: Improve the workflow of cluster management for Flink on 
Dataproc
                 Key: BEAM-14332
                 URL: https://issues.apache.org/jira/browse/BEAM-14332
             Project: Beam
          Issue Type: Improvement
          Components: runner-py-interactive
            Reporter: Ning
            Assignee: Ning


Improve the workflow of cluster management.

There is an option to configure a default [cluster 
name|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_beam.py#L366].
 The existing user flows are:
 # Use the default cluster name to create a new cluster if none is in use;
 # Reuse an existing cluster that has the default cluster name;
 # If the default cluster name is changed to a new value, re-apply 1 and 2.

 A better workflow would be to:
 # Create a new cluster implicitly if none exists, or explicitly if the user 
wants one with specific provisioning;
 # Always default to using the most recently created cluster.
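The proposed workflow could be sketched roughly as follows; `ClusterManager`, `Cluster`, and the generated names below are hypothetical illustrations of the idea, not existing Beam APIs:

```python
import uuid


class Cluster:
    """Hypothetical handle for a provisioned cluster (not a Beam API)."""

    def __init__(self, name, num_workers=3):
        self.name = name
        self.num_workers = num_workers


class ClusterManager:
    """Sketch of the proposed workflow: every cluster gets a distinct
    generated name, and the most recently created cluster is the
    implicit default."""

    def __init__(self):
        self._clusters = []

    def create(self, num_workers=3):
        # Explicit creation with specific provisioning; the name is
        # generated, so it is always unique and never typed by the user.
        cluster = Cluster('cluster-%s' % uuid.uuid4().hex[:8], num_workers)
        self._clusters.append(cluster)
        return cluster

    def default(self):
        # Implicitly create a cluster if none exists; otherwise always
        # default to the last created one.
        if not self._clusters:
            return self.create()
        return self._clusters[-1]
```

With this shape, two notebook runtimes can never collide on a shared default name, because names are generated rather than configured.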

 The reasons are:
 * A cluster name is meaningless to the user when the cluster is just a medium 
to run OSS runners (as applications) such as Flink or Spark. The cluster could 
also run anywhere (on GCP), such as Dataproc, k8s, or even Dataflow itself.
 * Clusters should be uniquely identified and thus should always have distinct 
names. Clusters are managed (created/reused/deleted) behind the scenes by the 
notebook runtime when the user doesn't explicitly do so (the capability to 
explicitly manage clusters remains available). Reusing the same default 
cluster name is risky: a cluster may be deleted by one notebook runtime while 
another cluster with the same name is created by a different notebook runtime.

 * The user should also have the capability to explicitly provision a cluster.

The current implementation provisions each cluster in the location specified 
by GoogleCloudOptions with 3 worker nodes. There is no explicit API to 
configure the number or shape of the workers.

We could use WorkerOptions to allow customers to explicitly provision a 
cluster, and expose an explicit API (with UX in the notebook extension) for 
customers to resize a cluster connected to their notebook (until we have an 
autoscaling solution with Dataproc for Flink).
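As a rough sketch of that mapping: the parameter names `num_workers` and `machine_type` mirror Beam's WorkerOptions flags, and the `worker_config` fields mirror the Dataproc cluster config, but the `provision_request` helper itself is a hypothetical illustration, not an existing API:

```python
def provision_request(num_workers=3, machine_type=None):
    """Map WorkerOptions-style settings onto a Dataproc-style worker
    config dict (hypothetical helper for illustration only)."""
    config = {'worker_config': {'num_instances': num_workers}}
    if machine_type:
        # Dataproc identifies the worker shape by machine type URI.
        config['worker_config']['machine_type_uri'] = machine_type
    return config
```

Resizing a connected cluster would then amount to re-issuing such a request with a new `num_instances` value.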



--
This message was sent by Atlassian Jira
(v8.20.7#820007)