[
https://issues.apache.org/jira/browse/BEAM-14449?focusedWorklogId=775259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775259
]
ASF GitHub Bot logged work on BEAM-14449:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/May/22 22:10
Start Date: 26/May/22 22:10
Worklog Time Spent: 10m
Work Description: rohdesamuel commented on code in PR #17736:
URL: https://github.com/apache/beam/pull/17736#discussion_r883119026
##########
sdks/python/apache_beam/runners/interactive/interactive_beam.py:
##########
@@ -418,15 +424,19 @@ def create(
raise ValueError(
'Unknown cluster identifier: %s. Cannot create or reuse'
'a Dataproc cluster.')
- elif cluster_metadata.region == 'global':
- # The global region is unsupported as it will be eventually deprecated.
- raise ValueError('Clusters in the global region are not supported.')
- elif not cluster_metadata.region:
+ if not cluster_metadata.region:
_LOGGER.info(
'No region information was detected, defaulting Dataproc cluster '
'region to: us-central1.')
cluster_metadata.region = 'us-central1'
+ elif cluster_metadata.region == 'global':
+ # The global region is unsupported as it will be eventually deprecated.
+ raise ValueError('Clusters in the global region are not supported.')
# else use the provided region.
+ if cluster_metadata.num_workers and cluster_metadata.num_workers < 2:
Review Comment:
Why are two workers required? Also, can you please make the magic number a
constant?
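A minimal sketch of the suggested change, assuming a module-level constant (name hypothetical) and that the check mirrors Dataproc's minimum of two primary workers for a standard, non-single-node cluster:

# Hypothetical constant name; the assumption is that Dataproc standard
# (non-single-node) clusters need at least two primary workers.
_MINIMUM_WORKER_COUNT = 2

def _validate_num_workers(cluster_metadata):
  """Rejects an explicitly configured worker count below the assumed minimum."""
  if (cluster_metadata.num_workers and
      cluster_metadata.num_workers < _MINIMUM_WORKER_COUNT):
    raise ValueError(
        'Dataproc clusters need at least %d workers; got %d.' %
        (_MINIMUM_WORKER_COUNT, cluster_metadata.num_workers))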
##########
sdks/python/apache_beam/runners/interactive/dataproc/types.py:
##########
@@ -50,5 +79,8 @@ def __eq__(self, other):
return self.__key() == other.__key()
return False
+ def rename(self):
Review Comment:
Somewhat of an awkward name for something that sets the cluster name to the
default. Maybe "reset_name" or something similar would be better?
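A sketch of the renamed method under that suggestion; the default-name logic shown is hypothetical and should reuse whatever default factory ClusterMetadata already defines rather than duplicating it:

import uuid

def reset_name(self):
  """Resets cluster_name back to a freshly generated default value."""
  # Hypothetical default; mirror the existing ClusterMetadata default here.
  self.cluster_name = 'interactive-beam-cluster-%s' % uuid.uuid4().hex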
##########
sdks/python/apache_beam/runners/interactive/interactive_beam.py:
##########
@@ -475,7 +485,9 @@ def cleanup(
'options is deprecated since First stable release. References to '
'<pipeline>.options will not be supported',
category=DeprecationWarning)
- p.options.view_as(FlinkRunnerOptions).flink_master = '[auto]'
+ p_flink_options = p.options.view_as(FlinkRunnerOptions)
+ p_flink_options.flink_master = '[auto]'
+ p_flink_options.flink_version = None
Review Comment:
Why do we reset the flink version here?
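For context on the mechanics here: view_as() returns a typed view backed by the same underlying options, so attribute assignments on the view mutate the pipeline's options in place and are visible through any later view. A small self-contained illustration (values hypothetical):

from apache_beam.options.pipeline_options import FlinkRunnerOptions
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()
flink_options = options.view_as(FlinkRunnerOptions)
flink_options.flink_master = 'localhost:8081'
flink_options.flink_version = '1.12'

# Resetting through one view is what the cleanup above relies on.
flink_options.flink_master = '[auto]'
flink_options.flink_version = None
assert options.view_as(FlinkRunnerOptions).flink_master == '[auto]'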
Issue Time Tracking
-------------------
Worklog Id: (was: 775259)
Time Spent: 1h (was: 50m)
> Support cluster provisioning when using Flink on Dataproc
> ---------------------------------------------------------
>
> Key: BEAM-14449
> URL: https://issues.apache.org/jira/browse/BEAM-14449
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning
> Assignee: Ning
> Priority: P2
> Attachments: image-2022-05-16-11-25-32-904.png,
> image-2022-05-16-11-28-12-702.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Provide the capability for the user to explicitly provision a cluster.
> The current implementation provisions each cluster at the location specified
> by GoogleCloudOptions, using 3 worker nodes. There is no explicit API to
> configure the number or shape of workers.
> We could use WorkerOptions to let customers explicitly provision a cluster
> and expose an explicit API (with UX in the notebook extension) for customers
> to change the size of the cluster connected to their notebook (until we have
> an auto-scaling solution for Flink on Dataproc).
> The API looks like this when configuring workers while creating a Dataproc
> cluster:
> !image-2022-05-16-11-25-32-904.png!
> An example request setting the masterConfig and workerConfig:
> !image-2022-05-16-11-28-12-702.png!
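> A minimal sketch of the proposed usage (names and values hypothetical),
> assuming the worker options set on the interactive pipeline are translated
> into the Dataproc cluster's workerConfig when the cluster is provisioned for
> FlinkRunner; the exact plumbing is what this issue tracks:
>
> from apache_beam.options.pipeline_options import GoogleCloudOptions
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.options.pipeline_options import WorkerOptions
>
> options = PipelineOptions()
> options.view_as(GoogleCloudOptions).project = 'my-project'  # hypothetical
> options.view_as(GoogleCloudOptions).region = 'us-central1'
> worker_options = options.view_as(WorkerOptions)
> worker_options.num_workers = 4                 # -> workerConfig.numInstances
> worker_options.machine_type = 'n1-standard-4'  # -> workerConfig.machineTypeUri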
--
This message was sent by Atlassian Jira
(v8.20.7#820007)