[ https://issues.apache.org/jira/browse/BEAM-13812?focusedWorklogId=729174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729174 ]
ASF GitHub Bot logged work on BEAM-13812:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Feb/22 19:09
Start Date: 17/Feb/22 19:09
Worklog Time Spent: 10m
Work Description: VictorPlusC commented on a change in pull request
#16741:
URL: https://github.com/apache/beam/pull/16741#discussion_r809379319
##########
File path:
sdks/python/apache_beam/runners/interactive/dataproc/dataproc_cluster_manager.py
##########
@@ -18,60 +18,71 @@
 # pytype: skip-file
 import logging
+from dataclasses import dataclass
 from typing import Optional
-try:
-  from google.cloud import dataproc_v1
-except ImportError:
-  raise ImportError(
-      'Google Cloud Dataproc not supported for this execution environment.')
+from apache_beam.runners.interactive import interactive_environment as ie
 _LOGGER = logging.getLogger(__name__)
+@dataclass
+class MasterURLIdentifier:
+  project_id: Optional[str] = None
+  region: Optional[str] = None
+  cluster_name: Optional[str] = None
+
+  def __key(self):
+    return (self.project_id, self.region, self.cluster_name)
+
+  def __hash__(self):
+    return hash(self.__key())
+
+  def __eq__(self, other):
+    if isinstance(other, MasterURLIdentifier):
+      return self.__key() == other.__key()
+    raise NotImplementedError(
+        'Comparisons are only supported between '
+        'instances of MasterURLIdentifier.')
+
+
 class DataprocClusterManager:
   """The DataprocClusterManager object simplifies the operations
   required for creating and deleting Dataproc clusters for use
   under Interactive Beam.
   """
-  DEFAULT_NAME = 'interactive-beam-cluster'
-
-  def __init__(
-      self,
-      project_id: Optional[str] = None,
-      region: Optional[str] = None,
-      cluster_name: Optional[str] = None) -> None:
+  def __init__(self, cluster_metadata: MasterURLIdentifier) -> None:
     """Initializes the DataprocClusterManager with properties required
     to interface with the Dataproc ClusterControllerClient.
     """
-
-    self._project_id = project_id
-    if region == 'global':
+    self.cluster_metadata = cluster_metadata
+    if self.cluster_metadata.region == 'global':
       # The global region is unsupported as it will be eventually deprecated.
       raise ValueError('Clusters in the global region are not supported.')
-    elif region:
-      self._region = region
-    else:
+    elif not self.cluster_metadata.region:
       _LOGGER.warning(
           'No region information was detected, defaulting Dataproc cluster '
           'region to: us-central1.')
-      self._region = 'us-central1'
+      self.cluster_metadata.region = 'us-central1'
-    if cluster_name:
-      _LOGGER.warning(
-          'A user-specified cluster_name has been detected. '
-          'Please note that you will have to manually delete the Dataproc '
-          'cluster that will be created under the name: %s',
-          cluster_name)
-      self._cluster_name = cluster_name
-    else:
-      self._cluster_name = self.DEFAULT_NAME
+    if not self.cluster_metadata.cluster_name:
+      self.cluster_metadata.cluster_name = ie.current_env(
+      ).clusters.default_cluster_name
+    from google.cloud import dataproc_v1
     self._cluster_client = dataproc_v1.ClusterControllerClient(
         client_options={
-            'api_endpoint': f'{self._region}-dataproc.googleapis.com:443'
+            'api_endpoint':
             f'{self.cluster_metadata.region}-dataproc.googleapis.com:443'
         })
+    if self.cluster_metadata in ie.current_env().clusters.master_urls.inverse:
+      self.master_url = self.get_master_url(self.cluster_metadata)
Review comment:
Sounds good, I've updated the code now. Thanks!
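
For reference, the MasterURLIdentifier hunk above makes a mutable dataclass usable as a dictionary key by defining __hash__ and __eq__ over a tuple of its fields. The following is a minimal standalone sketch of that pattern, not the code from the pull request. ClusterKey and the example values are hypothetical, and, as an assumption that differs from the hunk, it returns NotImplemented for foreign types (the conventional Python signal) rather than raising NotImplementedError.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterKey:
  """Hypothetical stand-in for an identifier such as MasterURLIdentifier."""
  project_id: Optional[str] = None
  region: Optional[str] = None
  cluster_name: Optional[str] = None

  def _key(self):
    # Tuple of the fields that define identity.
    return (self.project_id, self.region, self.cluster_name)

  def __hash__(self):
    # Defined explicitly in the class body, so @dataclass leaves it in place.
    return hash(self._key())

  def __eq__(self, other):
    if isinstance(other, ClusterKey):
      return self._key() == other._key()
    # Defer to the other operand; Python falls back to an identity check.
    return NotImplemented


# Equal field values give equal keys, so a fresh instance finds the entry.
urls = {ClusterKey('my-project', 'us-central1', 'my-cluster'): 'host:8081'}
found = urls.get(ClusterKey('my-project', 'us-central1', 'my-cluster'))
```

Because the dataclass is not frozen, changing a field after an instance has been used as a key also changes its hash. The hunk fills in region and cluster_name in place inside __init__, so a key of this kind would presumably need to be stored only after those defaults are resolved.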
Issue Time Tracking
-------------------
Worklog Id: (was: 729174)
Time Spent: 19h 50m (was: 19h 40m)
> Integrate DataprocClusterManager into Interactive Environment
> -------------------------------------------------------------
>
> Key: BEAM-13812
> URL: https://issues.apache.org/jira/browse/BEAM-13812
> Project: Beam
> Issue Type: New Feature
> Components: runner-py-interactive
> Reporter: Victor Chen
> Assignee: Victor Chen
> Priority: P2
> Time Spent: 19h 50m
> Remaining Estimate: 0h
>
> * Adds code to automatically create a Dataproc cluster configured with Docker
> and Flink if a user instantiates an InteractiveRunner using
> underlying_runner='FlinkRunner'
> * By default, the created Dataproc cluster will be called
> 'interactive-beam-cluster' and will be cleaned up when the Interactive
> Environment is interrupted
> * Users will be able to toggle automatic deletion of the cluster with the
> default name, 'interactive-beam-cluster', when the kernel is interrupted
> * Introduces the Clusters class in interactive_beam.py, which acts as a user
> interface that provides the user with the following capabilities:
> ** Changing the default name of the clusters created by Interactive Beam
> ** Cleaning up a cluster by providing a corresponding pipeline; an option to
> force the deletion is provided for the case where more than one pipeline uses
> the same cluster
> ** Getting all pipelines that use the provided master_url
> ** Receiving a description back for either the cluster corresponding to a
> specific pipeline, or the descriptions of all pipelines