functicons commented on code in PR #23271:
URL: https://github.com/apache/beam/pull/23271#discussion_r973398074


##########
sdks/python/apache_beam/runners/interactive/dataproc/dataproc_cluster_manager.py:
##########
@@ -64,6 +92,30 @@ def __init__(self, cluster_metadata: ClusterMetadata) -> None:
         })
     self._fs = gcsfilesystem.GCSFileSystem(PipelineOptions())
     self._staging_directory = None
+    cache_dir = ie.current_env().options.cache_root
+    try:
+      assert cache_dir.startswith('gs://')

Review Comment:
   Assertions are usually used to catch bugs in the code. In this case, the
failure comes from invalid user input, not a bug. I think it is better to use
an if check and then raise an exception.
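A minimal sketch of the suggested if check. It assumes `ValueError` is an acceptable exception type, and the helper name `validate_cache_root` is hypothetical (it is not from the PR). One more reason to prefer this over `assert`: `assert` statements are stripped when Python runs with `-O`, so the check would silently disappear.

```python
def validate_cache_root(cache_dir):
  """Hypothetical helper: reject a non-GCS cache_root with a user-facing error.

  Uses an explicit if check instead of `assert`, so the validation still runs
  under `python -O` and the error reads as bad input rather than a code bug.
  """
  # Covers None/empty values as well as local paths such as '/tmp/cache'.
  if not cache_dir or not cache_dir.startswith('gs://'):
    raise ValueError(
        'options.cache_root must be a GCS path starting with gs://, '
        f'got {cache_dir!r}.')
  return cache_dir
```

In the constructor this could replace the `try: assert ...` block, e.g. `cache_dir = validate_cache_root(ie.current_env().options.cache_root)`.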



##########
sdks/python/apache_beam/runners/interactive/dataproc/dataproc_cluster_manager.py:
##########
@@ -43,7 +45,33 @@ class UnimportedDataproc:
 # Name of the log file auto-generated by Dataproc. We use it to locate the
 # startup output of the Flink daemon to retrieve master url and dashboard
 # information.
-DATAPROC_STAGING_LOG_NAME = 'dataproc-startup-script_output'
+DATAPROC_STAGING_LOG_NAME = 'dataproc-initialization-script-0_output'
+
+# Home dir of os user yarn.
+YARN_HOME = '/var/lib/hadoop-yarn'
+
+# Configures the os user yarn to use gcloud as the docker credHelper.
+# Also sets some taskmanager configurations for better parallelism.
+# Finally starts the yarn application: flink cluster in session mode.
+INIT_ACTION = """#!/bin/bash
+sudo -u yarn gcloud auth configure-docker --quiet
+
+readonly FLINK_INSTALL_DIR='/usr/lib/flink'
+readonly MASTER_HOSTNAME="$(/usr/share/google/get_metadata_value attributes/dataproc-master)"
+
+cat <<EOF >>${FLINK_INSTALL_DIR}/conf/flink-conf.yaml

Review Comment:
   Do you need to configure the jobmanager and taskmanager memory sizes? AFAIK
the default values set by Dataproc are problematic in some cases.
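One way to act on this, sketched in Python to match the module: render explicit memory settings into the same heredoc that appends to `flink-conf.yaml`. `jobmanager.memory.process.size` and `taskmanager.memory.process.size` are standard Flink configuration keys in recent Flink versions. The `2048m`/`8192m` values and the helper name `render_flink_conf_heredoc` are hypothetical placeholders, and the sizes would need tuning to the cluster's machine type.

```python
# Hypothetical values for illustration only; size them to the machine type.
FLINK_MEMORY_CONF = {
    'jobmanager.memory.process.size': '2048m',
    'taskmanager.memory.process.size': '8192m',
}


def render_flink_conf_heredoc(conf):
  """Hypothetical helper: build a bash heredoc that appends `key: value`
  lines to flink-conf.yaml, mirroring the `cat <<EOF >>...` step in
  INIT_ACTION."""
  body = '\n'.join(f'{key}: {value}' for key, value in conf.items())
  # ${FLINK_INSTALL_DIR} is left for bash to expand at init time.
  return ('cat <<EOF >>${FLINK_INSTALL_DIR}/conf/flink-conf.yaml\n'
          f'{body}\n'
          'EOF\n')
```

The rendered string could be concatenated into `INIT_ACTION`, so the memory settings stay next to the existing taskmanager parallelism settings.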



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
