[GitHub] [beam] KevinGG commented on a diff in pull request #23271: Upgraded Flink on Dataproc support from Interactive Beam

GitBox Fri, 16 Sep 2022 14:38:35 -0700


KevinGG commented on code in PR #23271:
URL: https://github.com/apache/beam/pull/23271#discussion_r973417483



##########
sdks/python/apache_beam/runners/interactive/dataproc/dataproc_cluster_manager.py:
##########
@@ -43,7 +45,33 @@ class UnimportedDataproc:
 # Name of the log file auto-generated by Dataproc. We use it to locate the
 # startup output of the Flink daemon to retrieve master url and dashboard
 # information.
-DATAPROC_STAGING_LOG_NAME = 'dataproc-startup-script_output'
+DATAPROC_STAGING_LOG_NAME = 'dataproc-initialization-script-0_output'
+
+# Home dir of os user yarn.
+YARN_HOME = '/var/lib/hadoop-yarn'
+
+# Configures the os user yarn to use gcloud as the docker credHelper.
+# Also sets some taskmanager configurations for better parallelism.
+# Finally starts the yarn application: flink cluster in session mode.
+INIT_ACTION = """#!/bin/bash
+sudo -u yarn gcloud auth configure-docker --quiet
+
+readonly FLINK_INSTALL_DIR='/usr/lib/flink'
+readonly MASTER_HOSTNAME="$(/usr/share/google/get_metadata_value attributes/dataproc-master)"
+
+cat <<EOF >>${FLINK_INSTALL_DIR}/conf/flink-conf.yaml

Review Comment:
   I don't need to set those values and would rather let Dataproc select the appropriate ones.
   
   The values I do set here are basically:
   - taskmanager.memory.network.*: somehow Dataproc fixed all of these to 64mb and didn't honor the yarn defaults.
   - taskmanager.network.numberOfBuffers: this should be >= `#slots-per-TM^2 * #TMs * 4`. I temporarily increased the default value to support parallelism in the hundreds when the cluster has hundreds of CPUs. If needed, we can expose it for customization by end users in the future.
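
The lower bound quoted above can be worked out for a given cluster shape with a few lines of Python. This is only a sketch of the arithmetic, not code from the PR: the function name `min_network_buffers` and the example cluster size are hypothetical.

```python
def min_network_buffers(slots_per_tm: int, num_tms: int) -> int:
    """Lower bound from the comment above: #slots-per-TM^2 * #TMs * 4."""
    if slots_per_tm < 1 or num_tms < 1:
        raise ValueError("slots_per_tm and num_tms must be positive")
    return slots_per_tm ** 2 * num_tms * 4


# Hypothetical cluster: 25 task managers with 8 slots each (200 slots total).
print(min_network_buffers(slots_per_tm=8, num_tms=25))  # prints 6400
```

Because the bound grows with the square of the slots per task manager, a fixed default can fall short quickly once the cluster has hundreds of CPUs, which is presumably why the default was raised here.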



