This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 64a21ea [SPARK-29574][K8S][2.4] Add SPARK_DIST_CLASSPATH to the executor class path
64a21ea is described below
commit 64a21ea08046eb13b1b3dae487747ae5a98e1f9d
Author: Shahin Shakeri <[email protected]>
AuthorDate: Sat Oct 31 14:18:12 2020 -0700
[SPARK-29574][K8S][2.4] Add SPARK_DIST_CLASSPATH to the executor class path
### What changes were proposed in this pull request?
This is a backport of https://github.com/apache/spark/pull/26493, as requested by the community in https://github.com/apache/spark/pull/30174.
Include `$SPARK_DIST_CLASSPATH` in the class path when launching `CoarseGrainedExecutorBackend` on Kubernetes executors via the provided `entrypoint.sh`.
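For readers unfamiliar with the variable, here is a minimal sketch of how `SPARK_DIST_CLASSPATH` is typically populated before this patch appends it to the executor class path. The install path is illustrative, following the pattern documented in `docs/hadoop-provided.md` below:

```bash
# Minimal sketch, assuming a user-provided Hadoop installed at /opt/hadoop
# (illustrative path, not from this commit). "hadoop classpath" prints the
# jar locations the Hadoop-free Spark build needs at runtime; with this patch,
# entrypoint.sh appends that value to the executor's -cp argument.
export HADOOP_HOME=/opt/hadoop
export SPARK_DIST_CLASSPATH="$("$HADOOP_HOME"/bin/hadoop classpath)"
```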
### Why are the changes needed?
For user-provided Hadoop, `$SPARK_DIST_CLASSPATH` contains the required jars.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Tested on Kubernetes 1.14 with Spark 2.4.4 and Hadoop 3.2.1. Adding `$SPARK_DIST_CLASSPATH` to the `-cp` parameter of `entrypoint.sh` allows the executors to launch correctly.
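As an additional, purely illustrative spot-check (not part of the testing above), one could confirm from a running executor pod that `entrypoint.sh` exported the variable before launching the JVM; the pod name is hypothetical:

```bash
# Hypothetical check: PID 1 in the executor container inherits the environment
# exported by entrypoint.sh, so SPARK_DIST_CLASSPATH should appear there.
kubectl exec spark-executor-1 -- sh -c \
  'tr "\0" "\n" < /proc/1/environ | grep SPARK_DIST_CLASSPATH'
```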
Closes #30214 from dongjoon-hyun/SPARK-29574-2.4.
Lead-authored-by: Shahin Shakeri <[email protected]>
Co-authored-by: Đặng Minh Dũng <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
docs/hadoop-provided.md | 22 ++++++++++++++++++++++
.../src/main/dockerfiles/spark/entrypoint.sh | 12 +++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/docs/hadoop-provided.md b/docs/hadoop-provided.md
index bbd26b3..07320b3 100644
--- a/docs/hadoop-provided.md
+++ b/docs/hadoop-provided.md
@@ -24,3 +24,25 @@ export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
{% endhighlight %}
+
+# Hadoop Free Build Setup for Spark on Kubernetes
+To run the Hadoop free build of Spark on Kubernetes, the executor image must have the appropriate version of Hadoop binaries and the correct `SPARK_DIST_CLASSPATH` value set. See the example below for the relevant changes needed in the executor Dockerfile:
+
+{% highlight bash %}
+### Set environment variables in the executor dockerfile ###
+
+ENV SPARK_HOME="/opt/spark"
+ENV HADOOP_HOME="/opt/hadoop"
+ENV PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH"
+...
+
+# Copy your target Hadoop binaries to the executor Hadoop home
+
+COPY /opt/hadoop3 $HADOOP_HOME
+...
+
+# Copy and use the Spark-provided entrypoint.sh. It sets your SPARK_DIST_CLASSPATH using the Hadoop binary in $HADOOP_HOME and starts the executor. If you customize the value of SPARK_DIST_CLASSPATH here, it will be retained by entrypoint.sh.
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+...
+{% endhighlight %}
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
index ba5d17b..e2e09d3 100755
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -83,6 +83,16 @@ elif [ "$PYSPARK_MAJOR_PYTHON_VERSION" == "3" ]; then
export PYSPARK_DRIVER_PYTHON="python3"
fi
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# Do not override SPARK_DIST_CLASSPATH if it is already set, to preserve customizations of this value from elsewhere, e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
case "$SPARK_K8S_CMD" in
driver)
CMD=(
@@ -114,7 +124,7 @@ case "$SPARK_K8S_CMD" in
"${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-Xms$SPARK_EXECUTOR_MEMORY
-Xmx$SPARK_EXECUTOR_MEMORY
- -cp "$SPARK_CLASSPATH"
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
org.apache.spark.executor.CoarseGrainedExecutorBackend
--driver-url $SPARK_DRIVER_URL
--executor-id $SPARK_EXECUTOR_ID
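For illustration, with this one-line change the executor JVM command assembled by `entrypoint.sh` ends up with both variables joined in its class path, roughly along these lines (values and the omitted arguments are hypothetical simplifications, not part of the commit):

```bash
# Rough shape of the patched executor launch (simplified; some arguments omitted):
#   SPARK_CLASSPATH      -> the Spark jars, plus HADOOP_CONF_DIR when it is set
#   SPARK_DIST_CLASSPATH -> output of "$HADOOP_HOME/bin/hadoop classpath"
"$JAVA_HOME/bin/java" \
  "${SPARK_EXECUTOR_JAVA_OPTS[@]}" \
  -Xms"$SPARK_EXECUTOR_MEMORY" -Xmx"$SPARK_EXECUTOR_MEMORY" \
  -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" \
  org.apache.spark.executor.CoarseGrainedExecutorBackend \
  --driver-url "$SPARK_DRIVER_URL" \
  --executor-id "$SPARK_EXECUTOR_ID"
```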