This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 64a21ea [SPARK-29574][K8S][2.4] Add SPARK_DIST_CLASSPATH to the executor class path
64a21ea is described below
commit 64a21ea08046eb13b1b3dae487747ae5a98e1f9d
Author: Shahin Shakeri <[email protected]>
AuthorDate: Sat Oct 31 14:18:12 2020 -0700
[SPARK-29574][K8S][2.4] Add SPARK_DIST_CLASSPATH to the executor class path
### What changes were proposed in this pull request?
This is a backport of https://github.com/apache/spark/pull/26493, as requested by the community in https://github.com/apache/spark/pull/30174.
Include `$SPARK_DIST_CLASSPATH` in the class path when launching `CoarseGrainedExecutorBackend` on Kubernetes executors via the provided `entrypoint.sh`.
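For readers unfamiliar with the variable, here is a minimal sketch of how `SPARK_DIST_CLASSPATH` is typically populated before this patch appends it to the executor class path. The install path is illustrative, following the pattern documented in `docs/hadoop-provided.md` below:

```bash
# Minimal sketch, assuming a user-provided Hadoop installed at /opt/hadoop
# (illustrative path, not from this commit). "hadoop classpath" prints the
# jar locations the Hadoop-free Spark build needs at runtime; with this patch,
# entrypoint.sh appends that value to the executor's -cp argument.
export HADOOP_HOME=/opt/hadoop
export SPARK_DIST_CLASSPATH="$("$HADOOP_HOME"/bin/hadoop classpath)"
```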
### Why are the changes needed?
For user-provided Hadoop, `$SPARK_DIST_CLASSPATH` contains the required jars.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Tested on Kubernetes 1.14 with Spark 2.4.4 and Hadoop 3.2.1. Adding `$SPARK_DIST_CLASSPATH` to the `-cp` parameter of `entrypoint.sh` allows the executors to launch correctly.
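As an additional, purely illustrative spot-check (not part of the testing above), one could confirm from a running executor pod that `entrypoint.sh` exported the variable before launching the JVM; the pod name is hypothetical:

```bash
# Hypothetical check: PID 1 in the executor container inherits the environment
# exported by entrypoint.sh, so SPARK_DIST_CLASSPATH should appear there.
kubectl exec spark-executor-1 -- sh -c \
  'tr "\0" "\n" < /proc/1/environ | grep SPARK_DIST_CLASSPATH'
```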
Closes #30214 from dongjoon-hyun/SPARK-29574-2.4.
Lead-authored-by: Shahin Shakeri <[email protected]>
Co-authored-by: Đặng Minh Dũng <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
docs/hadoop-provided.md | 22 ++++++++++++++++++++++
.../src/main/dockerfiles/spark/entrypoint.sh | 12 +++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/docs/hadoop-provided.md b/docs/hadoop-provided.md
index bbd26b3..07320b3 100644
--- a/docs/hadoop-provided.md
+++ b/docs/hadoop-provided.md
@@ -24,3 +24,25 @@ export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
{% endhighlight %}
+
+# Hadoop Free Build Setup for Spark on Kubernetes
+To run the Hadoop free build of Spark on Kubernetes, the executor image must have the appropriate version of Hadoop binaries and the correct `SPARK_DIST_CLASSPATH` value set. See the example below for the relevant changes needed in the executor Dockerfile:
+
+{% highlight bash %}
+### Set environment variables in the executor dockerfile ###
+
+ENV SPARK_HOME="/opt/spark"
+ENV HADOOP_HOME="/opt/hadoop"
+ENV PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH"
+...
+
+# Copy your target Hadoop binaries to the executor Hadoop home
+
+COPY /opt/hadoop3 $HADOOP_HOME
+...
+
+# Copy and use the Spark-provided entrypoint.sh. It sets your SPARK_DIST_CLASSPATH using the Hadoop binary in $HADOOP_HOME and starts the executor. If you customize the value of SPARK_DIST_CLASSPATH here, it will be retained by entrypoint.sh.
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+...
+{% endhighlight %}
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
index ba5d17b..e2e09d3 100755
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -83,6 +83,16 @@ elif [ "$PYSPARK_MAJOR_PYTHON_VERSION" == "3" ]; then
export PYSPARK_DRIVER_PYTHON="python3"
fi
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# Do not override SPARK_DIST_CLASSPATH if it is already set, to preserve customizations of this value from elsewhere, e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
case "$SPARK_K8S_CMD" in
driver)
CMD=(
@@ -114,7 +124,7 @@ case "$SPARK_K8S_CMD" in
"${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-Xms$SPARK_EXECUTOR_MEMORY
-Xmx$SPARK_EXECUTOR_MEMORY
- -cp "$SPARK_CLASSPATH"
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
org.apache.spark.executor.CoarseGrainedExecutorBackend
--driver-url $SPARK_DRIVER_URL
--executor-id $SPARK_EXECUTOR_ID
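For illustration, with this one-line change the executor JVM command assembled by `entrypoint.sh` ends up with both variables joined in its class path, roughly along these lines (values and the omitted arguments are hypothetical simplifications, not part of the commit):

```bash
# Rough shape of the patched executor launch (simplified; some arguments omitted):
#   SPARK_CLASSPATH      -> the Spark jars, plus HADOOP_CONF_DIR when it is set
#   SPARK_DIST_CLASSPATH -> output of "$HADOOP_HOME/bin/hadoop classpath"
"$JAVA_HOME/bin/java" \
  "${SPARK_EXECUTOR_JAVA_OPTS[@]}" \
  -Xms"$SPARK_EXECUTOR_MEMORY" -Xmx"$SPARK_EXECUTOR_MEMORY" \
  -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" \
  org.apache.spark.executor.CoarseGrainedExecutorBackend \
  --driver-url "$SPARK_DRIVER_URL" \
  --executor-id "$SPARK_EXECUTOR_ID"
```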