This is an automated email from the ASF dual-hosted git repository.
yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git
The following commit(s) were added to refs/heads/master by this push:
new e61aba1 [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile
e61aba1 is described below
commit e61aba1ed4ca8e747f38cae5f6bd72a3a50f57cd
Author: Yikun Jiang <[email protected]>
AuthorDate: Tue Oct 11 10:45:57 2022 +0800
[SPARK-40516] Add Apache Spark 3.3.0 Dockerfile
### What changes were proposed in this pull request?
This patch adds the Apache Spark 3.3.0 Dockerfiles (a pull/run sketch follows the list):
- 3.3.0-scala2.12-java11-python3-ubuntu: PySpark + Scala
- 3.3.0-scala2.12-java11-ubuntu: Scala only
- 3.3.0-scala2.12-java11-r-ubuntu: SparkR + Scala
- 3.3.0-scala2.12-java11-python3-r-ubuntu: all-in-one image (PySpark + SparkR + Scala)
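
For illustration, once an image is published, pulling and running it would look roughly like this (the repo below is the GHCR test repo used by the CI in this patch; the final published name may differ):

    # Hypothetical pull/run of the all-in-one test image
    docker pull ghcr.io/apache/spark-docker:3.3.0-scala2.12-java11-python3-r-ubuntu
    # Commands other than "driver"/"executor" pass through the entrypoint unchanged
    docker run --rm ghcr.io/apache/spark-docker:3.3.0-scala2.12-java11-python3-r-ubuntu \
      /opt/spark/bin/spark-submit --version
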
### Why are the changes needed?
This is needed by the Docker Official Image effort.
See also:
https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
**The action won't be triggered until the workflow is merged to the default branch**, so I can only test it in my local repo:
- local test: https://github.com/Yikun/spark-docker/pull/1

- Dockerfile E2E K8S Local test:
https://github.com/Yikun/spark-docker-bak/pull/7

Closes #2 from Yikun/SPARK-40516.
Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
---
.github/workflows/build_3.3.0.yaml | 38 ++++++++
.github/workflows/main.yml | 106 ++++++++++++++++++++
3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 84 ++++++++++++++++
.../entrypoint.sh | 107 +++++++++++++++++++++
3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile | 81 ++++++++++++++++
.../scala2.12-java11-python3-ubuntu/entrypoint.sh | 107 +++++++++++++++++++++
3.3.0/scala2.12-java11-r-ubuntu/Dockerfile | 79 +++++++++++++++
3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 107 +++++++++++++++++++++
3.3.0/scala2.12-java11-ubuntu/Dockerfile | 76 +++++++++++++++
3.3.0/scala2.12-java11-ubuntu/entrypoint.sh | 107 +++++++++++++++++++++
10 files changed, 892 insertions(+)
diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml
new file mode 100644
index 0000000..63b1ab3
--- /dev/null
+++ b/.github/workflows/build_3.3.0.yaml
@@ -0,0 +1,38 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.3.0)"
+
+on:
+  pull_request:
+    branches:
+      - 'master'
+    paths:
+      - '3.3.0/**'
+      - '.github/workflows/main.yml'
+
+jobs:
+  run-build:
+    name: Run
+    secrets: inherit
+    uses: ./.github/workflows/main.yml
+    with:
+      spark: 3.3.0
+      scala: 2.12
+      java: 11
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
new file mode 100644
index 0000000..90bd706
--- /dev/null
+++ b/.github/workflows/main.yml
@@ -0,0 +1,106 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: Main (Build/Test/Publish)
+
+on:
+  workflow_call:
+    inputs:
+      spark:
+        description: The Spark version of Spark image.
+        required: true
+        type: string
+        default: 3.3.0
+      scala:
+        description: The Scala version of Spark image.
+        required: true
+        type: string
+        default: 2.12
+      java:
+        description: The Java version of Spark image.
+        required: true
+        type: string
+        default: 11
+
+jobs:
+  main:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        spark_version:
+          - ${{ inputs.spark }}
+        scala_version:
+          - ${{ inputs.scala }}
+        java_version:
+          - ${{ inputs.java }}
+        image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
+    steps:
+      - name: Checkout Spark repository
+        uses: actions/checkout@v2
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v1
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
+
+      - name: Login to GHCR
+        uses: docker/login-action@v2
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Generate tags
+        run: |
+          TAG=scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-${{ matrix.image_suffix }}
+
+          REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]')
+          TEST_REPO=ghcr.io/$REPO_OWNER/spark-docker
+          IMAGE_NAME=spark
+          IMAGE_PATH=${{ matrix.spark_version }}/$TAG
+          UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG
+
+          # Unique image tag in each version: 3.3.0-scala2.12-java11-python3-ubuntu
+          echo "UNIQUE_IMAGE_TAG=${UNIQUE_IMAGE_TAG}" >> $GITHUB_ENV
+          # Test repo: ghcr.io/apache/spark-docker
+          echo "TEST_REPO=${TEST_REPO}" >> $GITHUB_ENV
+          # Image name: spark
+          echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV
+          # Image dockerfile path: 3.3.0/scala2.12-java11-python3-ubuntu
+          echo "IMAGE_PATH=${IMAGE_PATH}" >> $GITHUB_ENV
+
+      - name: Print Image tags
+        run: |
+          echo "UNIQUE_IMAGE_TAG: "${UNIQUE_IMAGE_TAG}
+          echo "TEST_REPO: "${TEST_REPO}
+          echo "IMAGE_NAME: "${IMAGE_NAME}
+          echo "IMAGE_PATH: "${IMAGE_PATH}
+
+      - name: Build and push test image
+        id: docker_build
+        uses: docker/build-push-action@v2
+        with:
+          context: ${{ env.IMAGE_PATH }}
+          push: true
+          tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }}
+          platforms: linux/amd64,linux/arm64
+
+      - name: Image digest
+        run: echo ${{ steps.docker_build.outputs.digest }}
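
For reference, a minimal shell sketch of the tag derivation in the "Generate tags" step above, assuming the inputs from build_3.3.0.yaml (spark=3.3.0, scala=2.12, java=11) and the python3-ubuntu matrix entry:

    spark_version=3.3.0; scala_version=2.12; java_version=11; image_suffix=python3-ubuntu
    TAG=scala${scala_version}-java${java_version}-${image_suffix}
    UNIQUE_IMAGE_TAG=${spark_version}-${TAG}
    echo "${UNIQUE_IMAGE_TAG}"   # -> 3.3.0-scala2.12-java11-python3-ubuntu
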
diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 0000000..c95dd39
--- /dev/null
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,84 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM eclipse-temurin:11-jre-focal
+
+ARG spark_uid=185
+
+RUN set -ex && \
+ apt-get update && \
+ ln -s /lib /lib64 && \
+ apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
+ apt install -y python3 python3-pip && \
+ pip3 install --upgrade pip setuptools && \
+ apt install -y r-base r-base-dev && \
+ mkdir -p /opt/spark && \
+ mkdir /opt/spark/python && \
+ mkdir -p /opt/spark/examples && \
+ mkdir -p /opt/spark/work-dir && \
+ touch /opt/spark/RELEASE && \
+ rm /bin/sh && \
+ ln -sv /bin/bash /bin/sh && \
+ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
+ chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
+ rm -rf /var/cache/apt/*
+
+# Install Apache Spark
+# https://downloads.apache.org/spark/KEYS
+ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
+ SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \
+ GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+
+RUN set -ex; \
+ export SPARK_TMP="$(mktemp -d)"; \
+ cd $SPARK_TMP; \
+ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
+ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+ export GNUPGHOME="$(mktemp -d)"; \
+ gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
+ gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+ gpg --batch --verify spark.tgz.asc spark.tgz; \
+ gpgconf --kill all; \
+ rm -rf "$GNUPGHOME" spark.tgz.asc; \
+ \
+ tar -xf spark.tgz --strip-components=1; \
+ mv jars /opt/spark/; \
+ mv bin /opt/spark/; \
+ mv sbin /opt/spark/; \
+ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
+ mv examples /opt/spark/; \
+ mv kubernetes/tests /opt/spark/; \
+ mv data /opt/spark/; \
+ mv python/pyspark /opt/spark/python/pyspark/; \
+ mv python/lib /opt/spark/python/lib/; \
+ mv R /opt/spark/; \
+ cd ..; \
+ rm -rf "$SPARK_TMP";
+
+COPY entrypoint.sh /opt/
+
+ENV SPARK_HOME /opt/spark
+ENV R_HOME /usr/lib/R
+
+WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
+RUN chmod a+x /opt/decom.sh
+RUN chmod a+x /opt/entrypoint.sh
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
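
As a side note, the download-and-verify flow in this Dockerfile can be reproduced outside Docker with roughly the following (a sketch, assuming wget and gnupg are installed):

    wget -nv -O spark.tgz https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
    wget -nv -O spark.tgz.asc https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc
    gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
    gpg --batch --verify spark.tgz.asc spark.tgz
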
diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/entrypoint.sh b/3.3.0/scala2.12-java11-python3-r-ubuntu/entrypoint.sh
new file mode 100644
index 0000000..cfd7a69
--- /dev/null
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/entrypoint.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+# turn off -e for getent because it will return an error code in the anonymous uid case
+set +e
+uidentry=$(getent passwd $myuid)
+set -e
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+ if [ -w /etc/passwd ] ; then
+ echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous
uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
+ else
+ echo "Container ENTRYPOINT failed to add passwd entry for anonymous
UID"
+ fi
+fi
+
+if [ -z "$JAVA_HOME" ]; then
+ JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
+fi
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
+readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
+
+if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
+ SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
+fi
+
+if ! [ -z ${PYSPARK_PYTHON+x} ]; then
+ export PYSPARK_PYTHON
+fi
+if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
+ export PYSPARK_DRIVER_PYTHON
+fi
+
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere, e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
+if ! [ -z ${SPARK_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
+elif ! [ -z ${SPARK_HOME+x} ]; then
+ SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
+fi
+
+case "$1" in
+ driver)
+ shift 1
+ CMD=(
+ "$SPARK_HOME/bin/spark-submit"
+ --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+ --deploy-mode client
+ "$@"
+ )
+ ;;
+ executor)
+ shift 1
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+ -Xms$SPARK_EXECUTOR_MEMORY
+ -Xmx$SPARK_EXECUTOR_MEMORY
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
+ org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
+ --driver-url $SPARK_DRIVER_URL
+ --executor-id $SPARK_EXECUTOR_ID
+ --cores $SPARK_EXECUTOR_CORES
+ --app-id $SPARK_APPLICATION_ID
+ --hostname $SPARK_EXECUTOR_POD_IP
+ --resourceProfileId $SPARK_RESOURCE_PROFILE_ID
+ --podName $SPARK_EXECUTOR_POD_NAME
+ )
+ ;;
+
+ *)
+ # Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ CMD=("$@")
+ ;;
+esac
+
+# Execute the container CMD under tini for better hygiene
+exec /usr/bin/tini -s -- "${CMD[@]}"
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
new file mode 100644
index 0000000..e3d9829
--- /dev/null
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM eclipse-temurin:11-jre-focal
+
+ARG spark_uid=185
+
+RUN set -ex && \
+ apt-get update && \
+ ln -s /lib /lib64 && \
+ apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
+ apt install -y python3 python3-pip && \
+ pip3 install --upgrade pip setuptools && \
+ mkdir -p /opt/spark && \
+ mkdir /opt/spark/python && \
+ mkdir -p /opt/spark/examples && \
+ mkdir -p /opt/spark/work-dir && \
+ touch /opt/spark/RELEASE && \
+ rm /bin/sh && \
+ ln -sv /bin/bash /bin/sh && \
+ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
+ chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
+ rm -rf /var/cache/apt/*
+
+# Install Apache Spark
+# https://downloads.apache.org/spark/KEYS
+ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
+ SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \
+ GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+
+RUN set -ex; \
+ export SPARK_TMP="$(mktemp -d)"; \
+ cd $SPARK_TMP; \
+ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
+ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+ export GNUPGHOME="$(mktemp -d)"; \
+ gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
+ gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+ gpg --batch --verify spark.tgz.asc spark.tgz; \
+ gpgconf --kill all; \
+ rm -rf "$GNUPGHOME" spark.tgz.asc; \
+ \
+ tar -xf spark.tgz --strip-components=1; \
+ mv jars /opt/spark/; \
+ mv bin /opt/spark/; \
+ mv sbin /opt/spark/; \
+ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
+ mv examples /opt/spark/; \
+ mv kubernetes/tests /opt/spark/; \
+ mv data /opt/spark/; \
+ mv python/pyspark /opt/spark/python/pyspark/; \
+ mv python/lib /opt/spark/python/lib/; \
+ cd ..; \
+ rm -rf "$SPARK_TMP";
+
+COPY entrypoint.sh /opt/
+
+ENV SPARK_HOME /opt/spark
+
+WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
+RUN chmod a+x /opt/decom.sh
+RUN chmod a+x /opt/entrypoint.sh
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/entrypoint.sh b/3.3.0/scala2.12-java11-python3-ubuntu/entrypoint.sh
new file mode 100644
index 0000000..cfd7a69
--- /dev/null
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/entrypoint.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+# turn off -e for getent because it will return an error code in the anonymous uid case
+set +e
+uidentry=$(getent passwd $myuid)
+set -e
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+ if [ -w /etc/passwd ] ; then
+ echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous
uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
+ else
+ echo "Container ENTRYPOINT failed to add passwd entry for anonymous
UID"
+ fi
+fi
+
+if [ -z "$JAVA_HOME" ]; then
+ JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
+fi
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
+readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
+
+if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
+ SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
+fi
+
+if ! [ -z ${PYSPARK_PYTHON+x} ]; then
+ export PYSPARK_PYTHON
+fi
+if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
+ export PYSPARK_DRIVER_PYTHON
+fi
+
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere, e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
+if ! [ -z ${SPARK_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
+elif ! [ -z ${SPARK_HOME+x} ]; then
+ SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
+fi
+
+case "$1" in
+ driver)
+ shift 1
+ CMD=(
+ "$SPARK_HOME/bin/spark-submit"
+ --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+ --deploy-mode client
+ "$@"
+ )
+ ;;
+ executor)
+ shift 1
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+ -Xms$SPARK_EXECUTOR_MEMORY
+ -Xmx$SPARK_EXECUTOR_MEMORY
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
+ org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
+ --driver-url $SPARK_DRIVER_URL
+ --executor-id $SPARK_EXECUTOR_ID
+ --cores $SPARK_EXECUTOR_CORES
+ --app-id $SPARK_APPLICATION_ID
+ --hostname $SPARK_EXECUTOR_POD_IP
+ --resourceProfileId $SPARK_RESOURCE_PROFILE_ID
+ --podName $SPARK_EXECUTOR_POD_NAME
+ )
+ ;;
+
+ *)
+ # Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ CMD=("$@")
+ ;;
+esac
+
+# Execute the container CMD under tini for better hygiene
+exec /usr/bin/tini -s -- "${CMD[@]}"
diff --git a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
new file mode 100644
index 0000000..9745f54
--- /dev/null
+++ b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM eclipse-temurin:11-jre-focal
+
+ARG spark_uid=185
+
+RUN set -ex && \
+ apt-get update && \
+ ln -s /lib /lib64 && \
+ apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
+ apt install -y r-base r-base-dev && \
+ mkdir -p /opt/spark && \
+ mkdir -p /opt/spark/examples && \
+ mkdir -p /opt/spark/work-dir && \
+ touch /opt/spark/RELEASE && \
+ rm /bin/sh && \
+ ln -sv /bin/bash /bin/sh && \
+ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
+ chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
+ rm -rf /var/cache/apt/*
+
+# Install Apache Spark
+# https://downloads.apache.org/spark/KEYS
+ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
+ SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \
+ GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+
+RUN set -ex; \
+ export SPARK_TMP="$(mktemp -d)"; \
+ cd $SPARK_TMP; \
+ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
+ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+ export GNUPGHOME="$(mktemp -d)"; \
+ gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
+ gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+ gpg --batch --verify spark.tgz.asc spark.tgz; \
+ gpgconf --kill all; \
+ rm -rf "$GNUPGHOME" spark.tgz.asc; \
+ \
+ tar -xf spark.tgz --strip-components=1; \
+ mv jars /opt/spark/; \
+ mv bin /opt/spark/; \
+ mv sbin /opt/spark/; \
+ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
+ mv examples /opt/spark/; \
+ mv kubernetes/tests /opt/spark/; \
+ mv data /opt/spark/; \
+ mv R /opt/spark/; \
+ cd ..; \
+ rm -rf "$SPARK_TMP";
+
+COPY entrypoint.sh /opt/
+
+ENV SPARK_HOME /opt/spark
+ENV R_HOME /usr/lib/R
+
+WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
+RUN chmod a+x /opt/decom.sh
+RUN chmod a+x /opt/entrypoint.sh
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
diff --git a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
new file mode 100644
index 0000000..cfd7a69
--- /dev/null
+++ b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+# turn off -e for getent because it will return an error code in the anonymous uid case
+set +e
+uidentry=$(getent passwd $myuid)
+set -e
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+ if [ -w /etc/passwd ] ; then
+ echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous
uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
+ else
+ echo "Container ENTRYPOINT failed to add passwd entry for anonymous
UID"
+ fi
+fi
+
+if [ -z "$JAVA_HOME" ]; then
+ JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
+fi
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
+readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
+
+if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
+ SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
+fi
+
+if ! [ -z ${PYSPARK_PYTHON+x} ]; then
+ export PYSPARK_PYTHON
+fi
+if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
+ export PYSPARK_DRIVER_PYTHON
+fi
+
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere, e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
+if ! [ -z ${SPARK_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
+elif ! [ -z ${SPARK_HOME+x} ]; then
+ SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
+fi
+
+case "$1" in
+ driver)
+ shift 1
+ CMD=(
+ "$SPARK_HOME/bin/spark-submit"
+ --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+ --deploy-mode client
+ "$@"
+ )
+ ;;
+ executor)
+ shift 1
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+ -Xms$SPARK_EXECUTOR_MEMORY
+ -Xmx$SPARK_EXECUTOR_MEMORY
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
+ org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
+ --driver-url $SPARK_DRIVER_URL
+ --executor-id $SPARK_EXECUTOR_ID
+ --cores $SPARK_EXECUTOR_CORES
+ --app-id $SPARK_APPLICATION_ID
+ --hostname $SPARK_EXECUTOR_POD_IP
+ --resourceProfileId $SPARK_RESOURCE_PROFILE_ID
+ --podName $SPARK_EXECUTOR_POD_NAME
+ )
+ ;;
+
+ *)
+ # Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ CMD=("$@")
+ ;;
+esac
+
+# Execute the container CMD under tini for better hygiene
+exec /usr/bin/tini -s -- "${CMD[@]}"
diff --git a/3.3.0/scala2.12-java11-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-ubuntu/Dockerfile
new file mode 100644
index 0000000..ecbcc32
--- /dev/null
+++ b/3.3.0/scala2.12-java11-ubuntu/Dockerfile
@@ -0,0 +1,76 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM eclipse-temurin:11-jre-focal
+
+ARG spark_uid=185
+
+RUN set -ex && \
+ apt-get update && \
+ ln -s /lib /lib64 && \
+ apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
+ mkdir -p /opt/spark && \
+ mkdir -p /opt/spark/examples && \
+ mkdir -p /opt/spark/work-dir && \
+ touch /opt/spark/RELEASE && \
+ rm /bin/sh && \
+ ln -sv /bin/bash /bin/sh && \
+ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
+ chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
+ rm -rf /var/cache/apt/*
+
+# Install Apache Spark
+# https://downloads.apache.org/spark/KEYS
+ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
+ SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \
+ GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+
+RUN set -ex; \
+ export SPARK_TMP="$(mktemp -d)"; \
+ cd $SPARK_TMP; \
+ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
+ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+ export GNUPGHOME="$(mktemp -d)"; \
+ gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
+ gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+ gpg --batch --verify spark.tgz.asc spark.tgz; \
+ gpgconf --kill all; \
+ rm -rf "$GNUPGHOME" spark.tgz.asc; \
+ \
+ tar -xf spark.tgz --strip-components=1; \
+ mv jars /opt/spark/; \
+ mv bin /opt/spark/; \
+ mv sbin /opt/spark/; \
+ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
+ mv examples /opt/spark/; \
+ mv kubernetes/tests /opt/spark/; \
+ mv data /opt/spark/; \
+ cd ..; \
+ rm -rf "$SPARK_TMP";
+
+COPY entrypoint.sh /opt/
+
+ENV SPARK_HOME /opt/spark
+
+WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
+RUN chmod a+x /opt/decom.sh
+RUN chmod a+x /opt/entrypoint.sh
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
diff --git a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
new file mode 100644
index 0000000..cfd7a69
--- /dev/null
+++ b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+# turn off -e for getent because it will return an error code in the anonymous uid case
+set +e
+uidentry=$(getent passwd $myuid)
+set -e
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+ if [ -w /etc/passwd ] ; then
+ echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous
uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
+ else
+ echo "Container ENTRYPOINT failed to add passwd entry for anonymous
UID"
+ fi
+fi
+
+if [ -z "$JAVA_HOME" ]; then
+ JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
+fi
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
+readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
+
+if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
+ SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
+fi
+
+if ! [ -z ${PYSPARK_PYTHON+x} ]; then
+ export PYSPARK_PYTHON
+fi
+if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
+ export PYSPARK_DRIVER_PYTHON
+fi
+
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere, e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
+if ! [ -z ${SPARK_CONF_DIR+x} ]; then
+ SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
+elif ! [ -z ${SPARK_HOME+x} ]; then
+ SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
+fi
+
+case "$1" in
+ driver)
+ shift 1
+ CMD=(
+ "$SPARK_HOME/bin/spark-submit"
+ --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+ --deploy-mode client
+ "$@"
+ )
+ ;;
+ executor)
+ shift 1
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+ -Xms$SPARK_EXECUTOR_MEMORY
+ -Xmx$SPARK_EXECUTOR_MEMORY
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
+ org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
+ --driver-url $SPARK_DRIVER_URL
+ --executor-id $SPARK_EXECUTOR_ID
+ --cores $SPARK_EXECUTOR_CORES
+ --app-id $SPARK_APPLICATION_ID
+ --hostname $SPARK_EXECUTOR_POD_IP
+ --resourceProfileId $SPARK_RESOURCE_PROFILE_ID
+ --podName $SPARK_EXECUTOR_POD_NAME
+ )
+ ;;
+
+ *)
+ # Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ CMD=("$@")
+ ;;
+esac
+
+# Execute the container CMD under tini for better hygiene
+exec /usr/bin/tini -s -- "${CMD[@]}"