This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 854a9d44b831 [SPARK-55115][INFRA][3.5] Use composable Dockerfile for release builds
854a9d44b831 is described below

commit 854a9d44b831e4782a58e0d53f03c26334697435
Author: Wenchen Fan <[email protected]>
AuthorDate: Fri Jan 23 17:42:14 2026 +0800

    [SPARK-55115][INFRA][3.5] Use composable Dockerfile for release builds
    
    ### What changes were proposed in this pull request?
    
    This PR refactors the release Docker image build process to use a composable Dockerfile approach:
    
    1. **`Dockerfile.base`**: A shared base image containing common tools (Ubuntu 22.04, R packages, Ruby/bundler, TeX, Node.js)
    2. **`Dockerfile`**: Branch-specific image that extends the base with Java/Python versions and packages for this branch
    3. **`do-release-docker.sh`**: Updated to build the base image first, then the branch-specific image
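    
    The build order described above can be sketched as a small shell function. This is an illustrative sketch only, not the script's actual code: the `run` helper, the `DRY_RUN` variable, and the `build_release_images` function name are hypothetical, while the image tags and `docker build` flags mirror the ones used in `do-release-docker.sh`.
    
    ```shell
    # Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
    run() {
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
      else
        "$@"
      fi
    }
    
    # Build the shared base image first, then the branch-specific image on top of it.
    # Arguments: <build context dir> <image tag> <numeric user id>
    build_release_images() {
      local ctx="$1" tag="$2" uid="$3"
      # Step 1: common tools shared by all branches.
      run docker build -t "spark-rm-base:latest" -f "$ctx/Dockerfile.base" "$ctx"
      # Step 2: branch-specific Java/Python layers; uses the default Dockerfile in the context.
      run docker build -t "spark-rm:$tag" --build-arg "UID=$uid" "$ctx"
    }
    ```
    
    With `DRY_RUN=1`, calling `build_release_images dev/create-release/spark-rm mytag 1000` prints the two `docker build` commands in order without invoking Docker, which makes the ordering easy to inspect.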
    
    ### Why are the changes needed?
    
    Currently, each branch maintains its own full Dockerfile, which leads to:
    - Duplicated common configuration across branches
    - Difficulty keeping base tools (R packages, Ruby, etc.) in sync
    - Expired GPG keys or outdated base images affecting all branches
    
    With the composable approach:
    - Common tools are defined once in `Dockerfile.base`
    - Each branch only specifies its unique Java/Python requirements
    - Updates to base tools can be applied consistently
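    
    One way to sanity-check that a branch Dockerfile follows this layout is to confirm that its first `FROM` line names the shared base image. The `extends_base` function below is a hypothetical helper written for illustration; it is not part of the release scripts, and it assumes the base tag is exactly `spark-rm-base:latest`.
    
    ```shell
    # Hypothetical check: succeed only if the first FROM line in the given
    # Dockerfile is exactly "FROM spark-rm-base:latest".
    extends_base() {
      local dockerfile="$1"
      local first_from
      # Take the first line that starts with FROM; tolerate files with no FROM line.
      first_from="$(grep -m1 -E '^FROM[[:space:]]' "$dockerfile" || true)"
      [ "$first_from" = "FROM spark-rm-base:latest" ]
    }
    ```
    
    For example, `extends_base dev/create-release/spark-rm/Dockerfile && echo ok` would print `ok` only for a Dockerfile built on the shared base.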
    
    ### Version changes
    
    | Component | Before | After |
    |-----------|--------|-------|
    | Ubuntu | 20.04 | 22.04 (jammy-20250819) |
    | bundler | 2.3.8 | 2.4.22 |
    | R CRAN repo | focal-cran40 | jammy-cran40 |
    | docutils | <0.17 | ==0.16 |
    | FULL_REFRESH_DATE | (none) | 20250819 |
    
    **Note**: The Ubuntu upgrade from 20.04 to 22.04 is necessary to use a shared base image across all branches. Ubuntu 20.04 reached end of standard support in April 2025.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This only affects the release infrastructure.
    
    ### How was this patch tested?
    
    The Docker image was built and verified successfully on a remote machine.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Yes
    
    Closes #53908 from cloud-fan/release-infra-3.5.
    
    Authored-by: Wenchen Fan <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 dev/create-release/do-release-docker.sh     |   5 ++
 dev/create-release/spark-rm/Dockerfile      | 119 +++++++++-------------------
 dev/create-release/spark-rm/Dockerfile.base | 110 +++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 80 deletions(-)

diff --git a/dev/create-release/do-release-docker.sh b/dev/create-release/do-release-docker.sh
index 4e8fffd08062..c9f651e0ba57 100755
--- a/dev/create-release/do-release-docker.sh
+++ b/dev/create-release/do-release-docker.sh
@@ -120,6 +120,11 @@ GPG_KEY_FILE="$WORKDIR/gpg.key"
 fcreate_secure "$GPG_KEY_FILE"
 $GPG --export-secret-key --armor --pinentry-mode loopback --passphrase "$GPG_PASSPHRASE" "$GPG_KEY" > "$GPG_KEY_FILE"
 
+# Build base image first (contains common tools shared across all branches)
+run_silent "Building spark-rm-base image..." "docker-build-base.log" \
+  docker build -t "spark-rm-base:latest" -f "$SELF/spark-rm/Dockerfile.base" "$SELF/spark-rm"
+
+# Build branch-specific image (extends base with Java/Python versions for this branch)
 run_silent "Building spark-rm image with tag $IMGTAG..." "docker-build.log" \
   docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
 
diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index 7fb9c95bb0a3..8132e8ac6b3e 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -15,86 +15,45 @@
 # limitations under the License.
 #
 
-# Image for building Spark releases. Based on Ubuntu 20.04.
-#
-# Includes:
-# * Java 8
-# * Ivy
-# * Python (3.8.5)
-# * R-base/R-base-dev (4.0.3)
-# * Ruby (2.7.0)
-#
-# You can test it as below:
-#   cd dev/create-release/spark-rm
-#   docker build -t spark-rm --build-arg UID=$UID .
-
-FROM ubuntu:20.04
-
-# For apt to be noninteractive
-ENV DEBIAN_FRONTEND noninteractive
-ENV DEBCONF_NONINTERACTIVE_SEEN true
-
-# These arguments are just for reuse and not really meant to be customized.
-ARG APT_INSTALL="apt-get install --no-install-recommends -y"
-
-# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
-#   See also https://github.com/sphinx-doc/sphinx/issues/7551.
-#   We should use the latest Sphinx version once this is fixed.
-# TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx.
-#   See also https://issues.apache.org/jira/browse/SPARK-35375.
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.8.0 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==2.0.3 pyarrow==4.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.56.0 protobuf==4.21.6 grpcio-status==1.56.0 googleapis-common-protos==1.56.4"
-ARG GEM_PKGS="bundler:2.3.8"
-
-# Install extra needed repos and refresh.
-# - CRAN repo
-# - Ruby repo (for doc generation)
-#
-# This is all in a single "RUN" command so that if anything changes, "apt update" is run to fetch
-# the most current package versions (instead of potentially using old versions cached by docker).
-RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
-  echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list && \
-  gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
-  gpg -a --export E084DAB9 | apt-key add - && \
-  apt-get clean && \
-  rm -rf /var/lib/apt/lists/* && \
-  apt-get clean && \
-  apt-get update && \
-  $APT_INSTALL software-properties-common && \
-  apt-get update && \
-  $APT_INSTALL msmtp && \
-  # Install openjdk 8.
-  $APT_INSTALL openjdk-8-jdk && \
-  update-alternatives --set java $(ls /usr/lib/jvm/java-8-openjdk-*/jre/bin/java) && \
-  # Install build / source control tools
-  $APT_INSTALL curl wget git maven ivy subversion make gcc lsof libffi-dev \
-    pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev && \
-  curl -sL https://deb.nodesource.com/setup_12.x | bash && \
-  $APT_INSTALL nodejs && \
-  # Install needed python packages. Use pip for installing packages (for consistency).
-  $APT_INSTALL python-is-python3 python3-pip python3-setuptools && \
-  # qpdf is required for CRAN checks to pass.
-  $APT_INSTALL qpdf jq && \
-  pip3 install $PIP_PKGS && \
-  # Install R packages and dependencies used when building.
-  # R depends on pandoc*, libssl (which are installed above).
-  # Note that PySpark doc generation also needs pandoc due to nbsphinx
-  $APT_INSTALL r-base r-base-dev && \
-  $APT_INSTALL libcurl4-openssl-dev libgit2-dev libssl-dev libxml2-dev && \
-  $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf texlive-latex-extra && \
-  $APT_INSTALL libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libwebp-dev && \
-  Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
-  # See more in SPARK-39959, roxygen2 < 7.2.1
-  Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
-  Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
-  Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" && \
-  # See more in SPARK-54371, pkgdown should be installed at the end to avoid version upgrade
-  Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" && \
-  # Install tools needed to build the documentation.
-  $APT_INSTALL ruby2.7 ruby2.7-dev && \
-  gem install --no-document $GEM_PKGS
-
-WORKDIR /opt/spark-rm/output
-
+# Spark 3.5 release image
+# Extends the base image with:
+# - Java 8
+# - Python 3.8 with required packages
+
+FROM spark-rm-base:latest
+
+# Install Java 8 for Spark 3.x
+RUN apt-get update && apt-get install -y \
+    openjdk-8-jdk-headless \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Python 3.8 from deadsnakes PPA
+RUN add-apt-repository ppa:deadsnakes/ppa && \
+    apt-get update && apt-get install -y \
+    python3.8 \
+    python3.8-dev \
+    python3.8-distutils \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install pip for Python 3.8 (using version-specific URL)
+RUN curl -sS https://bootstrap.pypa.io/pip/3.8/get-pip.py | python3.8
+
+# Python packages for Spark 3.5
+# Based on the original branch-3.5 Dockerfile
+ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.8.0 \
+    ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 \
+    sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==2.0.3 pyarrow==4.0.0 \
+    plotly==5.4.0 markupsafe==2.0.1 docutils==0.16 grpcio==1.56.0 protobuf==4.21.6 \
+    grpcio-status==1.56.0 googleapis-common-protos==1.56.4"
+
+# Install Python 3.8 packages
+RUN python3.8 -m pip install --ignore-installed $PIP_PKGS
+
+# Set Python 3.8 as the default
+RUN ln -sf "$(which python3.8)" "/usr/local/bin/python" && \
+    ln -sf "$(which python3.8)" "/usr/local/bin/python3"
+
+# Create user for release manager
 ARG UID
 RUN useradd -m -s /bin/bash -p spark-rm -u $UID spark-rm
 USER spark-rm:spark-rm
diff --git a/dev/create-release/spark-rm/Dockerfile.base b/dev/create-release/spark-rm/Dockerfile.base
new file mode 100644
index 000000000000..56e85256d52d
--- /dev/null
+++ b/dev/create-release/spark-rm/Dockerfile.base
@@ -0,0 +1,110 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Base image for building Spark releases. Based on Ubuntu 22.04.
+# This image contains common tools shared across all Spark versions:
+# - Build tools (gcc, make, etc.)
+# - R with pinned package versions
+# - Ruby with bundler
+# - TeX for documentation
+# - Node.js for documentation
+#
+# Branch-specific Dockerfiles should use "FROM spark-rm-base:latest" and add:
+# - Java version (8 or 17)
+# - Python version and pip packages
+
+FROM ubuntu:jammy-20250819
+LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
+LABEL org.opencontainers.image.licenses="Apache-2.0"
+LABEL org.opencontainers.image.ref.name="Apache Spark Release Manager Base Image"
+LABEL org.opencontainers.image.version=""
+
+ENV FULL_REFRESH_DATE=20250819
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV DEBCONF_NONINTERACTIVE_SEEN=true
+
+# Install common system packages and build tools
+# Note: Java and Python are installed in branch-specific Dockerfiles
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    ca-certificates \
+    curl \
+    gfortran \
+    git \
+    subversion \
+    gnupg \
+    libcurl4-openssl-dev \
+    libfontconfig1-dev \
+    libfreetype6-dev \
+    libfribidi-dev \
+    libgit2-dev \
+    libharfbuzz-dev \
+    libjpeg-dev \
+    liblapack-dev \
+    libopenblas-dev \
+    libpng-dev \
+    libssl-dev \
+    libtiff5-dev \
+    libwebp-dev \
+    libxml2-dev \
+    msmtp \
+    nodejs \
+    npm \
+    pandoc \
+    pkg-config \
+    texlive-latex-base \
+    texlive \
+    texlive-fonts-extra \
+    texinfo \
+    texlive-latex-extra \
+    qpdf \
+    jq \
+    r-base \
+    ruby \
+    ruby-dev \
+    software-properties-common \
+    wget \
+    zlib1g-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set up R CRAN repository for latest R packages
+RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> /etc/apt/sources.list && \
+    gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
+    gpg -a --export E084DAB9 | apt-key add - && \
+    add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/'
+
+# Install R packages (same versions across all branches)
+# See more in SPARK-39959, roxygen2 < 7.2.1
+RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \
+    'rmarkdown', 'testthat', 'e1071', 'survival', 'arrow', \
+    'ggplot2', 'mvtnorm', 'statmod', 'xml2'), repos='https://cloud.r-project.org/')" && \
+    Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
+    Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
+    Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" && \
+    Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
+
+# See more in SPARK-39735
+ENV R_LIBS_SITE="/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
+
+# Install Ruby bundler (same version across all branches)
+RUN gem install --no-document "bundler:2.4.22"
+
+# Create workspace directory
+WORKDIR /opt/spark-rm/output
+
+# Note: Java, Python, and user creation are done in branch-specific Dockerfiles

