This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new c90c62751718 [SPARK-55115][INFRA][4.0] Use composable Dockerfile for release builds
c90c62751718 is described below
commit c90c627517182c68c5c2737edeaff9b1a54f8bce
Author: Wenchen Fan <[email protected]>
AuthorDate: Fri Jan 23 17:40:25 2026 +0800
[SPARK-55115][INFRA][4.0] Use composable Dockerfile for release builds
### What changes were proposed in this pull request?
This PR refactors the release Docker image build process to use a
composable Dockerfile approach (see the sketch after this list):
1. **`Dockerfile.base`**: A shared base image containing common tools
(Ubuntu 22.04, R packages, Ruby/bundler, TeX, Node.js)
2. **`Dockerfile`**: Branch-specific image that extends the base with
Java/Python versions and packages for this branch
3. **`do-release-docker.sh`**: Updated to build the base image first, then
the branch-specific image
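As a minimal sketch of the layering (abbreviated from the full Dockerfiles in the diff below; the real files install many more packages):

```dockerfile
# Dockerfile.base -- shared layer, built once and tagged spark-rm-base:latest
FROM ubuntu:jammy-20250819
# Representative subset of the common tooling; see the diff for the full list.
RUN apt-get update && apt-get install -y build-essential r-base ruby nodejs \
    && rm -rf /var/lib/apt/lists/*
```

```dockerfile
# Dockerfile (branch-4.0) -- extends the base with this branch's toolchain
FROM spark-rm-base:latest
RUN apt-get update && apt-get install -y openjdk-17-jdk-headless \
    && rm -rf /var/lib/apt/lists/*
```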
### Why are the changes needed?
Currently, each branch maintains its own full Dockerfile, which leads to:
- Duplicated common configuration across branches
- Difficulty keeping base tools (R packages, Ruby, etc.) in sync
- Expired GPG keys or outdated base images affecting all branches
With the composable approach (see the build sequence after this list):
- Common tools are defined once in `Dockerfile.base`
- Each branch only specifies its unique Java/Python requirements
- Updates to base tools can be applied consistently
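Concretely, `do-release-docker.sh` now performs a two-step build (abridged from the script changes in the diff below):

```shell
# Step 1: build the shared base image; every branch reuses this tag.
docker build -t "spark-rm-base:latest" -f "$SELF/spark-rm/Dockerfile.base" "$SELF/spark-rm"

# Step 2: build the branch-specific image on top of the base.
docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
```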
### Version changes
| Component | Before | After |
|-----------|--------|-------|
| Ubuntu image | jammy-20240911.1 | jammy-20250819 |
| FULL_REFRESH_DATE | 20241119 | 20250819 |
### Does this PR introduce _any_ user-facing change?
No. This only affects the release infrastructure.
### How was this patch tested?
The Docker image was built and verified successfully on a remote machine.
### Was this patch authored or co-authored using generative AI tooling?
Yes
Closes #53907 from cloud-fan/release-infra-4.0.
Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
dev/create-release/do-release-docker.sh | 5 +
dev/create-release/spark-rm/Dockerfile | 158 ++++++++++------------------
dev/create-release/spark-rm/Dockerfile.base | 110 +++++++++++++++++++
3 files changed, 169 insertions(+), 104 deletions(-)
diff --git a/dev/create-release/do-release-docker.sh b/dev/create-release/do-release-docker.sh
index 3a395e3c266b..e231d7a48eec 100755
--- a/dev/create-release/do-release-docker.sh
+++ b/dev/create-release/do-release-docker.sh
@@ -120,6 +120,11 @@ GPG_KEY_FILE="$WORKDIR/gpg.key"
fcreate_secure "$GPG_KEY_FILE"
$GPG --export-secret-key --armor --pinentry-mode loopback --passphrase "$GPG_PASSPHRASE" "$GPG_KEY" > "$GPG_KEY_FILE"
+# Build base image first (contains common tools shared across all branches)
+run_silent "Building spark-rm-base image..." "docker-build-base.log" \
+  docker build -t "spark-rm-base:latest" -f "$SELF/spark-rm/Dockerfile.base" "$SELF/spark-rm"
+
+# Build branch-specific image (extends base with Java/Python versions for this branch)
run_silent "Building spark-rm image with tag $IMGTAG..." "docker-build.log" \
docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index eb37fad6cccd..5803a902cd06 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -15,134 +15,84 @@
# limitations under the License.
#
-# Image for building Spark releases. Based on Ubuntu 22.04.
-FROM ubuntu:jammy-20240911.1
-LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
-LABEL org.opencontainers.image.licenses="Apache-2.0"
-LABEL org.opencontainers.image.ref.name="Apache Spark Release Manager Image"
-# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
-LABEL org.opencontainers.image.version=""
+# Spark 4.0 release image
+# Extends the base image with:
+# - Java 17
+# - Python 3.9/3.10 with required packages
+# - PyPy 3.10 for testing
-ENV FULL_REFRESH_DATE=20241119
-
-ENV DEBIAN_FRONTEND=noninteractive
-ENV DEBCONF_NONINTERACTIVE_SEEN=true
+FROM spark-rm-base:latest
+# Install Java 17 for Spark 4.x
RUN apt-get update && apt-get install -y \
- build-essential \
- ca-certificates \
- curl \
- gfortran \
- git \
- subversion \
- gnupg \
- libcurl4-openssl-dev \
- libfontconfig1-dev \
- libfreetype6-dev \
- libfribidi-dev \
- libgit2-dev \
- libharfbuzz-dev \
- libjpeg-dev \
- liblapack-dev \
- libopenblas-dev \
- libpng-dev \
- libpython3-dev \
- libssl-dev \
- libtiff5-dev \
- libwebp-dev \
- libxml2-dev \
- msmtp \
- nodejs \
- npm \
openjdk-17-jdk-headless \
- pandoc \
- pkg-config \
+ && rm -rf /var/lib/apt/lists/*
+
+# Install Python 3.9 and 3.10 from deadsnakes PPA
+RUN add-apt-repository ppa:deadsnakes/ppa && \
+ apt-get update && apt-get install -y \
+ python3.9 \
+ python3.9-dev \
+ python3.9-distutils \
python3.10 \
+ python3.10-dev \
python3-psutil \
- texlive-latex-base \
- texlive \
- texlive-fonts-extra \
- texinfo \
- texlive-latex-extra \
- qpdf \
- jq \
- r-base \
- ruby \
- ruby-dev \
- software-properties-common \
- wget \
- zlib1g-dev \
+ libpython3-dev \
&& rm -rf /var/lib/apt/lists/*
+# Install pip for both Python versions
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9 && \
+ curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
-RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> /etc/apt/sources.list
-RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9
-RUN gpg -a --export E084DAB9 | apt-key add -
-RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/'
+# Basic Python packages for Spark 4.0
+ARG BASIC_PIP_PKGS="numpy pyarrow>=18.0.0 six==1.16.0 pandas==2.2.3 scipy plotly<6.0.0 \
+    mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 twine==3.4.1"
-# See more in SPARK-39959, roxygen2 < 7.2.1
-RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \
-    'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', \
-    'ggplot2', 'mvtnorm', 'statmod', 'xml2'), repos='https://cloud.r-project.org/')" && \
-    Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
-    Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
-    Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" && \
-    Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
-
-# See more in SPARK-39735
-ENV R_LIBS_SITE="/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
-
-
-RUN add-apt-repository ppa:pypy/ppa
-RUN mkdir -p /usr/local/pypy/pypy3.10 && \
-    curl -sqL https://downloads.python.org/pypy/pypy3.10-v7.3.17-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.10 --strip-components=1 && \
-    ln -sf /usr/local/pypy/pypy3.10/bin/pypy /usr/local/bin/pypy3.10 && \
-    ln -sf /usr/local/pypy/pypy3.10/bin/pypy /usr/local/bin/pypy3
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.3' scipy coverage matplotlib lxml
-
-
-ARG BASIC_PIP_PKGS="numpy pyarrow>=18.0.0 six==1.16.0 pandas==2.2.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 twine==3.4.1"
# Python deps for Spark Connect
-ARG CONNECT_PIP_PKGS="grpcio==1.67.0 grpcio-status==1.67.0 protobuf==5.29.1 googleapis-common-protos==1.65.0 graphviz==0.20.3"
+ARG CONNECT_PIP_PKGS="grpcio==1.67.0 grpcio-status==1.67.0 protobuf==5.29.1 \
+ googleapis-common-protos==1.65.0 graphviz==0.20.3"
# Install Python 3.10 packages
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
-RUN python3.10 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
-RUN python3.10 -m pip install --ignore-installed 'six==1.16.0' # Avoid `python3-six` installation
-RUN python3.10 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
+RUN python3.10 -m pip install --ignore-installed 'blinker>=1.6.2' && \
+ python3.10 -m pip install --ignore-installed 'six==1.16.0' && \
+    python3.10 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
python3.10 -m pip install 'torch<2.6.0' torchvision --index-url https://download.pytorch.org/whl/cpu && \
python3.10 -m pip install deepspeed torcheval && \
python3.10 -m pip cache purge
-# Install Python 3.9
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
- python3.9 python3.9-distutils \
- && rm -rf /var/lib/apt/lists/*
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
-RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
-RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
+# Install Python 3.9 packages
+RUN python3.9 -m pip install --ignore-installed 'blinker>=1.6.2' && \
+    python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
python3.9 -m pip install 'torch<2.6.0' torchvision --index-url https://download.pytorch.org/whl/cpu && \
python3.9 -m pip install torcheval && \
python3.9 -m pip cache purge
+# Sphinx and documentation packages (installed on Python 3.9)
# Should unpin 'sphinxcontrib-*' after upgrading sphinx>5
-# See 'ipython_genutils' in SPARK-38517
-# See 'docutils<0.18.0' in SPARK-39421
-RUN python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' \
-ipython ipython_genutils sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' 'docutils<0.18.0' \
-'flake8==3.9.0' 'mypy==1.8.0' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' 'black==23.12.1' \
-'pandas-stubs==1.2.0.53' 'grpcio==1.67.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' \
-'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5'
-RUN python3.9 -m pip list
-
-RUN gem install --no-document "bundler:2.4.22"
-RUN ln -s "$(which python3.9)" "/usr/local/bin/python"
-RUN ln -s "$(which python3.9)" "/usr/local/bin/python3"
+# See 'ipython_genutils' in SPARK-38517, 'docutils<0.18.0' in SPARK-39421
+RUN python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' \
+    sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' \
+    ipython ipython_genutils sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas \
+    'plotly>=4.8' 'docutils<0.18.0' 'flake8==3.9.0' 'mypy==1.8.0' 'pytest==7.1.3' \
+    'pytest-mypy-plugins==1.9.3' 'black==23.12.1' 'pandas-stubs==1.2.0.53' \
+    'grpcio==1.67.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' \
+    'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' \
+    'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' \
+    'sphinxcontrib-serializinghtml==1.1.5'
+
+# Install PyPy 3.10 for testing
+RUN mkdir -p /usr/local/pypy/pypy3.10 && \
+    curl -sqL https://downloads.python.org/pypy/pypy3.10-v7.3.17-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.10 --strip-components=1 && \
+    ln -sf /usr/local/pypy/pypy3.10/bin/pypy /usr/local/bin/pypy3.10 && \
+    ln -sf /usr/local/pypy/pypy3.10/bin/pypy /usr/local/bin/pypy3 && \
+    curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3 && \
+    pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.3' scipy coverage matplotlib lxml
-WORKDIR /opt/spark-rm/output
+# Set Python 3.9 as the default (branch-4.0 uses 3.9 for docs)
+RUN ln -sf "$(which python3.9)" "/usr/local/bin/python" && \
+ ln -sf "$(which python3.9)" "/usr/local/bin/python3"
+# Create user for release manager
ARG UID
RUN useradd -m -s /bin/bash -p spark-rm -u $UID spark-rm
USER spark-rm:spark-rm
diff --git a/dev/create-release/spark-rm/Dockerfile.base b/dev/create-release/spark-rm/Dockerfile.base
new file mode 100644
index 000000000000..56e85256d52d
--- /dev/null
+++ b/dev/create-release/spark-rm/Dockerfile.base
@@ -0,0 +1,110 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Base image for building Spark releases. Based on Ubuntu 22.04.
+# This image contains common tools shared across all Spark versions:
+# - Build tools (gcc, make, etc.)
+# - R with pinned package versions
+# - Ruby with bundler
+# - TeX for documentation
+# - Node.js for documentation
+#
+# Branch-specific Dockerfiles should use "FROM spark-rm-base:latest" and add:
+# - Java version (8 or 17)
+# - Python version and pip packages
+
+FROM ubuntu:jammy-20250819
+LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
+LABEL org.opencontainers.image.licenses="Apache-2.0"
+LABEL org.opencontainers.image.ref.name="Apache Spark Release Manager Base Image"
+LABEL org.opencontainers.image.version=""
+
+ENV FULL_REFRESH_DATE=20250819
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV DEBCONF_NONINTERACTIVE_SEEN=true
+
+# Install common system packages and build tools
+# Note: Java and Python are installed in branch-specific Dockerfiles
+RUN apt-get update && apt-get install -y \
+ build-essential \
+ ca-certificates \
+ curl \
+ gfortran \
+ git \
+ subversion \
+ gnupg \
+ libcurl4-openssl-dev \
+ libfontconfig1-dev \
+ libfreetype6-dev \
+ libfribidi-dev \
+ libgit2-dev \
+ libharfbuzz-dev \
+ libjpeg-dev \
+ liblapack-dev \
+ libopenblas-dev \
+ libpng-dev \
+ libssl-dev \
+ libtiff5-dev \
+ libwebp-dev \
+ libxml2-dev \
+ msmtp \
+ nodejs \
+ npm \
+ pandoc \
+ pkg-config \
+ texlive-latex-base \
+ texlive \
+ texlive-fonts-extra \
+ texinfo \
+ texlive-latex-extra \
+ qpdf \
+ jq \
+ r-base \
+ ruby \
+ ruby-dev \
+ software-properties-common \
+ wget \
+ zlib1g-dev \
+ && rm -rf /var/lib/apt/lists/*
+
+# Set up R CRAN repository for latest R packages
+RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> /etc/apt/sources.list && \
+    gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
+    gpg -a --export E084DAB9 | apt-key add - && \
+    add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/'
+
+# Install R packages (same versions across all branches)
+# See more in SPARK-39959, roxygen2 < 7.2.1
+RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \
+    'rmarkdown', 'testthat', 'e1071', 'survival', 'arrow', \
+    'ggplot2', 'mvtnorm', 'statmod', 'xml2'), repos='https://cloud.r-project.org/')" && \
+    Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
+    Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
+    Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" && \
+    Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')"
+
+# See more in SPARK-39735
+ENV R_LIBS_SITE="/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
+
+# Install Ruby bundler (same version across all branches)
+RUN gem install --no-document "bundler:2.4.22"
+
+# Create workspace directory
+WORKDIR /opt/spark-rm/output
+
+# Note: Java, Python, and user creation are done in branch-specific Dockerfiles