This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 3a2ca141c5e1 [SPARK-56864][INFRA][PYTHON][4.X] Consolidate python-ps-minimum image into python-minimum
3a2ca141c5e1 is described below
commit 3a2ca141c5e1e2773fcf31898b63067b9f19866e
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Mon May 18 11:43:08 2026 +0800
[SPARK-56864][INFRA][PYTHON][4.X] Consolidate python-ps-minimum image into python-minimum
### What changes were proposed in this pull request?
Backport of #55872 to branch-4.x.
This PR consolidates the `python-ps-minimum` Docker image and its CI
workflow into the existing `python-minimum` image, eliminating a near-duplicate.
Specifically:
- Deletes `dev/spark-test-image/python-ps-minimum/Dockerfile`.
- Deletes `.github/workflows/build_python_ps_minimum.yml`.
- Adds `"pyspark-pandas": "true"` to
`.github/workflows/build_python_minimum.yml` so Pandas API on Spark
minimum-deps coverage is preserved.
- Drops the `python-ps-minimum` entries from
`.github/workflows/build_infra_images_cache.yml` (the `paths` trigger and the
build/push step).
- Removes the `build_python_ps_minimum.yml` badge from `README.md`.
### Why are the changes needed?
To save CI resources. The two Dockerfiles were nearly identical. The only
functional differences were in `BASIC_PIP_PKGS`:
| Package | python-minimum | python-ps-minimum |
|---|---|---|
| `numpy` | pinned `==1.22.4` | unpinned |
| `scikit-learn` | included | omitted |
Everything else (base image, apt packages, Python version, venv setup,
`CONNECT_PIP_PKGS`) was the same. Maintaining both images doubles the image
build/cache cost and runs a duplicate scheduled workflow without commensurate
test value. Reusing `python-minimum` (which has the stricter pin and a superset
of packages) for the Pandas API on Spark minimum-deps job keeps coverage while
halving the image footprint and the associated CI runtime.
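
As a rough illustration of the comparison in the table above, the sketch below diffs two `BASIC_PIP_PKGS` strings. The `python-ps-minimum` string is copied from the deleted Dockerfile. The `python-minimum` string is an assumption reconstructed from the table: that Dockerfile is not part of this diff, and whether `scikit-learn` is pinned there is not stated. The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Copied from the deleted python-ps-minimum Dockerfile.
PS_MINIMUM_PKGS = (
    "pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy scipy "
    "coverage unittest-xml-reporting psutil"
)

# ASSUMPTION: reconstructed from the table (numpy pinned, scikit-learn added,
# left unpinned here). The real python-minimum Dockerfile may differ.
MINIMUM_PKGS = (
    "pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy==1.22.4 scipy "
    "scikit-learn coverage unittest-xml-reporting psutil"
)


def parse_pkgs(spec: str) -> Dict[str, Optional[str]]:
    """Map each package name to its pinned version (None if unpinned)."""
    parsed: Dict[str, Optional[str]] = {}
    for token in spec.split():
        name, _, version = token.partition("==")
        parsed[name] = version or None
    return parsed


def describe(pkgs: Dict[str, Optional[str]], name: str) -> str:
    """Render one package's status the way the table does."""
    if name not in pkgs:
        return "omitted"
    version = pkgs[name]
    return f"pinned =={version}" if version else "unpinned"


def diff_pkgs(left: str, right: str) -> Dict[str, Tuple[str, str]]:
    """Return only the packages whose status differs between the two lists."""
    a, b = parse_pkgs(left), parse_pkgs(right)
    return {
        name: (describe(a, name), describe(b, name))
        for name in sorted(a.keys() | b.keys())
        if describe(a, name) != describe(b, name)
    }


if __name__ == "__main__":
    for name, (minimum, ps_minimum) in diff_pkgs(MINIMUM_PKGS, PS_MINIMUM_PKGS).items():
        print(f"{name}: python-minimum={minimum}, python-ps-minimum={ps_minimum}")
```

Under these assumptions, only `numpy` and `scikit-learn` show up as differences, which matches the table.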
### Does this PR introduce _any_ user-facing change?
No. CI-only change.
### How was this patch tested?
Existing CI. The updated `build_python_minimum.yml` workflow now runs both the
`pyspark` and `pyspark-pandas` jobs against the `python-minimum` image.
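
A minimal sketch of what the `jobs` change amounts to, assuming the `jobs` input is a JSON object whose `"true"` values mark enabled jobs (how `build_and_test.yml` actually consumes it is not shown in this diff). The two strings are the before and after values from `build_python_minimum.yml`; `enabled_jobs` is a hypothetical helper.

```python
import json
from typing import List

# `jobs` input of build_python_minimum.yml before and after this commit.
JOBS_BEFORE = '{ "pyspark": "true" }'
JOBS_AFTER = '{ "pyspark": "true", "pyspark-pandas": "true" }'


def enabled_jobs(jobs_json: str) -> List[str]:
    """Return the sorted names of jobs whose flag is the string "true"."""
    jobs = json.loads(jobs_json)
    return sorted(name for name, flag in jobs.items() if flag == "true")


if __name__ == "__main__":
    added = set(enabled_jobs(JOBS_AFTER)) - set(enabled_jobs(JOBS_BEFORE))
    print("jobs added by this commit:", sorted(added))
```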
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (model: claude-opus-4-7)
Closes #55944 from zhengruifeng/backport-spark-56864-to-4.x.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_infra_images_cache.yml | 14 -----
.github/workflows/build_python_minimum.yml | 3 +-
.github/workflows/build_python_ps_minimum.yml | 47 ---------------
README.md | 1 -
dev/spark-test-image/python-ps-minimum/Dockerfile | 70 -----------------------
5 files changed, 2 insertions(+), 133 deletions(-)
diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml
index e5eb3d9848b7..c3067f28306c 100644
--- a/.github/workflows/build_infra_images_cache.yml
+++ b/.github/workflows/build_infra_images_cache.yml
@@ -31,7 +31,6 @@ on:
- 'dev/spark-test-image/lint/Dockerfile'
- 'dev/spark-test-image/sparkr/Dockerfile'
- 'dev/spark-test-image/python-minimum/Dockerfile'
- - 'dev/spark-test-image/python-ps-minimum/Dockerfile'
- 'dev/spark-test-image/python-311/Dockerfile'
- 'dev/spark-test-image/python-312/Dockerfile'
- 'dev/spark-test-image/python-312-classic-only/Dockerfile'
@@ -125,19 +124,6 @@ jobs:
- name: Image digest (PySpark with old dependencies)
if: hashFiles('dev/spark-test-image/python-minimum/Dockerfile') != ''
run: echo ${{ steps.docker_build_pyspark_python_minimum.outputs.digest }}
- - name: Build and push (PySpark PS with old dependencies)
- if: hashFiles('dev/spark-test-image/python-ps-minimum/Dockerfile') != ''
- id: docker_build_pyspark_python_ps_minimum
- uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8
- with:
- context: ./dev/spark-test-image/python-ps-minimum/
- push: true
- tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }}-static
- cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }}
- cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }},mode=max
- - name: Image digest (PySpark PS with old dependencies)
- if: hashFiles('dev/spark-test-image/python-ps-minimum/Dockerfile') != ''
- run: echo ${{ steps.docker_build_pyspark_python_ps_minimum.outputs.digest }}
- name: Build and push (PySpark with Python 3.11)
if: hashFiles('dev/spark-test-image/python-311/Dockerfile') != ''
id: docker_build_pyspark_python_311
diff --git a/.github/workflows/build_python_minimum.yml b/.github/workflows/build_python_minimum.yml
index 3f898fbe5907..397c16f512f5 100644
--- a/.github/workflows/build_python_minimum.yml
+++ b/.github/workflows/build_python_minimum.yml
@@ -42,5 +42,6 @@ jobs:
}
jobs: >-
{
- "pyspark": "true"
+ "pyspark": "true",
+ "pyspark-pandas": "true"
}
diff --git a/.github/workflows/build_python_ps_minimum.yml b/.github/workflows/build_python_ps_minimum.yml
deleted file mode 100644
index 10077e7ae735..000000000000
--- a/.github/workflows/build_python_ps_minimum.yml
+++ /dev/null
@@ -1,47 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-name: "Build / Python-only (master, Minimum dependencies of Pandas API on Spark)"
-
-on:
- schedule:
- - cron: '0 10 * * *'
- workflow_dispatch:
-
-jobs:
- run-build:
- permissions:
- packages: write
- name: Run
- uses: ./.github/workflows/build_and_test.yml
- if: github.repository == 'apache/spark'
- with:
- java: 17
- branch: master
- hadoop: hadoop3
- envs: >-
- {
- "PYSPARK_IMAGE_TO_TEST": "python-ps-minimum",
- "PYTHON_TO_TEST": "python3.11"
- }
- jobs: >-
- {
- "pyspark": "true",
- "pyspark-pandas": "true"
- }
diff --git a/README.md b/README.md
index 616520541b32..658566dbf245 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,6 @@ This README file only contains basic setup instructions.
| | [](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml) |
| | [](https://github.com/apache/spark/actions/workflows/build_python_3.14_nogil.yml) |
| | [](https://github.com/apache/spark/actions/workflows/build_python_minimum.yml) |
-| | [](https://github.com/apache/spark/actions/workflows/build_python_ps_minimum.yml) |
| | [](https://github.com/apache/spark/actions/workflows/build_python_connect40.yml) |
| | [](https://github.com/apache/spark/actions/workflows/build_python_connect.yml) |
| | [](https://github.com/apache/spark/actions/workflows/build_sparkr_window.yml) |
diff --git a/dev/spark-test-image/python-ps-minimum/Dockerfile b/dev/spark-test-image/python-ps-minimum/Dockerfile
deleted file mode 100644
index afbbe5a0d282..000000000000
--- a/dev/spark-test-image/python-ps-minimum/Dockerfile
+++ /dev/null
@@ -1,70 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# Image for building and testing Spark branches. Based on Ubuntu 24.04.
-# See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:noble
-LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
-LABEL org.opencontainers.image.licenses="Apache-2.0"
-LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For Pandas API on Spark with old dependencies"
-# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
-LABEL org.opencontainers.image.version=""
-
-ENV FULL_REFRESH_DATE=20260210
-
-ENV DEBIAN_FRONTEND=noninteractive
-ENV DEBCONF_NONINTERACTIVE_SEEN=true
-
-RUN printf 'Types: deb\nURIs: https://mirrors.edge.kernel.org/ubuntu\nSuites: noble noble-updates noble-security\nComponents: main restricted universe multiverse\nSigned-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg\n' > /etc/apt/sources.list.d/mirror.sources
-
-# Should keep the installation consistent with https://apache.github.io/spark/api/python/getting_started/install.html
-RUN apt-get update && apt-get install -y \
- build-essential \
- ca-certificates \
- curl \
- gfortran \
- git \
- gnupg \
- libgit2-dev \
- liblapack-dev \
- libopenblas-dev \
- libssl-dev \
- openjdk-17-jdk-headless \
- pkg-config \
- tzdata \
- software-properties-common \
- zlib1g-dev
-
-# Install Python 3.11
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
- python3.11 \
- python3.11-venv \
- && apt-get autoremove --purge -y \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/*
-
-# Setup virtual environment
-ENV VIRTUAL_ENV=/opt/spark-venv
-RUN python3.11 -m venv $VIRTUAL_ENV
-ENV PATH="$VIRTUAL_ENV/bin:$PATH"
-
-ARG BASIC_PIP_PKGS="pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy scipy coverage unittest-xml-reporting psutil"
-ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20 protobuf==6.33.5"
-
-RUN python3.11 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
- python3.11 -m pip cache purge