This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b03c69ce773a [SPARK-55346][INFRA][PYTHON] Upgrade pystack version to 1.6.0 and install it on all major images
b03c69ce773a is described below
commit b03c69ce773a4deb68d0f5145e4a70eb8b02830c
Author: Tian Gao <[email protected]>
AuthorDate: Wed Feb 4 16:18:47 2026 +0900
[SPARK-55346][INFRA][PYTHON] Upgrade pystack version to 1.6.0 and install it on all major images
### What changes were proposed in this pull request?
* Upgrade pystack to >= 1.6.0 because it now supports 3.13t
* Install it (and psutil) on all major Docker images
### Why are the changes needed?
pystack used to lack 3.13t wheels, so we had to skip Python 3.13 in the requirements. Now that it supports 3.13t, this special rule is no longer needed.
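For context, the skip was implemented with a PEP 508 environment marker in `dev/requirements.txt`. A minimal sketch of how such markers evaluate, using the third-party `packaging` library against a hypothetical Python 3.13 Linux environment (the sketch is illustrative and not part of this PR):

```python
from packaging.markers import Marker  # third-party "packaging" library

# The old requirements marker skipped pystack on Python 3.13 (no 3.13t wheels).
old = Marker("python_version != '3.13' and sys_platform == 'linux'")
# The new marker only requires Linux.
new = Marker("sys_platform == 'linux'")

# Evaluate both against a hypothetical Python 3.13 Linux environment.
env = {"python_version": "3.13", "sys_platform": "linux"}
print(old.evaluate(env))  # the old rule skipped pystack here
print(new.evaluate(env))  # the new rule installs it
```

On that environment the old marker evaluates to `False` (pystack skipped) while the new one evaluates to `True`.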
`pystack` has proven very useful for finding hanging issues (https://github.com/apache/spark/pull/53783). Enabling it not only on master but also in the other scheduled tests could help us diagnose more hanging issues (note that master currently uses Python 3.12, so we are not even using it on master). For example, https://github.com/apache/spark/actions/runs/21645825351/job/62398366525 is a hanging issue, but we have no information from it. https://github.com/apache/spark/actions/runs/21648052684/job/62405320893 also timed out without useful information.
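For a sense of what a stack dump contributes when a job hangs: `pystack` attaches to a live process from the outside, but a rough in-process analogue can be sketched with the standard library's `faulthandler` (illustration only, not part of this PR):

```python
import faulthandler
import tempfile

# Dump the Python stacks of all threads -- roughly the information pystack
# recovers from outside a hung process. faulthandler writes to a real file
# descriptor, so use a temporary file and read it back.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    dump = f.read()

# Each frame is reported as: File "...", line N in <name>
print("File" in dump)
```

This is the kind of "where is it stuck" evidence the timed-out jobs above currently lack.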
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
It has been working well with Python 3.11 without causing issues, and it helped us track down a very difficult race condition.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #54124 from gaogaotiantian/upgrade-pystack.
Authored-by: Tian Gao <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
dev/requirements.txt | 2 +-
dev/spark-test-image/python-310/Dockerfile | 4 ++--
dev/spark-test-image/python-311/Dockerfile | 4 ++--
dev/spark-test-image/python-312-classic-only/Dockerfile | 4 ++--
dev/spark-test-image/python-312/Dockerfile | 4 ++--
dev/spark-test-image/python-313/Dockerfile | 4 ++--
dev/spark-test-image/python-314-nogil/Dockerfile | 4 ++--
dev/spark-test-image/python-314/Dockerfile | 4 ++--
8 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 840d104bd8ab..10e1a6faa11b 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -77,7 +77,7 @@ graphviz==0.20.3
flameprof==0.4
viztracer
debugpy
-pystack>=1.5.1; python_version!='3.13' and sys_platform=='linux' # no 3.13t wheels
+pystack>=1.6.0; sys_platform=='linux'
psutil
# TorchDistributor dependencies
diff --git a/dev/spark-test-image/python-310/Dockerfile b/dev/spark-test-image/python-310/Dockerfile
index 8db320a41355..4cea4a986ef8 100644
--- a/dev/spark-test-image/python-310/Dockerfile
+++ b/dev/spark-test-image/python-310/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -56,7 +56,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
diff --git a/dev/spark-test-image/python-311/Dockerfile b/dev/spark-test-image/python-311/Dockerfile
index 4ec4e70498d0..2335c82cdcaf 100644
--- a/dev/spark-test-image/python-311/Dockerfile
+++ b/dev/spark-test-image/python-311/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack psutil"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
diff --git a/dev/spark-test-image/python-312-classic-only/Dockerfile b/dev/spark-test-image/python-312-classic-only/Dockerfile
index ed18ae2592fb..685f4e80315c 100644
--- a/dev/spark-test-image/python-312-classic-only/Dockerfile
+++ b/dev/spark-test-image/python-312-classic-only/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark Cl
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260127
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
ARG TEST_PIP_PKGS="coverage unittest-xml-reporting"
# Install Python 3.12 packages
diff --git a/dev/spark-test-image/python-312/Dockerfile b/dev/spark-test-image/python-312/Dockerfile
index eae01b72e054..b0df146a682a 100644
--- a/dev/spark-test-image/python-312/Dockerfile
+++ b/dev/spark-test-image/python-312/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
diff --git a/dev/spark-test-image/python-313/Dockerfile b/dev/spark-test-image/python-313/Dockerfile
index 0280d9cbeaa8..a7cb727c29be 100644
--- a/dev/spark-test-image/python-313/Dockerfile
+++ b/dev/spark-test-image/python-313/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
diff --git a/dev/spark-test-image/python-314-nogil/Dockerfile b/dev/spark-test-image/python-314-nogil/Dockerfile
index b745557fb496..966c8b59d6a0 100644
--- a/dev/spark-test-image/python-314-nogil/Dockerfile
+++ b/dev/spark-test-image/python-314-nogil/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260127
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -64,5 +64,5 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.14t
# TODO: Add BASIC_PIP_PKGS and CONNECT_PIP_PKGS when it supports Python 3.14 free threaded
# TODO: Add lxml, grpcio, grpcio-status back when they support Python 3.14 free threaded
RUN python3.14t -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
-RUN python3.14t -m pip install 'numpy>=2.1' 'pyarrow>=19.0.0' 'six==1.16.0' 'pandas==2.3.3' scipy coverage matplotlib openpyxl jinja2 && \
+RUN python3.14t -m pip install 'numpy>=2.1' 'pyarrow>=19.0.0' 'six==1.16.0' 'pandas==2.3.3' 'pystack>=1.6.0' scipy coverage matplotlib openpyxl jinja2 psutil && \
    python3.14t -m pip cache purge
diff --git a/dev/spark-test-image/python-314/Dockerfile b/dev/spark-test-image/python-314/Dockerfile
index df2a73b12fb8..de7dec4d96c0 100644
--- a/dev/spark-test-image/python-314/Dockerfile
+++ b/dev/spark-test-image/python-314/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]