This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 0b7736c1d12 [SPARK-45953][INFRA] Add `Python 3.10` to Infra docker image
0b7736c1d12 is described below
commit 0b7736c1d121947e418a356cf0431d9d7e969c90
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Thu Nov 16 13:37:38 2023 -0800
[SPARK-45953][INFRA] Add `Python 3.10` to Infra docker image
### What changes were proposed in this pull request?
This PR aims to add `Python 3.10` to Infra docker images.
### Why are the changes needed?
This is preparation for adding a daily `Python 3.10` GitHub Action job later for Apache Spark 4.0.0.
Note that Python 3.10 is installed as the last step to avoid the following issue, which happens when we install Python 3.9 and 3.10 in the same stage via the package manager.
```
#21 13.03 ERROR: Cannot uninstall 'blinker'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
#21 ERROR: process "/bin/sh -c python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'" did not complete successfully: exit code: 1
```
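For context, this pip error occurs when pip tries to replace a package that was installed via distutils (here `blinker`), so it cannot determine which files to remove. Installing Python 3.10 in its own final step sidesteps that conflict for the existing Python 3.9 environment. As a hedged illustration only (not what this PR does), pip's generic `--ignore-installed` flag works around the same error:
```
# Illustration only: --ignore-installed makes pip skip uninstalling the
# distutils-managed 'blinker' package and overwrite it in place.
python3.9 -m pip install --ignore-installed blinker
```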
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
1. I verified that the Python CI is not affected and still uses Python 3.9.5 only.
```
========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.9']
Will test the following Python modules: ['pyspark-errors']
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.5
Starting test(python3.9): pyspark.errors.tests.test_errors (temp output: /__w/spark/spark/python/target/fd967f24-3607-4aa6-8190-3f8d7de522e1/python3.9__pyspark.errors.tests.test_errors___zauwgy1.log)
Finished test(python3.9): pyspark.errors.tests.test_errors (0s)
Tests passed in 0 seconds
```
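For reference, a roughly equivalent check can be reproduced locally with Spark's Python test runner (a sketch; it assumes Spark has already been built and that `python3.9` is on the PATH):
```
# Run only the pyspark-errors module against the Python 3.9 interpreter,
# mirroring the CI log above.
./python/run-tests --python-executables=python3.9 --modules=pyspark-errors
```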
2. Passed the `Base Image Build` step for the new Python 3.10.

3. Since the new Python 3.10 is not used in CI yet, we need to validate it manually as follows.
```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 --version
Python 3.10.13
```
```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 -m pip freeze
alembic==1.12.1
annotated-types==0.6.0
blinker==1.7.0
certifi==2019.11.28
chardet==3.0.4
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
contourpy==1.2.0
coverage==7.3.2
cycler==0.12.1
databricks-cli==0.18.0
dbus-python==1.2.16
deepspeed==0.12.3
distro-info==0.23+ubuntu1.1
docker==6.1.3
entrypoints==0.4
et-xmlfile==1.1.0
filelock==3.9.0
Flask==3.0.0
fonttools==4.44.3
gitdb==4.0.11
GitPython==3.1.40
googleapis-common-protos==1.56.4
greenlet==3.0.1
grpcio==1.56.2
grpcio-status==1.48.2
gunicorn==21.2.0
hjson==3.1.0
idna==2.8
importlib-metadata==6.8.0
itsdangerous==2.1.2
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
lxml==4.9.3
Mako==1.3.0
Markdown==3.5.1
MarkupSafe==2.1.3
matplotlib==3.8.1
memory-profiler==0.60.0
mlflow==2.8.1
mpmath==1.3.0
networkx==3.0
ninja==1.11.1.1
numpy==1.26.2
oauthlib==3.2.2
openpyxl==3.1.2
packaging==23.2
pandas==2.1.3
Pillow==10.1.0
plotly==5.18.0
protobuf==3.20.3
psutil==5.9.6
py-cpuinfo==9.0.0
pyarrow==14.0.1
pydantic==2.5.1
pydantic_core==2.14.3
PyGObject==3.36.0
PyJWT==2.8.0
pynvml==11.5.0
pyparsing==3.1.1
python-apt==2.0.1+ubuntu0.20.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
querystring-parser==1.2.4
requests==2.31.0
requests-unixsocket==0.2.0
scikit-learn==1.1.3
scipy==1.11.3
six==1.14.0
smmap==5.0.1
SQLAlchemy==2.0.23
sqlparse==0.4.4
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
threadpoolctl==3.2.0
torch==2.0.1+cpu
torcheval==0.0.7
torchvision==0.15.2+cpu
tqdm==4.66.1
typing_extensions==4.8.0
tzdata==2023.3
unattended-upgrades==0.1
unittest-xml-reporting==3.2.0
urllib3==2.1.0
websocket-client==1.6.4
Werkzeug==3.0.1
zipp==3.17.0
```
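Beyond `pip freeze`, a quick sanity check that the key test dependencies import under the new interpreter could look like this (a sketch against the same image tag; the package list is taken from the install step in the diff below):
```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 \
    python3.10 -c "import numpy, pandas, pyarrow, sklearn, mlflow; print('imports OK')"
```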
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #43840 from dongjoon-hyun/SPARK-45953.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
dev/infra/Dockerfile | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 8d12f00a034..0231414eec6 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -95,3 +95,15 @@ RUN python3.9 -m pip install 'torch<=2.0.1' torchvision --index-url https://down
RUN python3.9 -m pip install torcheval
# Add Deepspeed as a testing dependency for DeepspeedTorchDistributor
RUN python3.9 -m pip install deepspeed
+
+# Install Python 3.10 at the last stage to avoid breaking Python 3.9
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get update && apt-get install -y \
+ python3.10 python3.10-distutils \
+ && rm -rf /var/lib/apt/lists/*
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
+RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.10 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
+RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
+RUN python3.10 -m pip install torcheval
+RUN python3.10 -m pip install deepspeed
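For anyone who wants to try the new steps locally, the infra image can be rebuilt with something like the following (a sketch that assumes `dev/infra` is the Docker build context and uses an arbitrary local tag):
```
$ docker build -t apache-spark-ci-image:local dev/infra
$ docker run -it --rm apache-spark-ci-image:local python3.10 --version
```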
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]