This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 7220d134670 [SPARK-46059][INFRA][PYTHON] Install `six==1.16.0` explicitly for `pandas` in Python 3.12
7220d134670 is described below
commit 7220d134670efa2474f4581e5ae22786f85e6626
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Nov 22 18:51:40 2023 -0800
[SPARK-46059][INFRA][PYTHON] Install `six==1.16.0` explicitly for `pandas` in Python 3.12
### What changes were proposed in this pull request?
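This PR aims to install `six==1.16.0` explicitly alongside `pandas` (for every Python interpreter in the CI image and in `dev/requirements.txt`) so that `import pandas` works in Python 3.12.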
### Why are the changes needed?
`import pandas` fails as follows if the installed `six` is older than `1.16.0`. The CI image below shipped `six` 1.14.0.
**BEFORE**
```console
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@39f78dbc0836:/# python3.12
Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
    from pandas._libs import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
    from pandas._libs.interval import Interval
  File "interval.pyx", line 1, in init pandas._libs.interval
  File "hashtable.pyx", line 1, in init pandas._libs.hashtable
  File "missing.pyx", line 1, in init pandas._libs.missing
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/tslibs/__init__.py", line 39, in <module>
    from pandas._libs.tslibs.conversion import localize_pydatetime
  File "conversion.pyx", line 1, in init pandas._libs.tslibs.conversion
  File "offsets.pyx", line 1, in init pandas._libs.tslibs.offsets
  File "timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps
  File "timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas
  File "timezones.pyx", line 24, in init pandas._libs.tslibs.timezones
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/__init__.py", line 2, in <module>
    from .tz import *
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py", line 21, in <module>
    from six.moves import _thread
ModuleNotFoundError: No module named 'six.moves'
```
**AFTER**
```console
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
root@35c02e3acdc1:/# python3.12 -m pip install six==1.16.0
Collecting six==1.16.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: six
  Attempting uninstall: six
    Found existing installation: six 1.14.0
    Uninstalling six-1.14.0:
      Successfully uninstalled six-1.14.0
Successfully installed six-1.16.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@35c02e3acdc1:/# python3.12
Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>>
```
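As an illustrative sketch (not part of this commit), a CI step could fail fast with a clearer message than the `ModuleNotFoundError` above by checking the installed `six` version before importing `pandas`. The helper names `parse_version_tuple`, `six_is_new_enough` and `check_installed_six` are hypothetical, and the sketch assumes plain `X.Y.Z` release strings; only the 1.16.0 threshold comes from this PR.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum six release pinned by this PR; older releases break
# `from six.moves import ...` under Python 3.12 (see the traceback above).
MIN_SIX = (1, 16, 0)


def parse_version_tuple(ver: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a 3-tuple of ints.

    Assumption: no pre-release or local suffixes such as '1.16.0rc1'.
    Missing components are padded with zeros, so '1.16' -> (1, 16, 0).
    """
    parts = [int(p) for p in ver.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def six_is_new_enough(ver: str) -> bool:
    """Return True if the given six version is at least MIN_SIX."""
    return parse_version_tuple(ver) >= MIN_SIX


def check_installed_six() -> None:
    """Raise a descriptive error if six is missing or older than MIN_SIX."""
    try:
        installed = version("six")
    except PackageNotFoundError:
        raise RuntimeError("six is not installed; install six==1.16.0") from None
    if not six_is_new_enough(installed):
        raise RuntimeError(
            f"six {installed} is too old for Python 3.12; install six==1.16.0"
        )
```

In such a CI step, `check_installed_six()` would be called right before `import pandas`, so an image like the one above (with `six` 1.14.0) would report the stale package directly instead of failing deep inside `dateutil`.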
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #43964 from dongjoon-hyun/SPARK-46059.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
dev/infra/Dockerfile | 10 +++++-----
dev/requirements.txt | 1 +
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index acd3ac0ce90..225de5f9ed5 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -92,8 +92,8 @@ RUN Rscript -e "devtools::install_version('preferably', version='0.4', repos='ht
# See more in SPARK-39735
ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
-RUN pypy3 -m pip install numpy 'pandas<=2.1.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.3' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
# Add Python deps for Spark Connect.
RUN python3.9 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
@@ -110,7 +110,7 @@ RUN apt-get update && apt-get install -y \
python3.10 python3.10-distutils \
&& rm -rf /var/lib/apt/lists/*
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
-RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
RUN python3.10 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
RUN python3.10 -m pip install torcheval
@@ -122,7 +122,7 @@ RUN apt-get update && apt-get install -y \
python3.11 python3.11-distutils \
&& rm -rf /var/lib/apt/lists/*
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
-RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
RUN python3.11 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
RUN python3.11 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
RUN python3.11 -m pip install torcheval
@@ -134,5 +134,5 @@ RUN apt-get update && apt-get install -y \
python3.12 python3.12-distutils \
&& rm -rf /var/lib/apt/lists/*
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
+RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
RUN python3.12 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 0b629a3b044..66a74471377 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -4,6 +4,7 @@ py4j
# PySpark dependencies (optional)
numpy
pyarrow
+six==1.16.0
pandas
scipy
plotly
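The same `six==1.16.0` pin is repeated on every `pip install` line in `dev/infra/Dockerfile` that installs `pandas`. A hypothetical consistency check like the one below could flag a line that misses the pin in a later edit; `find_unpinned_lines` is an illustrative name, not part of Spark's tooling, and the sketch assumes each `RUN ... -m pip install` command sits on a single line (no backslash continuations).

```python
import re

PIN = "six==1.16.0"

# Matches pip-install RUN lines of the form "RUN <interpreter> -m pip install ...",
# e.g. "RUN python3.12 -m pip install ..." or "RUN pypy3 -m pip install ...".
PIP_INSTALL = re.compile(r"^RUN\s+\S+\s+-m\s+pip\s+install\b")


def find_unpinned_lines(dockerfile_text: str) -> list:
    """Return pip-install lines that install pandas without the six pin."""
    offenders = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if not PIP_INSTALL.match(stripped):
            continue
        # Drop shell quotes so 'six==1.16.0' and six==1.16.0 compare equal.
        tokens = [tok.strip("'\"") for tok in stripped.split()]
        installs_pandas = any(
            tok == "pandas" or tok.startswith(("pandas<", "pandas=", "pandas>"))
            for tok in tokens
        )
        if installs_pandas and PIN not in tokens:
            offenders.append(stripped)
    return offenders
```

Run against the post-change Dockerfile, such a check would be expected to return an empty list, since each `pandas` install line above now carries `'six==1.16.0'`.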