This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 7220d134670 [SPARK-46059][INFRA][PYTHON] Install `six==1.16.0` explicitly for `pandas` in Python 3.12
7220d134670 is described below

commit 7220d134670efa2474f4581e5ae22786f85e6626
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed Nov 22 18:51:40 2023 -0800

    [SPARK-46059][INFRA][PYTHON] Install `six==1.16.0` explicitly for `pandas` in Python 3.12
    
    ### What changes were proposed in this pull request?
    
    This PR aims to make sure that `six==1.16.0` is installed for `pandas` in Python 3.12.
    
    ### Why are the changes needed?
    
    `import pandas` fails as follows if the installed `six` is older than `1.16.0`.
    
    **BEFORE**
    
    ```python
    $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
    WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
    root@39f78dbc0836:/# python3.12
    Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
        from pandas.core.api import (
      File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
        from pandas._libs import (
      File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
        from pandas._libs.interval import Interval
      File "interval.pyx", line 1, in init pandas._libs.interval
      File "hashtable.pyx", line 1, in init pandas._libs.hashtable
      File "missing.pyx", line 1, in init pandas._libs.missing
      File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/tslibs/__init__.py", line 39, in <module>
        from pandas._libs.tslibs.conversion import localize_pydatetime
      File "conversion.pyx", line 1, in init pandas._libs.tslibs.conversion
      File "offsets.pyx", line 1, in init pandas._libs.tslibs.offsets
      File "timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps
      File "timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas
      File "timezones.pyx", line 24, in init pandas._libs.tslibs.timezones
      File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/__init__.py", line 2, in <module>
        from .tz import *
      File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py", line 21, in <module>
        from six.moves import _thread
    ModuleNotFoundError: No module named 'six.moves'
    ```
    
    **AFTER**
    
    ```python
    $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
    root@35c02e3acdc1:/# python3.12 -m pip install six==1.16.0
    Collecting six==1.16.0
      Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
    Installing collected packages: six
      Attempting uninstall: six
        Found existing installation: six 1.14.0
        Uninstalling six-1.14.0:
          Successfully uninstalled six-1.14.0
    Successfully installed six-1.16.0
    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
    
    root@35c02e3acdc1:/# python3.12
    Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas
    >>>
    ```
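After the upgrade shown above, `import pandas` succeeds. Scripts that may run outside this image could fail fast with a clearer message instead of the deep traceback. A minimal sketch, assuming only the standard library's `importlib.metadata`; the helper names `parse_release`, `six_is_new_enough`, and `check_six` are hypothetical, and the `1.16.0` threshold is the one this PR pins.

```python
from importlib import metadata


def parse_release(version):
    """Turn a plain release string such as '1.16.0' into a tuple like (1, 16, 0).

    Simplified on purpose: it expects dotted integers only and pads to three
    components, so '1.16' compares equal to '1.16.0'. Pre-release or local
    suffixes would raise ValueError; packaging.version.Version handles those.
    """
    parts = [int(piece) for piece in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def six_is_new_enough(installed, minimum="1.16.0"):
    """Return True when the `installed` six release is at least `minimum`."""
    return parse_release(installed) >= parse_release(minimum)


def check_six():
    """Describe the installed six, flagging releases older than 1.16.0."""
    try:
        installed = metadata.version("six")
    except metadata.PackageNotFoundError:
        return "six is not installed"
    if not six_is_new_enough(installed):
        return f"six {installed} is too old; run: python3.12 -m pip install six==1.16.0"
    return f"six {installed} is new enough"


print(six_is_new_enough("1.14.0"))  # False (the release found in the BEFORE image)
print(six_is_new_enough("1.16.0"))  # True
print(check_six())
```

Comparing integer tuples avoids the string-ordering trap where `"1.9.0" > "1.16.0"`.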
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #43964 from dongjoon-hyun/SPARK-46059.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 dev/infra/Dockerfile | 10 +++++-----
 dev/requirements.txt |  1 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index acd3ac0ce90..225de5f9ed5 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -92,8 +92,8 @@ RUN Rscript -e "devtools::install_version('preferably', version='0.4', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.1.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.3' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
@@ -110,7 +110,7 @@ RUN apt-get update && apt-get install -y \
     python3.10 python3.10-distutils \
     && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
-RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
 RUN python3.10 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
 RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN python3.10 -m pip install torcheval
@@ -122,7 +122,7 @@ RUN apt-get update && apt-get install -y \
     python3.11 python3.11-distutils \
     && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
-RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
 RUN python3.11 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
 RUN python3.11 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN python3.11 -m pip install torcheval
@@ -134,5 +134,5 @@ RUN apt-get update && apt-get install -y \
     python3.12 python3.12-distutils \
     && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
+RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
 RUN python3.12 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 0b629a3b044..66a74471377 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -4,6 +4,7 @@ py4j
 # PySpark dependencies (optional)
 numpy
 pyarrow
+six==1.16.0
 pandas
 scipy
 plotly
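With this commit the same `six` pin lives in two files: once in `dev/requirements.txt` and on each of the five `pip install` lines in `dev/infra/Dockerfile`. If the pin is bumped later, the copies could drift apart. A hypothetical consistency check, sketched in Python with only `re`; the helper names and the sample excerpts are illustrative and are not part of Spark's tooling.

```python
import re

# Matches `six==<version>` unless "six" is the tail of a longer name (e.g. "pysix").
SIX_PIN = re.compile(r"(?<![\w.-])six==([\w.]+)")


def six_pins(text):
    """Return every version string pinned as `six==<version>` in `text`."""
    return SIX_PIN.findall(text)


def pins_agree(requirements_text, dockerfile_text):
    """True when requirements.txt pins six exactly once and every six pin in
    the Dockerfile uses that same version."""
    required = six_pins(requirements_text)
    if len(required) != 1:
        return False
    pinned_in_image = six_pins(dockerfile_text)
    return bool(pinned_in_image) and all(v == required[0] for v in pinned_in_image)


# Excerpts modeled on the two files changed above.
requirements = "numpy\npyarrow\nsix==1.16.0\npandas\n"
dockerfile = (
    "RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.3' scipy\n"
    "RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3'\n"
)
print(six_pins(dockerfile))  # ['1.16.0', '1.16.0']
print(pins_agree(requirements, dockerfile))  # True
```

The regex stops at the closing quote in the Dockerfile and at the newline in `requirements.txt`, so one pattern serves both file formats.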

