This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f80b919d19ce [SPARK-54849][PYTHON] Upgrade the minimum version of `pyarrow` to 18.0.0
f80b919d19ce is described below
commit f80b919d19ce8e390c029857802cfe40190bb5ea
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue Dec 30 18:46:30 2025 +0800
[SPARK-54849][PYTHON] Upgrade the minimum version of `pyarrow` to 18.0.0
### What changes were proposed in this pull request?
Upgrade the minimum version of `pyarrow` to 18.0.0
### Why are the changes needed?
1. pyarrow 18.0.0 was released on Oct 28, 2024;
2. there is a security issue
[PYSEC-2024-161](https://osv.dev/vulnerability/PYSEC-2024-161) in pyarrow:
versions 4.0.0 through 16.1.0 are affected, and upgrading to 17+ is recommended;
3. since 18.0.0, pyarrow no longer depends on numpy, which makes the
dependencies simpler to resolve; a minimal runtime check against the new floor
is sketched below.
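A hedged sketch (not code from this PR): the same `LooseVersion` helper that
appears in the diff below can enforce the new floor at import time;
`MIN_PYARROW` is an illustrative name.
```
import pyarrow as pa
from pyspark.loose_version import LooseVersion

MIN_PYARROW = "18.0.0"  # the new minimum introduced by this change

# Fail fast with an actionable message when pyarrow is too old.
if LooseVersion(pa.__version__) < LooseVersion(MIN_PYARROW):
    raise ImportError(
        f"pyarrow>={MIN_PYARROW} is required; found {pa.__version__}"
    )
```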
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
PR builder with
```
default: '{"PYSPARK_IMAGE_TO_TEST": "python-minimum", "PYTHON_TO_TEST":
"python3.10"}'
```
https://github.com/zhengruifeng/spark/runs/59127283398
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #53619 from zhengruifeng/upgrade_arrow_18.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
dev/requirements.txt | 2 +-
dev/spark-test-image/python-minimum/Dockerfile | 4 ++--
dev/spark-test-image/python-ps-minimum/Dockerfile | 4 ++--
python/docs/source/getting_started/install.rst | 8 ++++----
python/pyspark/sql/classic/dataframe.py | 2 +-
python/pyspark/sql/connect/dataframe.py | 2 +-
6 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 2508b79d5e16..61c7e540f3bf 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -3,7 +3,7 @@ py4j>=0.10.9.9
# PySpark dependencies (optional)
numpy>=1.22
-pyarrow>=15.0.0
+pyarrow>=18.0.0
six==1.16.0
pandas>=2.2.0
scipy
diff --git a/dev/spark-test-image/python-minimum/Dockerfile b/dev/spark-test-image/python-minimum/Dockerfile
index 575b4afdd02c..2981223c19f6 100644
--- a/dev/spark-test-image/python-minimum/Dockerfile
+++ b/dev/spark-test-image/python-minimum/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20250703
+ENV FULL_REFRESH_DATE=20251225
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -62,7 +62,7 @@ RUN apt-get update && apt-get install -y \
wget \
zlib1g-dev
-ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==15.0.0 pandas==2.2.0 six==1.16.0
scipy scikit-learn coverage unittest-xml-reporting"
+ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==18.0.0 pandas==2.2.0 six==1.16.0
scipy scikit-learn coverage unittest-xml-reporting"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0
googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20 protobuf"
diff --git a/dev/spark-test-image/python-ps-minimum/Dockerfile b/dev/spark-test-image/python-ps-minimum/Dockerfile
index 5142d46cc3eb..94edd66c7138 100644
--- a/dev/spark-test-image/python-ps-minimum/Dockerfile
+++ b/dev/spark-test-image/python-ps-minimum/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For Pandas API
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""
-ENV FULL_REFRESH_DATE=20250708
+ENV FULL_REFRESH_DATE=20251225
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -63,7 +63,7 @@ RUN apt-get update && apt-get install -y \
zlib1g-dev
-ARG BASIC_PIP_PKGS="pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 numpy scipy
coverage unittest-xml-reporting"
+ARG BASIC_PIP_PKGS="pyarrow==18.0.0 pandas==2.2.0 six==1.16.0 numpy scipy
coverage unittest-xml-reporting"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20 protobuf"
diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 6b5a09205e4a..8ee075e693ea 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -226,7 +226,7 @@ Installable with ``pip install "pyspark[connect]"``.
Package Supported version Note
========================== ================= ==========================
`pandas` >=2.2.0 Required for Spark Connect
-`pyarrow` >=15.0.0 Required for Spark Connect
+`pyarrow` >=18.0.0 Required for Spark Connect
`grpcio` >=1.76.0 Required for Spark Connect
`grpcio-status` >=1.76.0 Required for Spark Connect
`googleapis-common-protos` >=1.71.0 Required for Spark Connect
@@ -243,7 +243,7 @@ Installable with ``pip install "pyspark[sql]"``.
Package Supported version Note
========= ================= ======================
`pandas` >=2.2.0 Required for Spark SQL
-`pyarrow` >=15.0.0 Required for Spark SQL
+`pyarrow` >=18.0.0 Required for Spark SQL
========= ================= ======================
Additional libraries that enhance functionality but are not included in the installation packages:
@@ -260,7 +260,7 @@ Installable with ``pip install "pyspark[pandas_on_spark]"``.
Package Supported version Note
========= ================= ================================
`pandas` >=2.2.0 Required for Pandas API on Spark
-`pyarrow` >=15.0.0 Required for Pandas API on Spark
+`pyarrow` >=18.0.0 Required for Pandas API on Spark
========= ================= ================================
Additional libraries that enhance functionality but are not included in the installation packages:
@@ -310,7 +310,7 @@ Installable with ``pip install "pyspark[pipelines]"``.
Includes all dependencies
Package Supported version Note
========================== ================= ===================================================
`pandas` >=2.2.0 Required for Spark Connect and Spark SQL
-`pyarrow` >=15.0.0 Required for Spark Connect and Spark SQL
+`pyarrow` >=18.0.0 Required for Spark Connect and Spark SQL
`grpcio` >=1.76.0 Required for Spark Connect
`grpcio-status` >=1.76.0 Required for Spark Connect
`googleapis-common-protos` >=1.71.0 Required for Spark Connect
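As a hedged aside, the documented floors above can be sanity-checked after
`pip install "pyspark[connect]"` using the standard-library
`importlib.metadata`; the `floors` dict here is an illustrative subset, not
the authoritative list from `install.rst`.
```
from importlib.metadata import version
from pyspark.loose_version import LooseVersion

# Illustrative subset of the minimums documented above.
floors = {"pyarrow": "18.0.0", "pandas": "2.2.0", "grpcio": "1.76.0"}

for pkg, floor in floors.items():
    installed = version(pkg)
    assert LooseVersion(installed) >= LooseVersion(floor), (
        f"{pkg} {installed} is below the documented minimum {floor}"
    )
```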
diff --git a/python/pyspark/sql/classic/dataframe.py b/python/pyspark/sql/classic/dataframe.py
index 6533f8a5ffc0..634006bdbf8c 100644
--- a/python/pyspark/sql/classic/dataframe.py
+++ b/python/pyspark/sql/classic/dataframe.py
@@ -2011,7 +2011,7 @@ def _test() -> None:
import pyarrow as pa
from pyspark.loose_version import LooseVersion
- if LooseVersion(pa.__version__) < LooseVersion("17.0.0"):
+ if LooseVersion(pa.__version__) < LooseVersion("21.0.0"):
del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
spark = (
diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index d10a576dec77..e60b1c21c556 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -2376,7 +2376,7 @@ def _test() -> None:
import pyarrow as pa
from pyspark.loose_version import LooseVersion
- if LooseVersion(pa.__version__) < LooseVersion("17.0.0"):
+ if LooseVersion(pa.__version__) < LooseVersion("21.0.0"):
del pyspark.sql.dataframe.DataFrame.mapInArrow.__doc__
globs["spark"] = (
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]