This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9c9bdab64a4c [SPARK-50657][PYTHON] Upgrade the minimum version of `pyarrow` to 11.0.0
9c9bdab64a4c is described below
commit 9c9bdab64a4ca96fb648b19608d1b4def61c90ab
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Wed Dec 25 12:11:45 2024 -0800
[SPARK-50657][PYTHON] Upgrade the minimum version of `pyarrow` to 11.0.0
### What changes were proposed in this pull request?
Upgrade the minimum version of `pyarrow` to 11.0.0
### Why are the changes needed?
According to my test in https://github.com/apache/spark/pull/49267, PySpark with `pyarrow==10.0.0` is already broken:
- pyspark-sql failed
- pyspark-connect failed
- pyspark-pandas failed
See https://github.com/zhengruifeng/spark/actions/runs/12464102622/job/34787749014
### Does this PR introduce _any_ user-facing change?
Documentation changes only.
### How was this patch tested?
CI.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49282 from zhengruifeng/mini_arrow_11.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
dev/requirements.txt | 2 +-
python/docs/source/getting_started/install.rst | 6 +++---
python/docs/source/migration_guide/pyspark_upgrade.rst | 2 +-
python/packaging/classic/setup.py | 2 +-
python/packaging/connect/setup.py | 2 +-
python/pyspark/sql/pandas/utils.py | 2 +-
6 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 04cab4cbfcc3..33300cc28d3c 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -3,7 +3,7 @@ py4j>=0.10.9.7
# PySpark dependencies (optional)
numpy>=1.21
-pyarrow>=10.0.0
+pyarrow>=11.0.0
six==1.16.0
pandas>=2.0.0
scipy
diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 2b9f28135bb1..b35588a618ac 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -207,7 +207,7 @@ Installable with ``pip install "pyspark[connect]"``.
Package Supported version Note
========================== ================= ==========================
`pandas` >=2.0.0 Required for Spark Connect
-`pyarrow` >=10.0.0 Required for Spark Connect
+`pyarrow` >=11.0.0 Required for Spark Connect
`grpcio` >=1.67.0 Required for Spark Connect
`grpcio-status` >=1.67.0 Required for Spark Connect
`googleapis-common-protos` >=1.65.0 Required for Spark Connect
@@ -223,7 +223,7 @@ Installable with ``pip install "pyspark[sql]"``.
Package Supported version Note
========= ================= ======================
`pandas` >=2.0.0 Required for Spark SQL
-`pyarrow` >=10.0.0 Required for Spark SQL
+`pyarrow` >=11.0.0 Required for Spark SQL
========= ================= ======================
Additional libraries that enhance functionality but are not included in the installation packages:
@@ -240,7 +240,7 @@ Installable with ``pip install "pyspark[pandas_on_spark]"``.
Package Supported version Note
========= ================= ================================
`pandas` >=2.0.0 Required for Pandas API on Spark
-`pyarrow` >=10.0.0 Required for Pandas API on Spark
+`pyarrow` >=11.0.0 Required for Pandas API on Spark
========= ================= ================================
Additional libraries that enhance functionality but are not included in the installation packages:
diff --git a/python/docs/source/migration_guide/pyspark_upgrade.rst b/python/docs/source/migration_guide/pyspark_upgrade.rst
index 529253042002..55d067eb5fa2 100644
--- a/python/docs/source/migration_guide/pyspark_upgrade.rst
+++ b/python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -25,7 +25,7 @@ Upgrading from PySpark 3.5 to 4.0
* In Spark 4.0, Python 3.8 support was dropped in PySpark.
* In Spark 4.0, the minimum supported version for Pandas has been raised from 1.0.5 to 2.0.0 in PySpark.
* In Spark 4.0, the minimum supported version for Numpy has been raised from 1.15 to 1.21 in PySpark.
-* In Spark 4.0, the minimum supported version for PyArrow has been raised from 4.0.0 to 10.0.0 in PySpark.
+* In Spark 4.0, the minimum supported version for PyArrow has been raised from 4.0.0 to 11.0.0 in PySpark.
* In Spark 4.0, ``Int64Index`` and ``Float64Index`` have been removed from pandas API on Spark, ``Index`` should be used directly.
* In Spark 4.0, ``DataFrame.iteritems`` has been removed from pandas API on Spark, use ``DataFrame.items`` instead.
* In Spark 4.0, ``Series.iteritems`` has been removed from pandas API on Spark, use ``Series.items`` instead.
diff --git a/python/packaging/classic/setup.py b/python/packaging/classic/setup.py
index 09f194278cdc..f595b26450e3 100755
--- a/python/packaging/classic/setup.py
+++ b/python/packaging/classic/setup.py
@@ -152,7 +152,7 @@ if in_spark:
# python/packaging/connect/setup.py
_minimum_pandas_version = "2.0.0"
_minimum_numpy_version = "1.21"
-_minimum_pyarrow_version = "10.0.0"
+_minimum_pyarrow_version = "11.0.0"
_minimum_grpc_version = "1.67.0"
_minimum_googleapis_common_protos_version = "1.65.0"
diff --git a/python/packaging/connect/setup.py b/python/packaging/connect/setup.py
index 5f67e5306b3f..51d0a4c9e360 100755
--- a/python/packaging/connect/setup.py
+++ b/python/packaging/connect/setup.py
@@ -132,7 +132,7 @@ try:
# python/packaging/classic/setup.py
_minimum_pandas_version = "2.0.0"
_minimum_numpy_version = "1.21"
- _minimum_pyarrow_version = "10.0.0"
+ _minimum_pyarrow_version = "11.0.0"
_minimum_grpc_version = "1.59.3"
_minimum_googleapis_common_protos_version = "1.56.4"
diff --git a/python/pyspark/sql/pandas/utils.py b/python/pyspark/sql/pandas/utils.py
index 5849ae0edd6d..a351c13ff0a0 100644
--- a/python/pyspark/sql/pandas/utils.py
+++ b/python/pyspark/sql/pandas/utils.py
@@ -61,7 +61,7 @@ def require_minimum_pandas_version() -> None:
def require_minimum_pyarrow_version() -> None:
"""Raise ImportError if minimum version of pyarrow is not installed"""
# TODO(HyukjinKwon): Relocate and deduplicate the version specification.
- minimum_pyarrow_version = "10.0.0"
+ minimum_pyarrow_version = "11.0.0"
import os
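For context, the guard modified in `python/pyspark/sql/pandas/utils.py` follows a common pattern: compare the installed package's version string against a minimum and raise `ImportError` otherwise. Below is a minimal standalone sketch of that pattern, not Spark's actual implementation; the helper names `parse_version` and `require_min_version` are illustrative, and the constant simply mirrors the new floor set by this commit.

```python
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    # Turn "11.0.0" into (11, 0, 0) so versions compare numerically,
    # not lexically ("10" < "9" as strings, but (10,) > (9,) as tuples).
    return tuple(int(part) for part in v.split(".")[:3])


def require_min_version(installed: Optional[str], minimum: str, name: str) -> None:
    """Raise ImportError if the package is missing or older than `minimum`."""
    if installed is None:
        raise ImportError(f"{name} >= {minimum} must be installed; it was not found.")
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"{name} >= {minimum} must be installed; however, your version was {installed}."
        )


# Mirrors the new floor introduced by this commit.
minimum_pyarrow_version = "11.0.0"
```

With this floor, `require_min_version("10.0.0", minimum_pyarrow_version, "pyarrow")` raises, while an installed 11.x passes. Spark's real check additionally imports `pyarrow` lazily and reads `pyarrow.__version__`, keeping the dependency optional at import time.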
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]