This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 8b9036f [SPARK-33217][INFRA][PYTHON][2.4] Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4
8b9036f is described below
commit 8b9036fb684d1621452c22115345ddfcda6e07c5
Author: HyukjinKwon <[email protected]>
AuthorDate: Thu Oct 22 18:17:36 2020 +0900
[SPARK-33217][INFRA][PYTHON][2.4] Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4
### What changes were proposed in this pull request?
This PR proposes to set the upper bounds of the PyArrow and Pandas versions to
0.12.0 and 0.24.0 (exclusive), respectively.
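As a side note on the command change: the single quotes around the specifiers matter in a POSIX shell, since an unquoted `<` would be parsed as input redirection. A minimal illustrative sketch (the package bounds are from this commit; the `printf` is only for demonstration, not part of the workflow):

```shell
# Without quotes, `pandas<0.24.0` would make the shell redirect stdin
# from a file named `0.24.0` instead of passing the specifier to pip.
# Quoting hands pip the full version constraint intact.
printf '%s\n' 'pyarrow<0.12.0' 'pandas<0.24.0'
```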
https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895
and
https://github.com/apache/spark/commit/07a9885f2792be1353f4a923d649e90bc431cb38
were not backported, so the tests fail.
https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895
contains an Arrow dependency upgrade, so it cannot be cleanly backported.
Note that I _think_ these tests were broken from the start at
https://github.com/apache/spark/commit/7c65f7680ffbe2c03e444ec60358cbf912c27d13#diff-bdcc6a2a85f645f62724fe8dafbf0581cb0c1d65f6a76cb2985a9172e31a473c.
There was one flaky test in ML that stopped the other tests from running, so
the SQL and Arrow related test results were not shown.
### Why are the changes needed?
1. Spark 2.4.x already declared that higher versions might not work at
https://github.com/apache/spark/blob/branch-2.4/docs/sql-pyspark-pandas-with-arrow.md#recommended-pandas-and-pyarrow-versions.
2. We're currently unable to test all combinations due to the lack of
resources in GitHub Actions (see SPARK-32264), so it is best to pick one
combination to test.
3. Just to clarify, Spark 2.4 works almost entirely correctly with the latest
PyArrow and pandas; most of the failures are test-only issues.
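The pinned bounds above can be sanity-checked in a few lines of Python. This is an illustrative sketch only: the bound values come from this commit, but the helper is hypothetical and not part of Spark (real pip resolution follows PEP 440, which is more general than this simple numeric comparison).

```python
# Illustrative helper, not part of Spark: compare dotted version strings
# numerically against the exclusive upper bounds pinned in the workflow.
def below_upper_bound(version, bound):
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) < parse(bound)

# pyarrow<0.12.0 and pandas<0.24.0, as set by this commit.
assert below_upper_bound("0.11.1", "0.12.0")      # pyarrow 0.11.x is allowed
assert not below_upper_bound("0.12.0", "0.12.0")  # the bound itself is excluded
assert below_upper_bound("0.23.4", "0.24.0")      # pandas 0.23.x is allowed
```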
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions in this build should test it.
Closes #30128 from HyukjinKwon/SPARK-33217.
Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
---
.github/workflows/build_and_test.yml | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 8f46250..9390248 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -130,16 +130,16 @@ jobs:
if: contains(matrix.modules, 'pyspark')
# PyArrow is not supported in PyPy yet, see ARROW-2651.
run: |
- python3.6 -m pip install numpy pyarrow pandas scipy xmlrunner
+ python3.6 -m pip install numpy 'pyarrow<0.12.0' 'pandas<0.24.0' scipy xmlrunner
python3.6 -m pip list
- # PyPy does not have xmlrunner
- pypy3 -m pip install numpy pandas scipy
+ # PyPy does not have xmlrunner, and pandas<0.24.0 installation fails in PyPy3, just skipping.
+ pypy3 -m pip install numpy scipy
pypy3 -m pip list
- name: Install Python packages (Python 2.7)
if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
run: |
# Some tests do not pass in PySpark with PyArrow, for example, pyspark.sql.tests.ArrowTests.
- python2.7 -m pip install numpy pandas scipy xmlrunner
+ python2.7 -m pip install numpy 'pandas<0.24.0' scipy xmlrunner
python2.7 -m pip list
# SparkR
- name: Install R 4.0
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]