This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 8b9036f [SPARK-33217][INFRA][PYTHON][2.4] Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4
8b9036f is described below
commit 8b9036fb684d1621452c22115345ddfcda6e07c5
Author: HyukjinKwon <[email protected]>
AuthorDate: Thu Oct 22 18:17:36 2020 +0900
[SPARK-33217][INFRA][PYTHON][2.4] Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4
### What changes were proposed in this pull request?
This PR proposes to set the upper bounds of the PyArrow and Pandas versions to
0.12.0 and 0.24.0 (exclusive), respectively.
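As a side note on the command change: the single quotes around the specifiers matter in a POSIX shell, since an unquoted `<` would be parsed as input redirection. A minimal illustrative sketch (the package bounds are from this commit; the `printf` is only for demonstration, not part of the workflow):

```shell
# Without quotes, `pandas<0.24.0` would make the shell redirect stdin
# from a file named `0.24.0` instead of passing the specifier to pip.
# Quoting hands pip the full version constraint intact.
printf '%s\n' 'pyarrow<0.12.0' 'pandas<0.24.0'
```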
https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895
and
https://github.com/apache/spark/commit/07a9885f2792be1353f4a923d649e90bc431cb38
were not backported, so the tests fail.
https://github.com/apache/spark/commit/16990f929921b3f784a85f3afbe1a22fbe77d895
contains an Arrow dependency upgrade, so it cannot be cleanly backported.
Note that I _think_ these tests were broken from the start at
https://github.com/apache/spark/commit/7c65f7680ffbe2c03e444ec60358cbf912c27d13#diff-bdcc6a2a85f645f62724fe8dafbf0581cb0c1d65f6a76cb2985a9172e31a473c.
There was one flaky test in ML that stopped the other tests from running, so
the SQL and Arrow related test results were not shown.
### Why are the changes needed?
1. Spark 2.4.x already declared that higher versions might not work at
https://github.com/apache/spark/blob/branch-2.4/docs/sql-pyspark-pandas-with-arrow.md#recommended-pandas-and-pyarrow-versions.
2. We're currently unable to test all combinations due to the lack of
resources in GitHub Actions (see SPARK-32264), so it is best to pick one
combination to test.
3. Just to clarify, Spark 2.4 works almost entirely correctly with the latest
PyArrow and pandas; most of the failures are test-only issues.
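The pinned bounds above can be sanity-checked in a few lines of Python. This is an illustrative sketch only: the bound values come from this commit, but the helper is hypothetical and not part of Spark (real pip resolution follows PEP 440, which is more general than this simple numeric comparison).

```python
# Illustrative helper, not part of Spark: compare dotted version strings
# numerically against the exclusive upper bounds pinned in the workflow.
def below_upper_bound(version, bound):
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) < parse(bound)

# pyarrow<0.12.0 and pandas<0.24.0, as set by this commit.
assert below_upper_bound("0.11.1", "0.12.0")      # pyarrow 0.11.x is allowed
assert not below_upper_bound("0.12.0", "0.12.0")  # the bound itself is excluded
assert below_upper_bound("0.23.4", "0.24.0")      # pandas 0.23.x is allowed
```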
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions in this build should test it.
Closes #30128 from HyukjinKwon/SPARK-33217.
Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
---
.github/workflows/build_and_test.yml | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 8f46250..9390248 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -130,16 +130,16 @@ jobs:
if: contains(matrix.modules, 'pyspark')
# PyArrow is not supported in PyPy yet, see ARROW-2651.
run: |
- python3.6 -m pip install numpy pyarrow pandas scipy xmlrunner
+ python3.6 -m pip install numpy 'pyarrow<0.12.0' 'pandas<0.24.0' scipy xmlrunner
python3.6 -m pip list
- # PyPy does not have xmlrunner
- pypy3 -m pip install numpy pandas scipy
+ # PyPy does not have xmlrunner, and pandas<0.24.0 installation fails in PyPy3, just skipping.
+ pypy3 -m pip install numpy scipy
pypy3 -m pip list
- name: Install Python packages (Python 2.7)
if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
run: |
# Some tests do not pass in PySpark with PyArrow, for example, pyspark.sql.tests.ArrowTests.
- python2.7 -m pip install numpy pandas scipy xmlrunner
+ python2.7 -m pip install numpy 'pandas<0.24.0' scipy xmlrunner
python2.7 -m pip list
# SparkR
- name: Install R 4.0
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]