This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.2 by this push:
new 71e3c8e4201 [SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available
71e3c8e4201 is described below
commit 71e3c8e420148197b23d14624a85a998365179fe
Author: William Hyun <[email protected]>
AuthorDate: Sun Apr 17 12:55:44 2022 -0700
[SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available
### What changes were proposed in this pull request?
This PR aims to skip NumPy/Pandas tests in `test_rdd.py` if they are not available.
### Why are the changes needed?
Currently, the tests that involve NumPy or Pandas fail because NumPy and Pandas are unavailable in the underlying Python environment. These tests should be skipped instead of reported as failures.
**BEFORE**
```
======================================================================
ERROR: test_take_on_jrdd_with_large_rows_should_not_cause_deadlock
(pyspark.tests.test_rdd.RDDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../test_rdd.py", line 723, in test_take_on_jrdd_with_large_rows_should_not_cause_deadlock
import numpy as np
ModuleNotFoundError: No module named 'numpy'
----------------------------------------------------------------------
Ran 1 test in 1.990s
FAILED (errors=1)
```
**AFTER**
```
Finished test(python3.9): pyspark.tests.test_rdd
RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (1s) ... 1 tests were skipped
Tests passed in 1 seconds
Skipped tests in pyspark.tests.test_rdd
RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock with python3.9:
    test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests) ... skipped 'NumPy or Pandas not installed'
```
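The fix follows the standard `unittest.skipIf` pattern: compute a boolean availability flag at import time, then gate the test on it. The sketch below is illustrative, not the patch itself; `have_numpy` and `have_pandas` stand in for the flags the patch imports from `pyspark.testing.utils` and `pyspark.testing.sqlutils`, and are computed inline here so the example is self-contained.

```python
# Minimal sketch of the skip pattern used by this patch (names illustrative).
import unittest

# Availability flags: a failed import marks the dependency as missing
# instead of raising ModuleNotFoundError at test time.
try:
    import numpy  # noqa: F401
    have_numpy = True
except ImportError:
    have_numpy = False

try:
    import pandas  # noqa: F401
    have_pandas = True
except ImportError:
    have_pandas = False


class ExampleTests(unittest.TestCase):
    # When either dependency is missing, the decorator reports the test as
    # skipped with the given reason rather than letting it error out.
    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
    def test_needs_numpy_and_pandas(self):
        self.assertTrue(have_numpy and have_pandas)
```

Evaluating the flags once at module import keeps the skip decision cheap and makes the skip reason visible in the runner output, as in the AFTER log.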
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #36235 from williamhyun/skipnumpy.
Authored-by: William Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit c34140d8d744dc75d130af60080a2a8e25d501b1)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/tests/test_rdd.py | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/tests/test_rdd.py b/python/pyspark/tests/test_rdd.py
index 81234a4031a..60c000f6d90 100644
--- a/python/pyspark/tests/test_rdd.py
+++ b/python/pyspark/tests/test_rdd.py
@@ -20,6 +20,7 @@ import os
 import random
 import tempfile
 import time
+import unittest
 from glob import glob

 from py4j.protocol import Py4JJavaError
@@ -30,7 +31,8 @@ from pyspark.resource import ExecutorResourceRequests, ResourceProfileBuilder,\
 from pyspark.serializers import CloudPickleSerializer, BatchedSerializer, PickleSerializer,\
     MarshalSerializer, UTF8Deserializer, NoOpSerializer
 from pyspark.sql import SparkSession
-from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest
+from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest, have_numpy
+from pyspark.testing.sqlutils import have_pandas


 global_func = lambda: "Hi"
@@ -694,6 +696,7 @@ class RDDTests(ReusedPySparkTestCase):
         rdd = self.sc.parallelize(range(1 << 20)).map(lambda x: str(x))
         rdd._jrdd.first()

+    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
     def test_take_on_jrdd_with_large_rows_should_not_cause_deadlock(self):
         # Regression test for SPARK-38677.
         #
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]