This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new c34140d8d74 [SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available

c34140d8d74 is described below

commit c34140d8d744dc75d130af60080a2a8e25d501b1
Author: William Hyun <will...@apache.org>
AuthorDate: Sun Apr 17 12:55:44 2022 -0700

    [SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available

    ### What changes were proposed in this pull request?
    This PR aims to skip the NumPy/Pandas tests in `test_rdd.py` if those libraries are not available.

    ### Why are the changes needed?
    Currently, the tests that involve NumPy or Pandas fail when NumPy and Pandas are unavailable in the underlying Python environment. These tests should be skipped instead of reported as failures.

    **BEFORE**
    ```
    ======================================================================
    ERROR: test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File ".../test_rdd.py", line 723, in test_take_on_jrdd_with_large_rows_should_not_cause_deadlock
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'

    ----------------------------------------------------------------------
    Ran 1 test in 1.990s

    FAILED (errors=1)
    ```

    **AFTER**
    ```
    Finished test(python3.9): pyspark.tests.test_rdd RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (1s) ... 1 tests were skipped
    Tests passed in 1 seconds

    Skipped tests in pyspark.tests.test_rdd RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock with python3.9:
        test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests) ... skipped 'NumPy or Pandas not installed'
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #36235 from williamhyun/skipnumpy.
    Authored-by: William Hyun <will...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 python/pyspark/tests/test_rdd.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests/test_rdd.py b/python/pyspark/tests/test_rdd.py
index d5d6cdbae8a..23e41d6c036 100644
--- a/python/pyspark/tests/test_rdd.py
+++ b/python/pyspark/tests/test_rdd.py
@@ -20,6 +20,7 @@ import os
 import random
 import tempfile
 import time
+import unittest
 from glob import glob

 from py4j.protocol import Py4JJavaError
@@ -35,7 +36,8 @@ from pyspark.serializers import (
     NoOpSerializer,
 )
 from pyspark.sql import SparkSession
-from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest
+from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest, have_numpy
+from pyspark.testing.sqlutils import have_pandas

 global_func = lambda: "Hi"  # noqa: E731

@@ -698,6 +700,7 @@ class RDDTests(ReusedPySparkTestCase):
         rdd = self.sc.parallelize(range(1 << 20)).map(lambda x: str(x))
         rdd._jrdd.first()

+    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
     def test_take_on_jrdd_with_large_rows_should_not_cause_deadlock(self):
         # Regression test for SPARK-38677.
         #

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
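[Editor's note] The pattern this patch applies — computing availability flags at import time and guarding a test with `unittest.skipIf` — can be sketched as a minimal standalone example. This is an illustration, not the actual PySpark code: the real `have_numpy`/`have_pandas` helpers live in `pyspark.testing.utils` and `pyspark.testing.sqlutils`, and the flag computation below is a simplified assumption of how such helpers are typically written.

```python
import importlib.util
import unittest

# Availability flags computed once at import time, mirroring the style of
# have_numpy / have_pandas in the patch (simplified: a real helper might
# also enforce a minimum version).
have_numpy = importlib.util.find_spec("numpy") is not None
have_pandas = importlib.util.find_spec("pandas") is not None


class ExampleTests(unittest.TestCase):
    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
    def test_needs_numpy_and_pandas(self):
        # These imports are safe: the decorator skips this test before the
        # method body runs when either library is missing.
        import numpy as np
        import pandas as pd

        df = pd.DataFrame({"x": np.arange(3)})
        self.assertEqual(int(df["x"].sum()), 3)
```

Running this suite reports the test as skipped (not failed) on an interpreter without NumPy or Pandas, which is exactly the BEFORE/AFTER behavior change described above.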