This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new c34140d8d74 [SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available

c34140d8d74 is described below

commit c34140d8d744dc75d130af60080a2a8e25d501b1
Author: William Hyun <will...@apache.org>
AuthorDate: Sun Apr 17 12:55:44 2022 -0700

    [SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available

    ### What changes were proposed in this pull request?
    This PR aims to skip the NumPy/Pandas tests in `test_rdd.py` if those libraries are not available.

    ### Why are the changes needed?
    Currently, the tests that involve NumPy or Pandas fail when NumPy and Pandas are unavailable in the underlying Python environment. These tests should be skipped instead of reported as failures.

    **BEFORE**
    ```
    ======================================================================
    ERROR: test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File ".../test_rdd.py", line 723, in test_take_on_jrdd_with_large_rows_should_not_cause_deadlock
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'

    ----------------------------------------------------------------------
    Ran 1 test in 1.990s

    FAILED (errors=1)
    ```

    **AFTER**
    ```
    Finished test(python3.9): pyspark.tests.test_rdd RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (1s) ... 1 tests were skipped
    Tests passed in 1 seconds

    Skipped tests in pyspark.tests.test_rdd RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock with python3.9:
        test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests) ... skipped 'NumPy or Pandas not installed'
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #36235 from williamhyun/skipnumpy.
    Authored-by: William Hyun <will...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 python/pyspark/tests/test_rdd.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/tests/test_rdd.py b/python/pyspark/tests/test_rdd.py
index d5d6cdbae8a..23e41d6c036 100644
--- a/python/pyspark/tests/test_rdd.py
+++ b/python/pyspark/tests/test_rdd.py
@@ -20,6 +20,7 @@ import os
 import random
 import tempfile
 import time
+import unittest
 from glob import glob

 from py4j.protocol import Py4JJavaError
@@ -35,7 +36,8 @@ from pyspark.serializers import (
     NoOpSerializer,
 )
 from pyspark.sql import SparkSession
-from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest
+from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest, have_numpy
+from pyspark.testing.sqlutils import have_pandas

 global_func = lambda: "Hi"  # noqa: E731

@@ -698,6 +700,7 @@ class RDDTests(ReusedPySparkTestCase):
         rdd = self.sc.parallelize(range(1 << 20)).map(lambda x: str(x))
         rdd._jrdd.first()

+    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
     def test_take_on_jrdd_with_large_rows_should_not_cause_deadlock(self):
         # Regression test for SPARK-38677.
         #

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
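[Editor's note] The pattern this patch applies — computing availability flags at import time and guarding a test with `unittest.skipIf` — can be sketched as a minimal standalone example. This is an illustration, not the actual PySpark code: the real `have_numpy`/`have_pandas` helpers live in `pyspark.testing.utils` and `pyspark.testing.sqlutils`, and the flag computation below is a simplified assumption of how such helpers are typically written.

```python
import importlib.util
import unittest

# Availability flags computed once at import time, mirroring the style of
# have_numpy / have_pandas in the patch (simplified: a real helper might
# also enforce a minimum version).
have_numpy = importlib.util.find_spec("numpy") is not None
have_pandas = importlib.util.find_spec("pandas") is not None


class ExampleTests(unittest.TestCase):
    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
    def test_needs_numpy_and_pandas(self):
        # These imports are safe: the decorator skips this test before the
        # method body runs when either library is missing.
        import numpy as np
        import pandas as pd

        df = pd.DataFrame({"x": np.arange(3)})
        self.assertEqual(int(df["x"].sum()), 3)
```

Running this suite reports the test as skipped (not failed) on an interpreter without NumPy or Pandas, which is exactly the BEFORE/AFTER behavior change described above.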