This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.2 by this push:
new 71e3c8e4201 [SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available
71e3c8e4201 is described below
commit 71e3c8e420148197b23d14624a85a998365179fe
Author: William Hyun <[email protected]>
AuthorDate: Sun Apr 17 12:55:44 2022 -0700
[SPARK-38927][TESTS] Skip NumPy/Pandas tests in `test_rdd.py` if not available
### What changes were proposed in this pull request?
This PR aims to skip NumPy/Pandas tests in `test_rdd.py` if they are not available.
### Why are the changes needed?
Currently, the tests that involve NumPy or Pandas fail because NumPy and Pandas are unavailable in the underlying Python environment. These tests should be skipped instead of reported as failures.
**BEFORE**
```
======================================================================
ERROR: test_take_on_jrdd_with_large_rows_should_not_cause_deadlock
(pyspark.tests.test_rdd.RDDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../test_rdd.py", line 723, in test_take_on_jrdd_with_large_rows_should_not_cause_deadlock
import numpy as np
ModuleNotFoundError: No module named 'numpy'
----------------------------------------------------------------------
Ran 1 test in 1.990s
FAILED (errors=1)
```
**AFTER**
```
Finished test(python3.9): pyspark.tests.test_rdd
RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (1s) ... 1 tests were skipped
Tests passed in 1 seconds
Skipped tests in pyspark.tests.test_rdd
RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock with python3.9:
    test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests) ... skipped 'NumPy or Pandas not installed'
```
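The fix follows the standard `unittest.skipIf` pattern: compute a boolean availability flag at import time, then gate the test on it. The sketch below is illustrative, not the patch itself; `have_numpy` and `have_pandas` stand in for the flags the patch imports from `pyspark.testing.utils` and `pyspark.testing.sqlutils`, and are computed inline here so the example is self-contained.

```python
# Minimal sketch of the skip pattern used by this patch (names illustrative).
import unittest

# Availability flags: a failed import marks the dependency as missing
# instead of raising ModuleNotFoundError at test time.
try:
    import numpy  # noqa: F401
    have_numpy = True
except ImportError:
    have_numpy = False

try:
    import pandas  # noqa: F401
    have_pandas = True
except ImportError:
    have_pandas = False


class ExampleTests(unittest.TestCase):
    # When either dependency is missing, the decorator reports the test as
    # skipped with the given reason rather than letting it error out.
    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
    def test_needs_numpy_and_pandas(self):
        self.assertTrue(have_numpy and have_pandas)
```

Evaluating the flags once at module import keeps the skip decision cheap and makes the skip reason visible in the runner output, as in the AFTER log.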
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #36235 from williamhyun/skipnumpy.
Authored-by: William Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit c34140d8d744dc75d130af60080a2a8e25d501b1)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/tests/test_rdd.py | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/tests/test_rdd.py b/python/pyspark/tests/test_rdd.py
index 81234a4031a..60c000f6d90 100644
--- a/python/pyspark/tests/test_rdd.py
+++ b/python/pyspark/tests/test_rdd.py
@@ -20,6 +20,7 @@ import os
 import random
 import tempfile
 import time
+import unittest
 from glob import glob

 from py4j.protocol import Py4JJavaError
@@ -30,7 +31,8 @@ from pyspark.resource import ExecutorResourceRequests, ResourceProfileBuilder,\
 from pyspark.serializers import CloudPickleSerializer, BatchedSerializer, PickleSerializer,\
     MarshalSerializer, UTF8Deserializer, NoOpSerializer
 from pyspark.sql import SparkSession
-from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest
+from pyspark.testing.utils import ReusedPySparkTestCase, SPARK_HOME, QuietTest, have_numpy
+from pyspark.testing.sqlutils import have_pandas


 global_func = lambda: "Hi"
@@ -694,6 +696,7 @@ class RDDTests(ReusedPySparkTestCase):
         rdd = self.sc.parallelize(range(1 << 20)).map(lambda x: str(x))
         rdd._jrdd.first()

+    @unittest.skipIf(not have_numpy or not have_pandas, "NumPy or Pandas not installed")
     def test_take_on_jrdd_with_large_rows_should_not_cause_deadlock(self):
         # Regression test for SPARK-38677.
         #
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]