Hi everyone,

It is either too late or too early for me to think straight, so please forgive me if this is something trivial. I am trying to add a test case extending SparkSessionTestCase to pyspark.ml.tests (example patch attached). If the test collects data, and there is another TestCase extending SparkSessionTestCase executed before it, I get an AttributeError because _jsc is None:
======================================================================
ERROR: test_foo (pyspark.ml.tests.FooTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/spark/python/pyspark/ml/tests.py", line 1258, in test_foo
File "/home/spark/python/pyspark/sql/dataframe.py", line 389, in collect
with SCCallSiteSync(self._sc) as css:
File "/home/spark/python/pyspark/traceback_utils.py", line 72, in __enter__
self._context._jsc.setCallSite(self._call_site)
AttributeError: 'NoneType' object has no attribute 'setCallSite'
----------------------------------------------------------------------
If the TestCase is executed alone, it seems to work just fine.

Can anyone reproduce this? Is there something obvious I'm missing here?
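
For what it's worth, I can trigger the same AttributeError outside the test suite by collecting a DataFrame whose SparkContext has already been stopped, which, as far as I can tell, is what the test harness does between test classes. A minimal sketch (my assumption about the mechanism, not a confirmed diagnosis; the app name "repro" is just a placeholder):

from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext("local[2]", "repro")
spark = SparkSession(sc)
df = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0)], ["user", "item", "rating"])

# SparkContext.stop() sets sc._jsc to None, but df still holds a
# reference to the stopped context through df._sc.
sc.stop()

# DataFrame.collect() enters SCCallSiteSync, whose __enter__ calls
# self._context._jsc.setCallSite(...), so this fails with:
# AttributeError: 'NoneType' object has no attribute 'setCallSite'
df.collect()

That reproduces the final frames of the traceback above; what I cannot tell is which object in the test suite ends up holding on to the stopped context.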
--
Best,
Maciej
diff --git a/python/pyspark/ml/tests.py b/python/pyspark/ml/tests.py
index 3524160557..cc6e49d6cf 100755
--- a/python/pyspark/ml/tests.py
+++ b/python/pyspark/ml/tests.py
@@ -1245,6 +1245,17 @@ class ALSTest(SparkSessionTestCase):
         self.assertEqual(als.getFinalStorageLevel(), "DISK_ONLY")
         self.assertEqual(als._java_obj.getFinalStorageLevel(), "DISK_ONLY")
+        als.fit(df).userFactors.collect()
+
+
+class FooTest(SparkSessionTestCase):
+    def test_foo(self):
+        df = self.spark.createDataFrame(
+            [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0),
+             (2, 2, 5.0)],
+            ["user", "item", "rating"])
+        als = ALS().setMaxIter(1).setRank(1)
+        als.fit(df).userFactors.collect()
+
 
 class DefaultValuesTests(PySparkTestCase):
     """