This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 2e72b01 [SPARK-26646][TEST][PYSPARK][2.4] Fix flaky test:
pyspark.mllib.tests.StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
2e72b01 is described below
commit 2e72b0110c0d962a7997fddb2ef08b6613f3d338
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Sat Oct 17 16:31:42 2020 -0700
[SPARK-26646][TEST][PYSPARK][2.4] Fix flaky test:
pyspark.mllib.tests.StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
### What changes were proposed in this pull request?
This is a backport of SPARK-26646 to branch-2.4 to fix a flaky test in the
branch.
### Why are the changes needed?
The test
pyspark.mllib.tests.StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
is sometimes flaky and fails as follows:
```
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/pyspark/mllib/tests.py", line 1492, in test_training_and_prediction
    self._eventually(condition, timeout=180.0)
  File "/home/runner/work/spark/spark/python/pyspark/mllib/tests.py", line 133, in _eventually
    lastValue = condition()
  File "/home/runner/work/spark/spark/python/pyspark/mllib/tests.py", line 1487, in condition
    self.assertGreater(errors[1] - errors[-1], 0.3)
AssertionError: -0.07000000000000006 not greater than 0.3
```
The predict stream can be consumed to the end before the input stream. When
that happens, the model improvement is not as high as expected and the test
fails. This patch increases the number of batches in the streams from 20 to 40.
This should not increase test time, because the test already waits under a timeout.
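The timeout argument can be illustrated with a minimal polling helper. This is
only a sketch and not the actual `_eventually` from `tests.py`: the names
`eventually`, `interval`, and the toy `condition` below are assumptions made
for illustration. It shows that a poll loop returns as soon as the condition
holds, so a larger batch count only raises the upper bound on work and not the
typical wait.

```python
import time


def eventually(condition, timeout=30.0, interval=0.01):
    """Poll condition() until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    last_value = None
    while time.monotonic() < deadline:
        last_value = condition()
        if last_value is True:
            return  # condition met: stop waiting immediately
        time.sleep(interval)
    raise AssertionError(
        "condition not met within %.1f s; last value: %r" % (timeout, last_value))


# Hypothetical stand-in for the streaming job: each poll records one more
# per-batch error value, and the condition holds once five are recorded.
errors = []


def condition():
    errors.append(1.0 / (len(errors) + 1))
    if len(errors) < 5:
        return "Latest errors: %s" % errors
    return True


start = time.monotonic()
eventually(condition, timeout=180.0)
elapsed = time.monotonic() - start
```

Here the call returns after five polls, far below the 180 second limit; the
timeout only matters when the condition never becomes true.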
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
Closes #30078 from viirya/SPARK-26646-2.4.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/mllib/tests.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/python/pyspark/mllib/tests.py b/python/pyspark/mllib/tests.py
index ec9497c..a3df358 100644
--- a/python/pyspark/mllib/tests.py
+++ b/python/pyspark/mllib/tests.py
@@ -1459,7 +1459,7 @@ class StreamingLogisticRegressionWithSGDTests(MLLibStreamingTestCase):
"""Test that the model improves on toy data with no. of batches"""
input_batches = [
self.sc.parallelize(self.generateLogisticInput(0, 1.5, 100, 42 + i))
- for i in range(20)]
+ for i in range(40)]
predict_batches = [
b.map(lambda lp: (lp.label, lp.features)) for b in input_batches]