This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 8503aa3  [SPARK-26646][TEST][PYSPARK] Fix flaky test: 
pyspark.mllib.tests.test_streaming_algorithms 
StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
8503aa3 is described below

commit 8503aa300708fd8367c665e45d317c6ba4214ab2
Author: Liang-Chi Hsieh <vii...@gmail.com>
AuthorDate: Fri Jan 18 23:53:11 2019 +0800

    [SPARK-26646][TEST][PYSPARK] Fix flaky test: 
pyspark.mllib.tests.test_streaming_algorithms 
StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
    
    ## What changes were proposed in this pull request?
    
    The test pyspark.mllib.tests.test_streaming_algorithms
    StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
    is sometimes flaky.
    
    ```
    ======================================================================
    FAIL: test_training_and_prediction 
(pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
    Test that the model improves on toy data with no. of batches
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py",
 line 367, in test_training_and_prediction
        self._eventually(condition, timeout=60.0)
      File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py",
 line 69, in _eventually
        lastValue = condition()
      File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py",
 line 362, in condition
        self.assertGreater(errors[1] - errors[-1], 0.3)
    AssertionError: -0.070000000000000062 not greater than 0.3
    
    ----------------------------------------------------------------------
    Ran 13 tests in 198.327s
    
    FAILED (failures=1, skipped=1)
    
    Had test failures in pyspark.mllib.tests.test_streaming_algorithms with 
python3.4; see logs
    ```
    
    The predict stream can sometimes be consumed to the end before the
    input stream. When that happens, the model does not improve as much as
    expected and the test fails. This patch increases the number of batches
    in the streams. This does not increase the test time because the
    condition is checked under a timeout.
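    For context, the `_eventually` helper seen in the traceback above polls
    a condition until it returns True or a timeout elapses, which is why
    adding batches does not lengthen a passing run. A minimal sketch of
    that polling pattern (the name, signature, and poll interval here are
    an approximation, not the exact helper in
    test_streaming_algorithms.py):

    ```python
    import time

    def eventually(condition, timeout=60.0):
        """Poll `condition` until it returns True or `timeout` seconds pass.

        Sketch of the retry pattern used by the test helper; the real
        helper in Spark's test suite may differ in details.
        """
        start = time.time()
        last_value = None
        while time.time() - start < timeout:
            last_value = condition()
            if last_value is True:
                return
            time.sleep(0.01)  # brief pause between polls
        raise AssertionError(
            "Condition not met within %s seconds; last value: %r"
            % (timeout, last_value))

    # Example: the condition becomes true after a few polls.
    state = {"n": 0}

    def condition():
        state["n"] += 1
        return state["n"] >= 3

    eventually(condition, timeout=5.0)
    ```

    With this pattern, a test that succeeds early returns immediately, so
    processing more batches only adds time to runs that would otherwise
    fail anyway.
    
    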
    
    ## How was this patch tested?
    
    Manually tested.
    
    Closes #23586 from viirya/SPARK-26646.
    
    Authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/mllib/tests/test_streaming_algorithms.py 
b/python/pyspark/mllib/tests/test_streaming_algorithms.py
index bf2ad2d..cab3010 100644
--- a/python/pyspark/mllib/tests/test_streaming_algorithms.py
+++ b/python/pyspark/mllib/tests/test_streaming_algorithms.py
@@ -334,7 +334,7 @@ class 
StreamingLogisticRegressionWithSGDTests(MLLibStreamingTestCase):
         """Test that the model improves on toy data with no. of batches"""
         input_batches = [
             self.sc.parallelize(self.generateLogisticInput(0, 1.5, 100, 42 + 
i))
-            for i in range(20)]
+            for i in range(40)]
         predict_batches = [
             b.map(lambda lp: (lp.label, lp.features)) for b in input_batches]
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
