[ 
https://issues.apache.org/jira/browse/SPARK-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728795#comment-14728795
 ] 

Apache Spark commented on SPARK-10431:
--------------------------------------

User 'robbinspg' has created a pull request for this issue:
https://github.com/apache/spark/pull/8582

> Intermittent test failure in InputOutputMetricsSuite
> ----------------------------------------------------
>
>                 Key: SPARK-10431
>                 URL: https://issues.apache.org/jira/browse/SPARK-10431
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.0
>            Reporter: Pete Robbins
>            Priority: Minor
>
> I sometimes get test failures such as:
> - input metrics with cache and coalesce *** FAILED ***
>   5994472 did not equal 6044472 (InputOutputMetricsSuite.scala:101)
> Tracking this down by adding some debug output, it appears to be a timing 
> issue in the test:
> test("input metrics with cache and coalesce") {
>     // prime the cache manager
>     val rdd = sc.textFile(tmpFilePath, 4).cache()
>     rdd.collect()     // <== #1
>     val bytesRead = runAndReturnBytesRead {      // <== #2
>       rdd.count()
>     }
>     val bytesRead2 = runAndReturnBytesRead {
>       rdd.coalesce(4).count()
>     }
>     // for count and coalesce, the same bytes should be read.
>     assert(bytesRead != 0)
>     assert(bytesRead2 == bytesRead) // fails
>   }
> What is happening is that the runAndReturnBytesRead (#2) function registers 
> a SparkListener that monitors TaskEnd events and totals the bytes read by, 
> e.g., the rdd.count().
> In the failing case the listener receives a TaskEnd event from an earlier 
> task (e.g. #1), which skews the total. This happens because the asynchronous 
> thread that processes the event queue and notifies the listeners has not yet 
> processed one of those earlier taskEnd events when the new listener is 
> added, so the new listener receives that event as well.
> There is a simple fix to the test: wait for the event queue to be empty 
> before adding the new listener. I will submit a pull request for that.
> I also notice that many of the tests add a listener, and as there is no 
> removeSparkListener API the number of listeners on the context builds up 
> over the course of the suite. This is probably why I see the issue when 
> running on slow machines.
> A wider question may be: should a listener receive events that occurred 
> before it was added?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
