[ https://issues.apache.org/jira/browse/SPARK-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728795#comment-14728795 ]
Apache Spark commented on SPARK-10431:
--------------------------------------
User 'robbinspg' has created a pull request for this issue:
https://github.com/apache/spark/pull/8582
> Intermittent test failure in InputOutputMetricsSuite
> ----------------------------------------------------
>
> Key: SPARK-10431
> URL: https://issues.apache.org/jira/browse/SPARK-10431
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.0
> Reporter: Pete Robbins
> Priority: Minor
>
> I sometimes get test failures such as:
> - input metrics with cache and coalesce *** FAILED ***
>   5994472 did not equal 6044472 (InputOutputMetricsSuite.scala:101)
> Tracking this down by adding some debug output, it seems to be a timing
> issue in the test:
> test("input metrics with cache and coalesce") {
> // prime the cache manager
> val rdd = sc.textFile(tmpFilePath, 4).cache()
> rdd.collect() // <== #1
> val bytesRead = runAndReturnBytesRead { // <== #2
> rdd.count()
> }
> val bytesRead2 = runAndReturnBytesRead {
> rdd.coalesce(4).count()
> }
> // for count and coelesce, the same bytes should be read.
> assert(bytesRead != 0)
> assert(bytesRead2 == bytesRead) // fails
> }
> What is happening is that runAndReturnBytesRead (#2) registers a
> SparkListener that watches TaskEnd events and totals the bytes read by,
> e.g., the rdd.count().
> In the failing case the listener also receives a TaskEnd event from an
> earlier task (e.g. #1), which mucks up the totalling. This happens
> because the asynchronous thread that processes the event queue and
> notifies the listeners has not yet delivered one of those earlier
> TaskEnd events when the new listener is added, so the new listener
> receives that event too.
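> To make the mechanism concrete, here is a minimal sketch of a helper
> shaped like runAndReturnBytesRead (the exact suite code differs; the
> listener wiring and the 500 ms timeout are illustrative):
>
> import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
>
> private def runAndReturnBytesRead(job: => Unit): Long = {
>   var bytesRead = 0L
>   sc.addSparkListener(new SparkListener {
>     override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>       // Any TaskEnd event still queued from an earlier job (e.g. #1)
>       // is also delivered to this listener, inflating the total.
>       taskEnd.taskMetrics.inputMetrics.foreach(m => bytesRead += m.bytesRead)
>     }
>   })
>   job
>   // drain the queue so this job's own TaskEnd events are counted
>   // (sc.listenerBus is private[spark], reachable from the suite)
>   sc.listenerBus.waitUntilEmpty(500)
>   // note: the listener is never removed; it stays registered for
>   // every later test in the suite
>   bytesRead
> }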
> There is a simple fix to the test: wait for the event queue to be empty
> before adding the new listener. I will submit a pull request for that.
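> Concretely (a sketch of the change; sc.listenerBus is a private[spark]
> API reachable from the suite, and the timeout value is illustrative),
> the helper just needs one extra drain before registering:
>
> // Drain any TaskEnd events left over from earlier jobs (e.g. #1)
> // *before* adding the new listener, so it never sees them.
> sc.listenerBus.waitUntilEmpty(500)
> sc.addSparkListener(listener)
> job
> // drain again so this job's own TaskEnd events are counted
> sc.listenerBus.waitUntilEmpty(500)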
> I also notice that a lot of the tests add a listener, and as there is no
> removeSparkListener API the number of listeners on the context builds up
> during the running of the suite. This is probably why I see this issue
> when running on slow machines.
> A wider question may be: should a listener receive events that occurred
> before it was added?