Pete Robbins created SPARK-10431:
------------------------------------

             Summary: Intermittent test failure in InputOutputMetricsSuite
                 Key: SPARK-10431
                 URL: https://issues.apache.org/jira/browse/SPARK-10431
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.0
            Reporter: Pete Robbins
            Priority: Minor


I sometimes get test failures such as:

- input metrics with cache and coalesce *** FAILED ***
  5994472 did not equal 6044472 (InputOutputMetricsSuite.scala:101)

After adding some debug output to track this down, it appears to be a timing
issue in the test.

test("input metrics with cache and coalesce") {
    // prime the cache manager
    val rdd = sc.textFile(tmpFilePath, 4).cache()
    rdd.collect()     // <== #1

    val bytesRead = runAndReturnBytesRead {      // <== #2
      rdd.count()
    }
    val bytesRead2 = runAndReturnBytesRead {
      rdd.coalesce(4).count()
    }

    // for count and coalesce, the same bytes should be read.
    assert(bytesRead != 0)
    assert(bytesRead2 == bytesRead) // fails
  }

What is happening is that the runAndReturnBytesRead function (#2) adds a 
SparkListener to monitor TaskEnd events and total the bytes read by, e.g., the 
rdd.count() job.

In the failing case, the listener receives a TaskEnd event from an earlier 
task (e.g. from #1), which corrupts the total. This happens because the 
asynchronous thread that processes the event queue and notifies the listeners 
has not yet processed one of the TaskEnd events when the new listener is added, 
so the new listener receives that event as well.
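
The interleaving described above can be reproduced in a small sketch. None of 
this is Spark code: ToyListenerBus, TaskEnd and the byte counts are 
hypothetical stand-ins, and the queue is drained by hand instead of by a 
background thread so that the ordering is deterministic.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; these are not Spark classes.
final case class TaskEnd(job: String, bytesRead: Long)

final class ToyListenerBus {
  private val queue = mutable.Queue.empty[TaskEnd]
  private val listeners = mutable.ArrayBuffer.empty[TaskEnd => Unit]

  def addListener(l: TaskEnd => Unit): Unit = listeners += l

  // Posting only enqueues; nothing is delivered until drain() runs.
  def post(e: TaskEnd): Unit = queue.enqueue(e)

  // In the real system a separate thread does this. Calling it by hand
  // lets us choose exactly when queued events reach the listeners.
  def drain(): Unit =
    while (queue.nonEmpty) {
      val e = queue.dequeue()
      listeners.foreach(l => l(e))
    }
}

object LateListenerDemo {
  // Returns the total the late-added listener ends up with.
  def run(drainBeforeAdding: Boolean): Long = {
    val bus = new ToyListenerBus
    bus.post(TaskEnd("collect", 100L))   // event from the earlier job (#1)
    if (drainBeforeAdding) bus.drain()   // the "fast machine" case

    var total = 0L
    bus.addListener(e => total += e.bytesRead)  // listener added for #2

    bus.post(TaskEnd("count", 40L))      // event from the job being measured
    bus.drain()
    total
  }

  def main(args: Array[String]): Unit = {
    println(run(drainBeforeAdding = false)) // 140: stale event counted too
    println(run(drainBeforeAdding = true))  // 40: only the measured job
  }
}
```

The only difference between the two runs is whether the earlier event was 
still sitting in the queue when the listener was added.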

There is a simple fix to the test: wait for the event queue to be empty 
before adding the new listener. I will submit a pull request for that.
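
As a rough sketch of the "wait for the queue to be empty" idea, the following 
uses a toy asynchronous bus, not Spark's. AsyncToyBus, its waitUntilEmpty 
method, the timeout and the byte counts are all assumptions made for 
illustration; the real listener bus should be checked against the Spark source.

```scala
import java.util.concurrent.{CopyOnWriteArrayList, LinkedBlockingQueue}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical asynchronous bus for illustration only; not a Spark class.
final class AsyncToyBus {
  private val queue = new LinkedBlockingQueue[Long]()
  private val listeners = new CopyOnWriteArrayList[Long => Unit]()
  // Events posted but not yet delivered to every listener.
  private val pending = new AtomicLong(0L)

  private val worker = new Thread(new Runnable {
    def run(): Unit = while (true) {
      val bytes = queue.take()
      val it = listeners.iterator()
      while (it.hasNext) it.next()(bytes)
      pending.decrementAndGet()   // mark the event as fully delivered
    }
  })
  worker.setDaemon(true)
  worker.start()

  def addListener(l: Long => Unit): Unit = listeners.add(l)

  def post(bytes: Long): Unit = {
    pending.incrementAndGet()
    queue.put(bytes)
  }

  // Polls until every posted event has been delivered, or the timeout passes.
  def waitUntilEmpty(timeoutMillis: Long): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMillis
    var empty = pending.get() == 0L
    while (!empty && System.currentTimeMillis() < deadline) {
      Thread.sleep(1)
      empty = pending.get() == 0L
    }
    empty
  }
}

object WaitThenListenDemo {
  def run(): Long = {
    val bus = new AsyncToyBus
    bus.post(100L)                        // event from an earlier job
    require(bus.waitUntilEmpty(5000L), "event queue did not drain in time")

    val total = new AtomicLong(0L)
    bus.addListener(b => total.addAndGet(b))  // added only after the drain

    bus.post(40L)                         // event from the measured job
    require(bus.waitUntilEmpty(5000L), "event queue did not drain in time")
    total.get()
  }

  def main(args: Array[String]): Unit = println(run()) // 40
}
```

Because the listener is registered only after the first wait returns, the 
earlier event can no longer reach it, however slowly the worker thread runs.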

I also notice that many of the tests add a listener and, as there is no 
removeSparkListener API, the number of listeners on the context builds up while 
the suite runs. This is probably why I see this issue when running on slow 
machines.
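
To illustrate what a removal API would allow, here is a sketch of a scoped 
helper that registers a listener, runs a block, and always unregisters 
afterwards. ToyRegistry, its remove method and sumBytes are hypothetical; 
nothing like this is claimed to exist on the SparkContext.

```scala
import scala.collection.mutable

// Hypothetical synchronous registry for illustration only; not SparkContext.
final class ToyRegistry {
  private val listeners = mutable.ArrayBuffer.empty[Long => Unit]

  def add(l: Long => Unit): Unit = listeners += l
  // Removes by reference equality on the function value.
  def remove(l: Long => Unit): Unit = listeners -= l
  def size: Int = listeners.size

  // Delivers to a snapshot so listeners may be added or removed meanwhile.
  def publish(bytes: Long): Unit = listeners.toList.foreach(l => l(bytes))
}

object ScopedListener {
  // Totals the bytes published while `body` runs, then removes the listener
  // so that listeners do not accumulate from one test to the next.
  def sumBytes(reg: ToyRegistry)(body: => Unit): Long = {
    var total = 0L
    val l: Long => Unit = b => total += b
    reg.add(l)
    try body finally reg.remove(l)
    total
  }
}
```

With such a helper, each test leaves the listener count where it found it.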

A wider question may be: should a listener receive events that occurred before 
it was added?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
