Andy Sloane created SPARK-13631:
-----------------------------------

             Summary: getPreferredLocations race condition in spark 1.6.0?
                 Key: SPARK-13631
                 URL: https://issues.apache.org/jira/browse/SPARK-13631
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.6.0
            Reporter: Andy Sloane


We are seeing something that looks a lot like a regression from Spark 1.2. When 
we run jobs with multiple threads, we get a crash inside getPreferredLocations, 
like the one fixed in SPARK-4454, except now it happens inside 
org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs rather 
than in DAGScheduler directly.

I tried Spark 1.2 post-SPARK-4454 (before that patch it was only slightly flaky), 
1.4.1, and 1.5.2, and all are fine. 1.6.0 crashes immediately on our threaded 
test case, though once in a while it passes.

The stack trace is huge, but starts like this:

Caused by: java.lang.NullPointerException: null
        at org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs(MapOutputTracker.scala:406)
        at org.apache.spark.MapOutputTrackerMaster.getPreferredLocationsForShuffle(MapOutputTracker.scala:366)
        at org.apache.spark.rdd.ShuffledRDD.getPreferredLocations(ShuffledRDD.scala:92)
        at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
        at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:256)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1545)
The full trace is available here:
https://gist.github.com/andy256/97611f19924bbf65cf49
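
For reference, a minimal sketch of the kind of threaded workload that triggers it 
for us (names and job contents here are illustrative, not our actual test case): 
several threads share one SparkContext and each submits a job with a shuffle 
stage, so getPreferredLocationsForShuffle gets called concurrently.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative repro sketch, not the actual test case: several threads share one
// SparkContext and each submits a job containing a shuffle stage, so the
// DAGScheduler calls getPreferredLocationsForShuffle concurrently.
object ThreadedShuffleRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("threaded-shuffle-repro").setMaster("local[4]"))
    try {
      val threads = (1 to 8).map { _ =>
        new Thread(new Runnable {
          override def run(): Unit = {
            // reduceByKey introduces a ShuffledRDD, whose getPreferredLocations
            // goes through MapOutputTrackerMaster.getLocationsWithLargestOutputs
            sc.parallelize(1 to 100000, numSlices = 16)
              .map(x => (x % 100, x))
              .reduceByKey(_ + _)
              .count()
          }
        })
      }
      threads.foreach(_.start())
      threads.foreach(_.join())
    } finally {
      sc.stop()
    }
  }
}
{code}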
