[ https://issues.apache.org/jira/browse/SPARK-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177261#comment-15177261 ]
Andy Sloane commented on SPARK-13631:
-------------------------------------

Did some digging with git bisect. It turns out to be directly linked to {{spark.shuffle.reduceLocality.enabled}}. The difference between Spark 1.6 and 1.5 here is that 1.5 has it {{false}} by default, and 1.6 has it {{true}} by default. Setting it to false cures this in 1.6, and setting it to true causes it to re-emerge in 1.5.

> getPreferredLocations race condition in spark 1.6.0?
> ----------------------------------------------------
>
>                 Key: SPARK-13631
>                 URL: https://issues.apache.org/jira/browse/SPARK-13631
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Andy Sloane
>
> We are seeing something that looks a lot like a regression from spark 1.2.
> When we run jobs with multiple threads, we have a crash somewhere inside
> getPreferredLocations, as was fixed in SPARK-4454. Except now it's inside
> org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs
> instead of DAGScheduler directly.
> I tried Spark 1.2 post-SPARK-4454 (before this patch it's only slightly
> flaky), 1.4.1, and 1.5.2 and all are fine. 1.6.0 immediately crashes on our
> threaded test case, though once in a while it passes.
> The stack trace is huge, but starts like this:
>
> Caused by: java.lang.NullPointerException: null
>         at org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs(MapOutputTracker.scala:406)
>         at org.apache.spark.MapOutputTrackerMaster.getPreferredLocationsForShuffle(MapOutputTracker.scala:366)
>         at org.apache.spark.rdd.ShuffledRDD.getPreferredLocations(ShuffledRDD.scala:92)
>         at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
>         at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:256)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1545)
>
> The full trace is available here:
> https://gist.github.com/andy256/97611f19924bbf65cf49
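A minimal sketch of the workaround described in the comment, assuming Spark 1.6.x ({{spark-core}}) is on the classpath: build the {{SparkConf}} with {{spark.shuffle.reduceLocality.enabled}} set to {{false}} before creating the {{SparkContext}}. The object name, method name, and app name below are hypothetical. The same setting can also be passed to {{spark-submit}} as {{--conf spark.shuffle.reduceLocality.enabled=false}}. This sidesteps the reduce-side locality lookup rather than fixing the suspected race, and reducer tasks lose their locality preference as a result.

```scala
import org.apache.spark.SparkConf

// Sketch only: assumes Spark 1.6.x (spark-core) is on the classpath.
// ReduceLocalityWorkaround and workaroundConf are hypothetical names.
object ReduceLocalityWorkaround {
  // Build a SparkConf with reduce-side locality preferences turned off.
  // Per the comment, this setting defaults to true in 1.6 and false in 1.5.
  def workaroundConf(): SparkConf =
    new SparkConf()
      .setAppName("threaded-jobs") // hypothetical app name
      .set("spark.shuffle.reduceLocality.enabled", "false")

  def main(args: Array[String]): Unit = {
    val conf = workaroundConf()
    // Pass `conf` to `new SparkContext(conf)` as usual;
    // the master URL is expected to come from spark-submit.
    println(conf.get("spark.shuffle.reduceLocality.enabled"))
  }
}
```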