[ https://issues.apache.org/jira/browse/SPARK-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600381#comment-14600381 ]

Josh Rosen commented on SPARK-8606:
-----------------------------------

An example stacktrace exhibiting this bug:

{code}
DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting down SparkContext
org.apache.spark.SparkException: Attempted to use BlockRDD[3021] at createStream at <console>:69 after its blocks have been removed!
        at org.apache.spark.rdd.BlockRDD.assertValid(BlockRDD.scala:83)
        at org.apache.spark.rdd.BlockRDD.getPreferredLocations(BlockRDD.scala:56)
        at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:231)
        at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:231)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:230)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1380)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1390)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1389)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1389)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1389)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1387)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1387)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1390)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1389)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1389)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1389)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1387)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1387)
        at org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1354)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$15.apply(DAGScheduler.scala:892)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$15.apply(DAGScheduler.scala:891)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:891)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:818)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:817)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:817)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}
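
For illustration, here is a minimal sketch (not taken from the report above; the class name and the failure it simulates are made up) of how a driver-side exception from getPreferredLocations() reaches this same code path. It assumes a spark-shell session where {{sc}} is the active SparkContext:

{code}
import org.apache.spark.{Partition, SparkContext, SparkException, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical RDD whose preferred-location lookup fails, mimicking a
// BlockRDD that is used after its blocks have been removed.
class FailingLocationsRDD(sc: SparkContext) extends RDD[Int](sc, Nil) {
  override protected def getPartitions: Array[Partition] =
    Array(new Partition { override def index: Int = 0 })

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator(1)

  // Thrown while the DAGScheduler computes task locality, i.e. on the driver
  // and outside any task, so without a guard it propagates into the event loop.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    throw new SparkException("simulated failure in getPreferredLocations")
}

// new FailingLocationsRDD(sc).count()  // submitting any job hits the path above
{code}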

> Exceptions in RDD.getPreferredLocations() and getPartitions() should not be able to crash DAGScheduler
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-8606
>                 URL: https://issues.apache.org/jira/browse/SPARK-8606
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Critical
>
> RDD.getPreferredLocations() and RDD.getPartitions() may throw exceptions, but the DAGScheduler does not guard against this, leaving it vulnerable to crashing and stopping the SparkContext if an exception occurs there.
> We should fix this by wrapping these calls in try-catch blocks inside the DAGScheduler.


