[ https://issues.apache.org/jira/browse/SPARK-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or updated SPARK-8707:
-----------------------------
Assignee: Navis
> RDD#toDebugString fails if any cached RDD has invalid partitions
> ----------------------------------------------------------------
>
> Key: SPARK-8707
> URL: https://issues.apache.org/jira/browse/SPARK-8707
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0, 1.4.1
> Reporter: Aaron Davidson
> Assignee: Navis
> Labels: starter
> Fix For: 1.6.0
>
>
> Repro:
> {code}
> sc.textFile("/ThisFileDoesNotExist").cache()
> sc.parallelize(0 until 100).toDebugString
> {code}
> Output:
> {code}
> java.io.IOException: Not a file: /ThisFileDoesNotExist
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:59)
> at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
> at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1455)
> at org.apache.spark.rdd.RDD.debugSelf$1(RDD.scala:1573)
> at org.apache.spark.rdd.RDD.firstDebugString$1(RDD.scala:1607)
> at org.apache.spark.rdd.RDD.toDebugString(RDD.scala:1637)
> {code}
> This happens because toDebugString calls SparkContext#getRDDStorageInfo, which
> computes the partitions of every cached RDD and fails as soon as one of them
> is invalid. toDebugString should definitely be resilient to other RDDs being
> invalid, and getRDDStorageInfo probably should be as well.
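
For illustration only, here is a minimal sketch of the kind of resilience the description asks for. It is not the patch attached to this issue and it does not use Spark's classes: CachedRdd, StorageInfo and ResilientStorageInfo are hypothetical stand-ins, and computePartitions plays the role of RDD#partitions, which may throw. The first method skips any cached RDD whose partitions cannot be computed. The second only looks at the RDD being debugged, so unrelated invalid RDDs are never touched.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a cached RDD; not Spark's real RDD class.
// computePartitions models RDD#partitions and may throw (e.g. IOException).
final case class CachedRdd(id: Int, name: String, computePartitions: () => Int)

// Hypothetical stand-in for the per-RDD storage summary.
final case class StorageInfo(id: Int, name: String, numPartitions: Int)

object ResilientStorageInfo {
  // Build storage info for every cached RDD, skipping any RDD whose
  // partitions cannot be computed instead of letting the exception escape.
  def storageInfo(cached: Seq[CachedRdd]): Seq[StorageInfo] =
    cached.flatMap { rdd =>
      Try(rdd.computePartitions()) match {
        case Success(n) => Some(StorageInfo(rdd.id, rdd.name, n))
        case Failure(_) => None // invalid RDD: leave it out of the report
      }
    }

  // Narrower variant: only inspect the RDD with the given id, so other
  // cached RDDs are never asked for their partitions.
  def storageInfoFor(cached: Seq[CachedRdd], id: Int): Option[StorageInfo] =
    storageInfo(cached.filter(_.id == id)).headOption
}
```

With a cached entry that throws "Not a file: /ThisFileDoesNotExist" and a second, valid entry, storageInfo would return only the valid one, and storageInfoFor on the valid id would never evaluate the broken entry. Whether invalid RDDs should be skipped silently or reported in some way is a design choice this sketch leaves open.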