[jira] [Assigned] (SPARK-22227) DiskBlockManager.getAllBlocks could fail if called during shuffle

Apache Spark (JIRA) Mon, 09 Oct 2017 09:58:14 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-22227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Apache Spark reassigned SPARK-22227:
------------------------------------

    Assignee: Apache Spark

> DiskBlockManager.getAllBlocks could fail if called during shuffle
> -----------------------------------------------------------------
>
>                 Key: SPARK-22227
>                 URL: https://issues.apache.org/jira/browse/SPARK-22227
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 2.2.0
>            Reporter: Sergei Lebedev
>            Assignee: Apache Spark
>            Priority: Minor
>
> {{DiskBlockManager.getAllBlocks}} assumes that the directories managed by the 
> block manager only contains files corresponding to "valid" block IDs, i.e. 
> those parsable via {{BlockId.apply}}. This is not always the case as 
> demonstrated by the following snippet
> {code}
> object GetAllBlocksFailure {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf()
>         .setMaster("local[*]")
>         .setAppName("demo"))
>     new Thread {
>       override def run(): Unit = {
>         while (true) {
>           
> println(SparkEnv.get.blockManager.diskBlockManager.getAllBlocks().length)
>           Thread.sleep(10)
>         }
>       }
>     }.start()
>     val rdd = sc.range(1, 65536, numSlices = 10)
>         .map(x => (x % 4096, x))
>         .persist(StorageLevel.DISK_ONLY)
>         .reduceByKey { _ + _ }
>         .collect()
>   }
> }
> {code}
> We have a thread computing the number of bytes occupied by the block manager 
> on-disk and it frequently crashes due to this assumption being violated. 
> Relevant part of the stacktrace
> {code}
> 2017-10-06 11:20:14,287 ERROR  
> org.apache.spark.util.SparkUncaughtExceptionHandler: Uncaught exception in 
> thread Thread[CoarseGrainedExecutorBackend-stop-executor,5,main]
> java.lang.IllegalStateException: Unrecognized BlockId: 
> shuffle_1_2466_0.data.5684dd9e-9fa2-42f5-9dd2-051474e372be
>         at org.apache.spark.storage.BlockId$.apply(BlockId.scala:133)
>         at 
> org.apache.spark.storage.DiskBlockManager$$anonfun$getAllBlocks$1.apply(DiskBlockManager.scala:103)
>         at 
> org.apache.spark.storage.DiskBlockManager$$anonfun$getAllBlocks$1.apply(DiskBlockManager.scala:103)
>         at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:73)
>         at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at 
> org.apache.spark.storage.DiskBlockManager.getAllBlocks(DiskBlockManager.scala:103)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-22227) DiskBlockManager.getAllBlocks could fail if called during shuffle

Reply via email to