Wan Kun created SPARK-51512:
-------------------------------

             Summary: Filter out null MapStatus when cleaning up shuffle data 
with ExternalShuffleService.
                 Key: SPARK-51512
                 URL: https://issues.apache.org/jira/browse/SPARK-51512
             Project: Spark
          Issue Type: Task
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Wan Kun


When the application crashes unexpectedly, the registered map statuses may 
contain null values, therefore we should skip these null values when cleaning 
up shuffle data with ExternalShuffleService.



{code:java}
25/03/14 15:41:35 ERROR ContextCleaner: Error cleaning shuffle 4
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at 
org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:53)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:342)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:101)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:85)
        at 
org.apache.spark.storage.BlockManagerMaster.removeShuffle(BlockManagerMaster.scala:204)
        at 
org.apache.spark.shuffle.sort.io.LocalDiskShuffleDriverComponents.removeShuffle(LocalDiskShuffleDriverComponents.java:47)
        at 
org.apache.spark.ContextCleaner.doCleanupShuffle(ContextCleaner.scala:241)
        at 
org.apache.spark.ContextCleaner.$anonfun$keepCleaning$3(ContextCleaner.scala:203)
        at 
org.apache.spark.ContextCleaner.$anonfun$keepCleaning$3$adapted(ContextCleaner.scala:196)
        at scala.Option.foreach(Option.scala:437)
        at 
org.apache.spark.ContextCleaner.$anonfun$keepCleaning$1(ContextCleaner.scala:196)
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1383)
        at 
org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:190)
        at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:80)
Caused by: java.lang.NullPointerException: Cannot invoke 
"org.apache.spark.scheduler.MapStatus.location()" because "mapStatus" is null
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint.$anonfun$removeShuffle$3(BlockManagerMasterEndpoint.scala:423)
        at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint.$anonfun$removeShuffle$2(BlockManagerMasterEndpoint.scala:421)
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint.$anonfun$removeShuffle$2$adapted(BlockManagerMasterEndpoint.scala:420)
        at 
org.apache.spark.ShuffleStatus.$anonfun$withMapStatuses$1(MapOutputTracker.scala:435)
        at 
org.apache.spark.ShuffleStatus.withReadLock(MapOutputTracker.scala:72)
        at 
org.apache.spark.ShuffleStatus.withMapStatuses(MapOutputTracker.scala:435)
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint.$anonfun$removeShuffle$1(BlockManagerMasterEndpoint.scala:420)
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint.$anonfun$removeShuffle$1$adapted(BlockManagerMasterEndpoint.scala:419)
        at scala.Option.foreach(Option.scala:437)
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$removeShuffle(BlockManagerMasterEndpoint.scala:419)
        at 
org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:201)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:104)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:216)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
        at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:76)
        at 
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:42)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to