[ https://issues.apache.org/jira/browse/SPARK-10554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740883#comment-14740883 ]

Nithin Asokan commented on SPARK-10554:
---------------------------------------

I took a closer look at the logs, and I think my initial suggestion may not 
work well. A null check would get rid of the NPE, but if {{blockManagerId}} 
is null we would never reach the block of code that deletes the folders, so we 
may leave some orphan folders behind.
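
For context, here is a rough sketch of the shape of {{doStop()}} I am referring 
to (paraphrased from the 1.3.0 source linked in the description, so it may not 
match the exact code); a bare null check on {{blockManagerId}} would make the 
whole condition false and simply skip the cleanup:

{code}
// Paraphrased sketch only, not an exact copy of DiskBlockManager.scala:161 (1.3.0).
private def doStop(): Unit = {
  // blockManager.blockManagerId is null when the block manager never registered
  // (e.g. the YARN application was killed before registration) -> NPE on .isDriver
  if (!blockManager.externalShuffleServiceEnabled || blockManager.blockManagerId.isDriver) {
    localDirs.foreach { localDir =>
      if (localDir.isDirectory() && localDir.exists()) {
        Utils.deleteRecursively(localDir)
      }
    }
  }
}
{code}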

Here are some logs I noticed when spark-shell starts:

{code}
15/09/11 09:04:01 INFO DiskBlockManager: Created local directory at /tmp/spark-886a9094-a496-409c-9d20-4667e768a05c/blockmgr-9e87c7d5-8614-470a-8800-9b335f305cef
15/09/11 09:04:01 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/09/11 09:04:01 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ee20d914-ba59-4d7c-a93f-31786f349f82/httpd-a3831baf-5a71-4693-b94f-38de2c1c3b61
{code}

I think we probably need to clean up these orphan folders. I'm fairly new to 
Spark and Scala, so could you suggest a possible approach for this? Is 
{{blockManager.blockManagerId.isDriver}} really needed? Can we assume that we 
need to delete the folders any time the shutdown hook is invoked? 
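
One possible shape for that, purely as a sketch against the paraphrased 
condition above (not a tested patch), would be to treat a null 
{{blockManagerId}} as "never registered" and still delete the local dirs in 
that case:

{code}
// Sketch only: keep the NPE away while still cleaning up when the
// block manager never registered (blockManagerId is null).
val id = blockManager.blockManagerId
val driverOrUnregistered = (id == null) || id.isDriver

if (!blockManager.externalShuffleServiceEnabled || driverOrUnregistered) {
  localDirs.foreach { localDir =>
    if (localDir.isDirectory() && localDir.exists()) {
      Utils.deleteRecursively(localDir)
    }
  }
}
{code}

That would only cover the block manager's own directories, though; the 
{{httpd-*}} folder in the logs above lives under a separate {{spark-*}} 
directory owned by HttpFileServer, so it would presumably need its own cleanup 
path.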

> Potential NPE with ShutdownHook
> -------------------------------
>
>                 Key: SPARK-10554
>                 URL: https://issues.apache.org/jira/browse/SPARK-10554
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.5.0
>            Reporter: Nithin Asokan
>            Priority: Minor
>
> Originally posted in the user mailing list 
> [here|http://apache-spark-user-list.1001560.n3.nabble.com/Potential-NPE-while-exiting-spark-shell-tt24523.html]
> I'm currently using Spark 1.3.0 on a YARN cluster deployed through CDH 5.4. My 
> cluster does not have a 'default' queue, and launching 'spark-shell' submits 
> a YARN application that gets killed immediately because the queue does not 
> exist. However, the spark-shell session is still in progress after throwing a 
> bunch of errors while creating the SQL context. Upon submitting an 'exit' 
> command, there appears to be an NPE from DiskBlockManager with the following 
> stack trace: 
> {code}
> ERROR Utils: Uncaught exception in thread delete Spark local dirs 
> java.lang.NullPointerException 
>         at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
> Exception in thread "delete Spark local dirs" java.lang.NullPointerException 
>         at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
>         at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
> {code}
> I believe the problem is surfacing from a shutdown hook that tries to clean 
> up local directories. In this specific case, because the YARN application was 
> not submitted successfully, the block manager was never registered; as a 
> result it does not have a valid blockManagerId, as seen here: 
> https://github.com/apache/spark/blob/v1.3.0/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala#L161
> Has anyone faced this issue before? Could this be a problem with the way the 
> shutdown hook currently behaves? 
> Note: I referenced the source from the Apache Spark repo rather than Cloudera's.


