[ https://issues.apache.org/jira/browse/SPARK-10554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740883#comment-14740883 ]
Nithin Asokan commented on SPARK-10554:
---------------------------------------
I took a closer look at the logs, and I think my initial suggestion may not
work well. A null check would get rid of the NPE, but if {{blockManagerId}}
is null, I think we would never reach the block of code that deletes the
folders; as a result, we may leave some orphan folders behind.
Here are some log lines I noticed when spark-shell starts:
{code}
15/09/11 09:04:01 INFO DiskBlockManager: Created local directory at /tmp/spark-886a9094-a496-409c-9d20-4667e768a05c/blockmgr-9e87c7d5-8614-470a-8800-9b335f305cef
15/09/11 09:04:01 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/09/11 09:04:01 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ee20d914-ba59-4d7c-a93f-31786f349f82/httpd-a3831baf-5a71-4693-b94f-38de2c1c3b61
{code}
I think we probably need to clean up these orphan folders. I'm fairly new to
Spark and Scala, so could someone suggest a possible approach? Is the
{{blockManager.blockManagerId.isDriver}} check really needed? Can we assume
that the folders need to be deleted whenever the shutdown hook is invoked?
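To make the question concrete, here is a minimal Scala sketch contrasting a plain null guard with a null-tolerant one. It assumes the cleanup guard at DiskBlockManager.scala:161 has the shape {{!externalShuffleServiceEnabled || blockManagerId.isDriver}} (an assumption; please check the linked source), and it assumes that a block manager that never registered (null id) has no shuffle files being served by an external shuffle service, so its local dirs are safe to delete. {{FakeBlockManagerId}}, {{LocalDirCleanup}} and all of its methods are hypothetical names for illustration, not Spark's API.

```scala
import java.io.File

// Hypothetical stand-in for Spark's BlockManagerId; only isDriver is modelled.
final case class FakeBlockManagerId(isDriver: Boolean)

object LocalDirCleanup {
  // Plain null guard: avoids the NPE, but when the external shuffle service
  // is enabled and the id is null, it returns false, so the dirs are orphaned.
  def guardedShouldDelete(externalShuffle: Boolean, id: FakeBlockManagerId): Boolean =
    !externalShuffle || (id != null && id.isDriver)

  // Null-tolerant variant (assumption): a block manager that never registered
  // has nothing served by an external shuffle service, so delete its dirs.
  def shouldDelete(externalShuffle: Boolean, id: FakeBlockManagerId): Boolean =
    !externalShuffle || id == null || id.isDriver

  // Recursively deletes a file or directory tree using only java.io.File.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      val children = f.listFiles()
      if (children != null) children.foreach(deleteRecursively)
    }
    f.delete()
  }

  // Registers a JVM shutdown hook that deletes the given dirs when allowed.
  // `id` is by-name, so it is read when the hook runs, not when it is registered.
  def registerHook(dirs: Seq[File], externalShuffle: Boolean,
                   id: => FakeBlockManagerId): Unit = {
    sys.addShutdownHook {
      if (shouldDelete(externalShuffle, id)) dirs.foreach(deleteRecursively)
    }
  }
}
```

With the external shuffle service enabled and a null id, {{guardedShouldDelete}} keeps the folders while {{shouldDelete}} removes them; executors with a registered, non-driver id still leave their files for the external service. Making {{id}} by-name matters because the id is only assigned once the block manager is initialized, which in the failed-submission case never happens.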
> Potential NPE with ShutdownHook
> -------------------------------
>
> Key: SPARK-10554
> URL: https://issues.apache.org/jira/browse/SPARK-10554
> Project: Spark
> Issue Type: Bug
> Components: Block Manager
> Affects Versions: 1.5.0
> Reporter: Nithin Asokan
> Priority: Minor
>
> Originally posted on the user mailing list
> [here|http://apache-spark-user-list.1001560.n3.nabble.com/Potential-NPE-while-exiting-spark-shell-tt24523.html].
> I'm currently using Spark 1.3.0 on a YARN cluster deployed through CDH 5.4. My
> cluster does not have a 'default' queue, so launching 'spark-shell' submits
> a YARN application that is killed immediately because the queue does not
> exist. However, the spark-shell session stays alive after throwing a
> bunch of errors while creating the SQL context. Upon submitting an 'exit'
> command, DiskBlockManager throws an NPE with the following
> stack trace:
> {code}
> ERROR Utils: Uncaught exception in thread delete Spark local dirs
> java.lang.NullPointerException
>     at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
> Exception in thread "delete Spark local dirs" java.lang.NullPointerException
>     at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
>     at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
> {code}
> I believe the problem surfaces from a shutdown hook that tries to clean up
> local directories. In this specific case, because the YARN application was
> not submitted successfully, the block manager was never registered; as a
> result, it does not have a valid blockManagerId, as seen here:
> https://github.com/apache/spark/blob/v1.3.0/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala#L161
> Has anyone faced this issue before? Could this be a problem with the way the
> shutdown hook currently behaves?
> Note: I referenced the source from the Apache Spark repo rather than Cloudera's.