[
https://issues.apache.org/jira/browse/SPARK-24981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190153#comment-17190153
]
Victor Wong commented on SPARK-24981:
-------------------------------------
`ERROR Utils: Uncaught exception in thread pool-4-thread-1` is logged by
`org.apache.spark.util.Utils#tryLogNonFatalError`; I don't think it would cause
SparkContext#stop to throw an exception. I wonder how this PR solves the problem.
```
/** Executes the given block. Log non-fatal errors if any, and only throw fatal errors. */
def tryLogNonFatalError(block: => Unit) {
  try {
    block
  } catch {
    case NonFatal(t) =>
      logError(s"Uncaught exception in thread ${Thread.currentThread().getName}", t)
  }
}
```
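For reference, `scala.util.control.NonFatal` matches most throwables but deliberately excludes a few it treats as fatal, such as `InterruptedException` and `VirtualMachineError` subclasses, so a `case NonFatal(t)` handler rethrows those. A minimal standalone check (plain Scala, independent of Spark):

```scala
import scala.util.control.NonFatal

object NonFatalCheck extends App {
  // NonFatal.apply returns true for "ordinary" exceptions...
  assert(NonFatal(new RuntimeException("boom")))
  // ...but false for the throwables it considers fatal, e.g.
  // InterruptedException and OutOfMemoryError, which a
  // `case NonFatal(t)` pattern would therefore NOT catch.
  assert(!NonFatal(new InterruptedException()))
  assert(!NonFatal(new OutOfMemoryError()))
  println("NonFatal semantics verified")
}
```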
> ShutdownHook timeout causes job to fail when succeeded when SparkContext
> stop() not called by user program
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-24981
> URL: https://issues.apache.org/jira/browse/SPARK-24981
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: Hieu Tri Huynh
> Assignee: Hieu Tri Huynh
> Priority: Minor
> Fix For: 2.4.0
>
>
> When the user does not stop the SparkContext at the end of their program,
> ShutdownHookManager will stop it. However, each shutdown hook is only given
> 10s to run; after that it is interrupted and cancelled. If stopping the
> SparkContext takes longer than 10s, an InterruptedException is thrown and the
> job fails even though it had succeeded. An example of this is shown below.
> I think there are a few ways to fix this; below are the two I have now:
> 1. After the user program finishes, we can check whether it stopped the
> SparkContext. If it didn't, we can stop the SparkContext before finishing the
> user thread. That way, SparkContext.stop() can take as much time as it needs.
> 2. We can catch the InterruptedException thrown by ShutdownHookManager while
> we are stopping the SparkContext, and ignore whatever has not yet been
> stopped inside it. Since we are shutting down anyway, I think it is okay to
> ignore those things.
>
> {code:java}
> 18/07/31 17:11:49 ERROR Utils: Uncaught exception in thread pool-4-thread-1
> java.lang.InterruptedException
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Thread.join(Thread.java:1249)
> 	at java.lang.Thread.join(Thread.java:1323)
> 	at org.apache.spark.scheduler.AsyncEventQueue.stop(AsyncEventQueue.scala:136)
> 	at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
> 	at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> 	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> 	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> 	at org.apache.spark.scheduler.LiveListenerBus.stop(LiveListenerBus.scala:219)
> 	at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1922)
> 	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
> 	at org.apache.spark.SparkContext.stop(SparkContext.scala:1921)
> 	at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:573)
> 	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
> 	at scala.util.Try$.apply(Try.scala:192)
> 	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 18/07/31 17:11:49 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> 	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> {code}
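The TimeoutException at the bottom of the log shows the hook manager waiting on the hook with a time-bounded `FutureTask.get` and then giving up. The mechanism can be modelled with a short, self-contained sketch (plain Scala; object and variable names are illustrative, not Hadoop's, and the timeout is shortened so the demo finishes quickly):

```scala
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}
import java.util.concurrent.atomic.AtomicBoolean

object HookTimeoutModel extends App {
  val interrupted = new AtomicBoolean(false)
  val executor = Executors.newSingleThreadExecutor()

  // A "slow hook" standing in for a SparkContext.stop() that takes too long.
  val future = executor.submit(new Runnable {
    def run(): Unit =
      try Thread.sleep(5000)
      catch { case _: InterruptedException => interrupted.set(true) }
  })

  try {
    // The manager waits only a bounded time for the hook
    // (10s in the report above; shortened here for the demo).
    future.get(200, TimeUnit.MILLISECONDS)
  } catch {
    case _: TimeoutException =>
      // On timeout the task is cancelled, which interrupts the hook
      // thread -- the source of the InterruptedException in the log.
      future.cancel(true)
  }

  executor.shutdown()
  executor.awaitTermination(2, TimeUnit.SECONDS)
  assert(interrupted.get, "hook should have been interrupted after the timeout")
  println("hook timed out and was interrupted")
}
```

This also illustrates why fix 1 above avoids the problem entirely: a stop() invoked from the user thread is never subject to this bounded wait.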
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]