[
https://issues.apache.org/jira/browse/SPARK-25020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-25020.
----------------------------------
Resolution: Incomplete
> Unable to Perform Graceful Shutdown in Spark Streaming with Hadoop 2.8
> ----------------------------------------------------------------------
>
> Key: SPARK-25020
> URL: https://issues.apache.org/jira/browse/SPARK-25020
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1
> Environment: Spark Streaming
> -- Tested on 2.2 & 2.3 (more than likely affects all versions with graceful
> shutdown)
> Hadoop 2.8
> Reporter: Ricky Saltzer
> Priority: Major
> Labels: bulk-closed
>
> Opening this up to give you guys some insight in an issue that will occur
> when using Spark Streaming with Hadoop 2.8.
> Hadoop 2.8 added HADOOP-12950 which adds a upper limit of a 10 second timeout
> for its shutdown hook. From our tests, if the Spark job takes longer than 10
> seconds to shutdown gracefully, the Hadoop shutdown thread seems to trample
> over the graceful shutdown and throw an exception like
> {code:java}
> 18/08/03 17:21:04 ERROR Utils: Uncaught exception in thread pool-1-thread-1
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at
> org.apache.spark.streaming.scheduler.ReceiverTracker.stop(ReceiverTracker.scala:177)
> at
> org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:114)
> at
> org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:682)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)
> at
> org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:681)
> at
> org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:715)
> at
> org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
> at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1948)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
> at scala.util.Try$.apply(Try.scala:192)
> at
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
> The reason I hit this issue is because we recently upgraded to EMR 5.15,
> which has both Spark 2.3 & Hadoop 2.8. The following workaround has proven
> successful to us (in limited testing)
> Instead of just running
> {code:java}
> ...
> ssc.start()
> ssc.awaitTermination(){code}
> We needed to do the following
> {code:java}
> ...
> ssc.start()
> sys.ShutdownHookThread {
> ssc.stop(true, true)
> }
> ssc.awaitTermination(){code}
> As far as I can tell, there is no way to override the default {{10 second}}
> timeout in HADOOP-12950, which is why we had to go with the workaround.
> Note: I also verified this bug exists even with EMR 5.12.1 which runs Spark
> 2.2.x & Hadoop 2.8.
> Ricky
> Epic Games
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]