Andrei Taleanu created SPARK-20323:
--------------------------------------
Summary: Calling stop in a transform stage causes the app to hang
Key: SPARK-20323
URL: https://issues.apache.org/jira/browse/SPARK-20323
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Andrei Taleanu
I'm not sure if this is a bug or just the way it has to happen, but I've run
into this issue with the following code:
{noformat}
import scala.collection.mutable
import scala.util.Random

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ImmortalStreamingJob extends App {
  val conf = new SparkConf().setAppName("fun-spark").setMaster("local[*]")
  val ssc = new StreamingContext(conf, Seconds(1))

  val elems = (1 to 1000).grouped(10)
    .map(seq => ssc.sparkContext.parallelize(seq))
    .toSeq
  val stream = ssc.queueStream(mutable.Queue[RDD[Int]](elems: _*))

  val transformed = stream.transform { rdd =>
    try {
      if (Random.nextInt(6) == 5) throw new RuntimeException("boom")
      else println("lucky bastard")
      rdd
    } catch {
      case e: Throwable =>
        println(s"stopping streaming context: $e")
        ssc.stop(stopSparkContext = true, stopGracefully = false)
        throw e
    }
  }

  transformed.foreachRDD { rdd =>
    println(rdd.collect().mkString(","))
  }

  ssc.start()
  ssc.awaitTermination()
}
{noformat}
There are two things I can note here:
* if the exception is thrown in the first transformation (when the first RDD is
processed), the Spark context is stopped and the app dies
* if the exception is thrown after at least one RDD has been processed, the app
hangs after printing the error message and never stops
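A workaround that avoids the hang is to not call {{ssc.stop}} from inside the
transform closure at all, and instead hand the shutdown off to a separate
thread, since stopping from the thread that runs the batch can block on that
same thread. Below is a minimal, Spark-free sketch of that pattern (an
assumption about the fix, not confirmed behavior; {{requestStop}} and the
{{CountDownLatch}} standing in for {{ssc.stop}}/{{ssc.awaitTermination}} are
illustrative names I made up):

```scala
import java.util.concurrent.CountDownLatch

object DeferredStopSketch {
  // Latch standing in for ssc.awaitTermination(): main blocks on it
  // until the stopper thread fires.
  val stopped = new CountDownLatch(1)

  // Called from the processing thread when a batch fails. Instead of
  // stopping inline on the processing thread, we signal a dedicated
  // stopper thread and let the failing thread unwind normally.
  def requestStop(cause: Throwable): Unit = {
    val stopper = new Thread(() => {
      println(s"stopping streaming context: ${cause.getMessage}")
      stopped.countDown() // stands in for ssc.stop(...)
    }, "streaming-stopper")
    stopper.start()
  }

  def main(args: Array[String]): Unit = {
    val processing = new Thread(() => {
      try throw new RuntimeException("boom")
      catch { case e: Throwable => requestStop(e) }
    }, "processing")
    processing.start()
    stopped.await() // unblocks once the stopper thread has run
    println("context stopped, app exits")
  }
}
```

The key point of the sketch is only that the stop signal originates from a
thread other than the one executing the failing transformation.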
I think there's some sort of deadlock in the second case; is that normal? I
also asked about this
[here|http://stackoverflow.com/questions/43273783/immortal-spark-streaming-job/43373624#43373624]
but up to this point there's no answer pointing to exactly what happens, only
guidelines.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]