[
https://issues.apache.org/jira/browse/SPARK-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603001#comment-14603001
]
yuemeng commented on SPARK-8663:
--------------------------------
I think the cause is as follows:
1) eventProcessActor ! JobSubmitted(
       jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter,
       properties)
     waiter
   } // eventProcessActor is already dead, so this message goes to the dead-letter
     // mailbox and is lost; the returned waiter is never notified.
2) def awaitResult(): JobResult = synchronized {
     while (!_jobFinished) {
       this.wait()
     }
     return jobResult
   } // _jobFinished is never set, so this loop waits forever.
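
The blocking behaviour in 2) can be shown outside Spark with a minimal sketch. The names SimpleWaiter, jobFinished() and awaitResult(timeoutMs) are hypothetical and are not Spark's JobWaiter API; the timeout is an assumption added only so that the demonstration terminates instead of blocking forever, and it is not presented as the fix for this issue.

```scala
// Hypothetical stand-in for a job waiter; this is NOT Spark's actual JobWaiter.
class SimpleWaiter {
  private var finished = false

  // Called by whoever processes the job when it completes.
  def jobFinished(): Unit = synchronized {
    finished = true
    notifyAll()
  }

  // Waits up to timeoutMs for completion. Returns true if the job finished,
  // false if nobody called jobFinished() before the deadline.
  def awaitResult(timeoutMs: Long): Boolean = synchronized {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var remainingMs = timeoutMs
    while (!finished && remainingMs > 0) {
      wait(remainingMs) // releases the monitor while waiting
      remainingMs = (deadline - System.nanoTime()) / 1000000L
    }
    finished
  }
}

object WaiterDemo {
  def main(args: Array[String]): Unit = {
    // Case 1: the completion notification is delivered.
    val delivered = new SimpleWaiter
    new Thread(new Runnable {
      def run(): Unit = delivered.jobFinished()
    }).start()
    println(delivered.awaitResult(1000))

    // Case 2: the submission was lost (for example, the receiver had already
    // stopped), so nobody ever notifies the waiter.
    val lost = new SimpleWaiter
    println(lost.awaitResult(200))
  }
}
```

In the second case an unbounded wait(), as in awaitResult() above, would never return, which matches the hang seen in the driver.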
> Dirver will be hang if there is a job submit during SparkContex stop Interval
> -----------------------------------------------------------------------------
>
> Key: SPARK-8663
> URL: https://issues.apache.org/jira/browse/SPARK-8663
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0, 1.1.1, 1.2.0
> Environment: SUSE Linux Enterprise Server 11 SP3 (x86_64)
> Reporter: yuemeng
> Fix For: 1.0.0, 1.1.1, 1.2.2
>
>
> The driver process hangs if a job is submitted during the sc.stop interval,
> that is, the period between the start and the end of SparkContext.stop().
> The probability of hitting this is very small, but when it happens the
> driver process never exits.
> Steps to reproduce:
> 1) Modify the source code so that the SparkContext stop() method sleeps for 2s
>    (in my case, I made the DAGScheduler stop method sleep for 2s).
> 2) Submit an application with code like:
> import org.apache.spark.{SparkConf, SparkContext}
>
> object DriverThreadTest {
>   def main(args: Array[String]) {
>     val sconf = new SparkConf().setAppName("TestJobWaitor")
>     val sc = new SparkContext(sconf)
>     Thread.sleep(5000)
>     // Worker thread: keeps submitting jobs in a loop.
>     val t = new Thread {
>       override def run() {
>         while (true) {
>           try {
>             val rdd = sc.parallelize(1 to 1000)
>             var i = 0
>             println("calcfunc start")
>             while (i < 10) {
>               i += 1
>               rdd.count
>             }
>             println("calcfunc end")
>           } catch {
>             case e: Exception =>
>               e.printStackTrace()
>           }
>         }
>       }
>     }
>
>     t.start()
>
>     // Stopper thread: stops the SparkContext while jobs are still being submitted.
>     val t2 = new Thread {
>       override def run() {
>         Thread.sleep(2000)
>         println("stop sc thread")
>         sc.stop()
>         println("sc already stopped")
>       }
>     }
>     t2.start()
>   }
> }
> The driver never exits.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)