[ https://issues.apache.org/jira/browse/SPARK-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069278#comment-14069278 ]

Masayoshi TSUZUKI commented on SPARK-2567:
------------------------------------------

In submitMissingTasks of DAGScheduler.scala:
{code:title=DAGScheduler.scala}
    ...
    // Posted unconditionally, before either of the checks below.
    listenerBus.post(SparkListenerStageSubmitted(stageToInfos(stage), properties))

    if (tasks.size > 0) {
      ...
      // Check that the first task can be serialized before submitting the TaskSet.
      try {
        SparkEnv.get.closureSerializer.newInstance().serialize(tasks.head)
      } catch {
        case e: NotSerializableException =>
          abortStage(stage, "Task not serializable: " + e.toString)
          runningStages -= stage
          return
        case NonFatal(e) => // Other exceptions, such as IllegalArgumentException from Kryo.
          abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}")
          runningStages -= stage
          return
      }

      logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")")
      myPending ++= tasks
      logDebug("New pending tasks: " + myPending)
      taskScheduler.submitTasks(
        new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
      stageToInfos(stage).submissionTime = Some(clock.getTime())
    } else {
      // No tasks to run: the stage is dropped without any TaskSet being submitted.
      logDebug("Stage " + stage + " is actually done; %b %d %d".format(
        stage.isAvailable, stage.numAvailableOutputs, stage.numPartitions))
      runningStages -= stage
    }
{code}
SparkListenerStageSubmitted is posted before checking whether:
* the stage has tasks to run
* the tasks are serializable

If the stage fails either check, no TaskSet is submitted.
As a result, the corresponding SparkListenerStageCompleted is never posted.
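To illustrate why a missing completion event leaves a stage stuck as active, here is a minimal, self-contained Scala model. It assumes, as a simplification rather than a description of Spark's actual UI code, that the UI marks a stage active on a submitted event and clears it on a completed event. `StageEvent`, `StageSubmitted`, `StageCompleted`, `ActiveStageTracker`, and `CurrentOrdering` are hypothetical stand-ins, not Spark classes.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's listener events (not the real Spark classes).
sealed trait StageEvent
final case class StageSubmitted(stageId: Int) extends StageEvent
final case class StageCompleted(stageId: Int) extends StageEvent

// Simplified model of a UI listener: a stage counts as "active" from its
// submitted event until its completed event.
final class ActiveStageTracker {
  val active: mutable.Set[Int] = mutable.Set.empty[Int]

  def onEvent(event: StageEvent): Unit = event match {
    case StageSubmitted(id) => active += id
    case StageCompleted(id) => active -= id
  }
}

// Mirrors the current ordering: the submitted event is posted first, and the
// early exit for an empty task set never posts a matching completed event.
object CurrentOrdering {
  def submitMissingTasks(stageId: Int, tasks: Seq[String], post: StageEvent => Unit): Unit = {
    post(StageSubmitted(stageId))
    if (tasks.isEmpty) return // early exit: no StageCompleted follows
    // ... in Spark the tasks would run here and completion would be posted later
    post(StageCompleted(stageId))
  }
}
```

Feeding the tracker one stage with tasks and one stage with none leaves only the empty stage in `active`, which mirrors the stuck-active symptom described in this issue.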

So I think SparkListenerStageSubmitted should be posted after these checks.
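A minimal sketch of the proposed ordering, under stated assumptions: tasks are plain JVM objects, standard Java serialization (`ObjectOutputStream`) stands in for `SparkEnv.get.closureSerializer`, and `SubmittedEvent`, `ReorderedScheduler`, and the `post` callback are hypothetical names, not Spark's API. Both checks run before anything is posted, so a stage that fails them never emits a submitted event that would later need a matching completion.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.util.control.NonFatal

// Hypothetical event type standing in for SparkListenerStageSubmitted.
final case class SubmittedEvent(stageId: Int)

object ReorderedScheduler {
  // Java serialization stands in for SparkEnv.get.closureSerializer here.
  private def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Runs both checks first; the submitted event is posted only if they pass.
  // Left(reason) means the stage was not submitted and nothing was posted.
  def submitMissingTasks(
      stageId: Int,
      tasks: Seq[AnyRef],
      post: SubmittedEvent => Unit): Either[String, Unit] = {
    if (tasks.isEmpty) return Left(s"Stage $stageId has no tasks to run")
    try serialize(tasks.head)
    catch {
      case e: NotSerializableException =>
        return Left("Task not serializable: " + e)
      case NonFatal(e) =>
        return Left("Task serialization failed: " + e)
    }
    post(SubmittedEvent(stageId)) // posted only after both checks have passed
    // ... build and submit the TaskSet to the task scheduler here
    Right(())
  }
}
```

In Spark itself the failure paths would still call abortStage and remove the stage from runningStages; this sketch only shows the reordering of the listener event relative to the checks.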


> Resubmitted stage sometimes remains as active stage in the web UI
> -----------------------------------------------------------------
>
>                 Key: SPARK-2567
>                 URL: https://issues.apache.org/jira/browse/SPARK-2567
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Masayoshi TSUZUKI
>         Attachments: SPARK-2567.png
>
>
> When a stage is resubmitted (for example, because an executor was lost), sometimes
> more than one resubmitted task appears in the web UI, and one stage remains
> active even after the job has finished.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
