[ https://issues.apache.org/jira/browse/SPARK-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954348#comment-15954348 ]
Mridul Muralidharan commented on SPARK-20205:
---------------------------------------------
bq. I wouldn't say incorrect; at worst it's gonna be slightly inaccurate.
I was referring to the case where we persist events to the event log, or
consume events in order to persist them externally.
In that context, will submissionTime always be unspecified, or is there a
case where it points to some incorrect/spurious value? (If the event is
always posted in the code path right after makeNewStageAttempt, then it
should be fine.)
Essentially, for existing Spark versions, is it a sufficient workaround to set
submissionTime to the current time when it is None in
SparkListenerStageSubmitted? Or will that miss some corner case (the value is
set but incorrect)?
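To make the proposed workaround concrete, here is a minimal sketch of the fallback, written without a Spark dependency so it can be run on its own. The object and method names are hypothetical; in a real listener the Option would come from stageSubmitted.stageInfo.submissionTime. It only covers the "value is None" case and does nothing about a value that is set but wrong.

```scala
object SubmissionTimeWorkaround {
  /** Returns the recorded submission time, or `now` when none was recorded.
    * `now` is by-name, so the clock is only read when the fallback is needed. */
  def effectiveSubmissionTime(recorded: Option[Long], now: => Long): Long =
    recorded.getOrElse(now)

  def main(args: Array[String]): Unit = {
    // Time was not set yet when the event was posted: fall back to the clock.
    println(effectiveSubmissionTime(None, System.currentTimeMillis()))
    // Time was already set: keep it unchanged.
    println(effectiveSubmissionTime(Some(1491264000000L), System.currentTimeMillis()))
  }
}
```

The fallback time is taken when the listener handles the event, so it can differ slightly from the time the scheduler records later.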
> DAGScheduler posts SparkListenerStageSubmitted before updating stage
> --------------------------------------------------------------------
>
> Key: SPARK-20205
> URL: https://issues.apache.org/jira/browse/SPARK-20205
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Marcelo Vanzin
>
> Probably affects other versions, haven't checked.
> The code that submits the event to the bus is around line 991:
> {code}
> stage.makeNewStageAttempt(partitionsToCompute.size, taskIdToLocations.values.toSeq)
> listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))
> {code}
> Later in the same method, the stage information is updated (around line 1057):
> {code}
> if (tasks.size > 0) {
>   logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
>     s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
>   taskScheduler.submitTasks(new TaskSet(
>     tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
>   stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
> {code}
> That means an event handler might get a stage submitted event with an unset
> submission time.
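
The ordering described in the issue can be reproduced with a small toy model. This is only a sketch: ToyStageInfo and OrderingDemo are hypothetical stand-ins, not Spark classes, and the listener here runs synchronously. As far as I understand, the real listener bus delivers events asynchronously and the stage info object is shared and mutable, so an actual listener could observe either the unset or the set value depending on timing.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the mutable stage info; not a Spark class.
final class ToyStageInfo { var submissionTime: Option[Long] = None }

object OrderingDemo {
  /** Mimics the order in the description: post the event first,
    * set the submission time afterwards. */
  def submit(info: ToyStageInfo, post: ToyStageInfo => Unit, now: Long): Unit = {
    post(info)                       // event is handed to the listener here
    info.submissionTime = Some(now)  // time is only recorded afterwards
  }

  def main(args: Array[String]): Unit = {
    val seen = ArrayBuffer.empty[Option[Long]]
    val info = new ToyStageInfo
    // A synchronous listener that records what it sees at post time.
    submit(info, i => seen += i.submissionTime, now = 1000L)
    println(seen.head)           // None: unset when the handler ran
    println(info.submissionTime) // Some(1000): set after the post
  }
}
```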