Marcelo Vanzin created SPARK-20342:
--------------------------------------
Summary: DAGScheduler sends SparkListenerTaskEnd before updating
task's accumulators
Key: SPARK-20342
URL: https://issues.apache.org/jira/browse/SPARK-20342
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.2.0
Reporter: Marcelo Vanzin
Hit this on 2.2, but probably has been there forever. This is similar in spirit
to SPARK-20205.
Event is sent here, around L1154:
{code}
listenerBus.post(SparkListenerTaskEnd(
stageId, task.stageAttemptId, taskType, event.reason, event.taskInfo,
taskMetrics))
{code}
Accumulators are updated later, around L1173:
{code}
val stage = stageIdToStage(task.stageId)
event.reason match {
case Success =>
task match {
case rt: ResultTask[_, _] =>
// Cast to ResultStage here because it's part of the ResultTask
// TODO Refactor this out to a function that accepts a ResultStage
val resultStage = stage.asInstanceOf[ResultStage]
resultStage.activeJob match {
case Some(job) =>
if (!job.finished(rt.outputId)) {
updateAccumulators(event)
{code}
Same thing applies here; UI shows correct info because it's pointing at the
mutable {{TaskInfo}} structure. But the event log, for example, may record the
wrong information.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]