[ https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Penglei Shi updated SPARK-40082:
--------------------------------
    Attachment: missParentStages.png

> DAGScheduler may not schedule new stage when push-based shuffle is enabled
> --------------------------------------------------------------------------
>
>                 Key: SPARK-40082
>                 URL: https://issues.apache.org/jira/browse/SPARK-40082
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 3.1.1
>            Reporter: Penglei Shi
>            Priority: Major
>         Attachments: missParentStages.png, shuffleMergeFinalized.png,
> submitMissingTasks.png
>
>
> When push-based shuffle is enabled and speculative tasks exist, a
> shuffleMapStage is resubmitted once a fetchFailed occurs. Its parent stages
> are resubmitted first, and they take some time to compute. Before the
> shuffleMapStage is resubmitted, all of its speculative tasks succeed and
> register their map output, but the task-success events cannot trigger
> shuffleMergeFinalized because the stage has already been removed from
> runningStages.
> !image-2022-08-15-17-17-08-666.png!
> When the stage is then resubmitted, the speculative tasks have already
> registered their map output and there are no missing tasks to compute, so
> the resubmission does not trigger shuffleMergeFinalized either. As a result,
> this stage's _shuffleMergedFinalized stays false.
> !image-2022-08-15-17-17-49-488.png!
> AQE then submits the next stages, which depend on the shuffleMapStage that
> hit the fetchFailed. In getMissingParentStages, this stage is marked as
> missing and is resubmitted, but the next stages are added only after this
> stage has finished, so they are never submitted even though the resubmission
> of this stage has completed.
> !image-2022-08-15-17-15-39-992.png!
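The sequence described above can be sketched with a small toy model. This is not Spark code: ToyStage, ToyScheduler and all of their methods are hypothetical names invented for illustration, and the rule that a parent counts as missing until its merge is finalized is an assumption taken from the description, not from the DAGScheduler source.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so stages can live in sets
class ToyStage:
    """Hypothetical stand-in for a shuffle map stage; not a Spark class."""
    name: str
    num_partitions: int
    parents: list = field(default_factory=list)
    map_outputs: set = field(default_factory=set)
    merge_finalized: bool = False

    def missing_partitions(self):
        return [p for p in range(self.num_partitions) if p not in self.map_outputs]


class ToyScheduler:
    """Toy model of the reported event ordering; not the real DAGScheduler."""

    def __init__(self):
        self.running = set()
        self.waiting = set()
        self.log = []

    def missing_parents(self, stage):
        # Assumption: with push-based shuffle, a parent counts as missing
        # until its merge is finalized, even if every map output exists.
        return [p for p in stage.parents
                if p.missing_partitions() or not p.merge_finalized]

    def on_fetch_failed(self, stage):
        # The failed stage leaves the running set until it is resubmitted.
        self.running.discard(stage)

    def on_task_success(self, stage, partition):
        stage.map_outputs.add(partition)
        # Merge finalization is only triggered for a stage that is running.
        if stage in self.running and not stage.missing_partitions():
            stage.merge_finalized = True
            self.finish(stage)

    def finish(self, stage):
        self.running.discard(stage)
        self.log.append(f"finished {stage.name}")
        # Wake up children that are already parked in the waiting set.
        for child in [s for s in self.waiting if stage in s.parents]:
            self.waiting.discard(child)
            self.submit(child)

    def submit(self, stage):
        if stage in self.running or stage in self.waiting:
            return
        missing = self.missing_parents(stage)
        if missing:
            for parent in missing:
                self.submit(parent)
            # The child is parked only AFTER its parents were submitted.
            self.waiting.add(stage)
            return
        self.running.add(stage)
        self.log.append(f"submitted {stage.name}")
        if not stage.missing_partitions():
            # Nothing to compute: finished at once, merge never finalized.
            self.finish(stage)


def run_scenario():
    sched = ToyScheduler()
    map_stage = ToyStage("map", num_partitions=2)
    child = ToyStage("child", num_partitions=1, parents=[map_stage])

    sched.submit(map_stage)              # first attempt starts running
    sched.on_task_success(map_stage, 0)  # one regular task finishes
    sched.on_fetch_failed(map_stage)     # stage removed from running
    sched.on_task_success(map_stage, 1)  # speculative task registers output
    sched.submit(map_stage)              # resubmit: no missing tasks
    sched.submit(child)                  # next stage submitted (e.g. by AQE)
    return sched, map_stage, child


if __name__ == "__main__":
    sched, map_stage, child = run_scenario()
    print("child waiting:", child in sched.waiting)
    print("merge finalized:", map_stage.merge_finalized)
```

In this toy model the child stage ends up parked in the waiting set with nothing running, because the parent finished (without finalizing its merge) before the child was added. It only illustrates the ordering problem under the stated assumptions and says nothing about how a fix should look.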
>
> I have only encountered this a few times in my production environment, and
> it is difficult to reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)