Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/641#discussion_r9997152

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -373,25 +375,26 @@ class DAGScheduler(
         } else {
           def removeStage(stageId: Int) {
             // data structures based on Stage
    -        stageIdToStage.get(stageId).foreach { s =>
    -          if (running.contains(s)) {
    +        for (stage <- stageIdToStage.get(stageId)) {
    +          if (running.contains(stage)) {
                 logDebug("Removing running stage %d".format(stageId))
    -            running -= s
    +            running -= stage
    +          }
    +          stageToInfos -= stage
    +          for (shuffleDep <- stage.shuffleDep) {
    --- End diff --

    At this point, the need is to clean up the DAGScheduler's data structures that reference the removed stage. If you are 100% certain that the stage's notion of shuffleDeps contains every shuffleId that shuffleToMapStage is tracking for the stage, then you can take the simpler route that you have taken, working from the stage's own understanding. I wasn't 100% certain of that, which is not the same thing as saying that I have a good reason to believe that the stage's understanding of shuffleDeps will diverge from shuffleToMapStage's understanding. So I took the safer route of working from what shuffleToMapStage knows instead of from what the stage knows.
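
To make the difference between the two routes concrete, here is a minimal sketch. It is not the DAGScheduler code: `Stage` is a hypothetical stand-in case class, `removeViaStage` and `removeViaMap` are illustrative names, and the map is a plain `mutable.HashMap[Int, Stage]` that only mimics shuffleToMapStage. The divergence it handles is contrived, since I have no evidence that it occurs in practice.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's Stage: just an id and an optional
// shuffle id, which is an assumption made for this sketch only.
case class Stage(id: Int, shuffleId: Option[Int])

object CleanupSketch {
  // Simpler route: trust the stage's own record of its shuffle dependency
  // and remove only that key from the map.
  def removeViaStage(shuffleToMapStage: mutable.Map[Int, Stage], stage: Stage): Unit =
    for (shuffleId <- stage.shuffleId) shuffleToMapStage.remove(shuffleId)

  // Safer route: trust the map instead, and remove every entry whose value
  // is the stage, whatever the stage itself believes.
  def removeViaMap(shuffleToMapStage: mutable.Map[Int, Stage], stage: Stage): Unit = {
    // Collect the keys first so the map is not mutated while being traversed.
    val staleIds = shuffleToMapStage.filter(_._2 == stage).keys.toList
    staleIds.foreach(id => shuffleToMapStage.remove(id))
  }
}
```

If the map ever held two shuffle ids pointing at the same stage, `removeViaStage` would leave one entry behind while `removeViaMap` would clear both. When the two views always agree, the routes are equivalent and the simpler one is fine.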