Aaron Staple created SPARK-2581:
-----------------------------------
Summary: complete or withdraw visitedStages optimization in
DAGScheduler’s stageDependsOn
Key: SPARK-2581
URL: https://issues.apache.org/jira/browse/SPARK-2581
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Aaron Staple
Priority: Minor
Right now the visitedStages HashSet is populated with stages but never queried,
so it does nothing to limit re-examination of previously visited stages. It may
make sense to check whether a mapStage has already been visited before visiting
it again, as the nearby visitedRdds check does for RDDs. Alternatively, the
existing visitedRdds check may already optimize this function sufficiently, in
which case visitedStages can simply be removed.
See discussion here:
https://github.com/apache/spark/pull/1362#discussion-diff-15018046L1107
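
To make the two options concrete, here is a minimal self-contained sketch. It
is not the DAGScheduler code: Rdd, Stage, NarrowDep, ShuffleDep, the
checkStages flag and the expansion counter are all hypothetical stand-ins
invented for illustration, and the traversal only approximates the shape of
stageDependsOn as described above.

```scala
import scala.collection.mutable

object StageDependsOnSketch {
  // Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
  final class Rdd(val name: String, val deps: List[Dep])
  final class Stage(val rdd: Rdd, val isAvailable: Boolean = false)
  sealed trait Dep
  final case class NarrowDep(rdd: Rdd) extends Dep
  final case class ShuffleDep(mapStage: Stage) extends Dep

  /** Returns (does `stage` depend on `target`?, number of RDDs expanded).
    * `checkStages` is an assumed switch: when true, visitedStages is actually
    * queried before following a shuffle dependency. */
  def stageDependsOn(stage: Stage, target: Stage, checkStages: Boolean): (Boolean, Int) =
    if (stage eq target) {
      (true, 0)
    } else {
      val visitedRdds = mutable.HashSet[Rdd]()
      val visitedStages = mutable.HashSet[Stage]()
      var expanded = 0

      def visit(rdd: Rdd): Unit =
        if (!visitedRdds(rdd)) {
          visitedRdds += rdd
          expanded += 1
          rdd.deps.foreach {
            case ShuffleDep(mapStage) =>
              // Follow the shuffle only if its map output is missing and,
              // optionally, only if this map stage was not seen before.
              val seen = checkStages && visitedStages(mapStage)
              if (!mapStage.isAvailable && !seen) {
                visitedStages += mapStage
                visit(mapStage.rdd)
              }
            case NarrowDep(parent) =>
              visit(parent)
          }
        }

      visit(stage.rdd)
      (visitedRdds.contains(target.rdd), expanded)
    }

  // Example graph: two RDDs shuffle-depend on the same (unavailable) map stage.
  val source = new Rdd("source", Nil)
  val mapStage = new Stage(source)
  val left = new Rdd("left", List(ShuffleDep(mapStage)))
  val right = new Rdd("right", List(ShuffleDep(mapStage)))
  val joined = new Rdd("joined", List(NarrowDep(left), NarrowDep(right)))
  val finalStage = new Stage(joined)
}
```

In this model, calling stageDependsOn(finalStage, mapStage, ...) with
checkStages set to true or false gives the same answer and expands the same
number of RDDs, because visit returns immediately for any RDD already in
visitedRdds. Querying visitedStages only saves the recursive call itself.
Whether the same holds in the real scheduler depends on details not modeled
here, so this is an argument sketch rather than a proof.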
--
This message was sent by Atlassian JIRA
(v6.2#6252)