Reviving this thread to ask whether any of the Spark maintainers would consider helping to scope a solution for this. Michal outlines the problem in this thread, but to clarify: for very complex Spark applications, where the logical plans often span many pages, it is extremely hard to figure out how the stages in the Spark UI (and the RDD operations behind them) link to the parts of the logical plan that generated them.
Now, obviously this is a hard problem to solve, given the various optimisations and transformations that happen between the logical plan and the executed stages. However, I wanted to raise it as a potential improvement, as I think it would be /extremely/ valuable for Spark users. My two main ideas are either:

- To carry a reference to the original plan around when planning/optimising.
- To maintain a separate mapping for each planning/optimisation step that maps from source to target. I'm thinking along the lines of JavaScript sourcemaps.

It would be great to get the opinion of an experienced Spark maintainer on this, given the complexity.
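To make the second idea a bit more concrete, here is a toy sketch in Python of how per-step "source maps" could be composed. This is not Spark's API: the integer node ids, `StepMap`, `compose`, `trace` and `derived_from` are all hypothetical names, and it assumes every planning/optimisation step reports a total mapping (each node in its output plan, including untouched ones, maps to the node ids in its input plan it was derived from).

```python
from functools import reduce

# Hypothetical: maps each node id in a step's OUTPUT plan to the set of
# node ids in that step's INPUT plan it was derived from.
# Assumption: the map is total, so untouched nodes map to themselves.
StepMap = dict[int, set[int]]


def compose(earlier: StepMap, later: StepMap) -> StepMap:
    """Map ids produced by `later` straight back to the inputs of `earlier`."""
    result: StepMap = {}
    for target, sources in later.items():
        origins: set[int] = set()
        for s in sources:
            # Every source of `later` is an output node of `earlier`
            origins |= earlier[s]
        result[target] = origins
    return result


def trace(steps: list[StepMap]) -> StepMap:
    """Fold the step maps in execution order (needs at least one step)."""
    return reduce(compose, steps)


def derived_from(traced: StepMap, original_id: int) -> set[int]:
    """Reverse lookup: which final nodes came from a given logical-plan node."""
    return {t for t, origins in traced.items() if original_id in origins}


# Toy logical plan: 1 = Scan(a), 2 = Scan(b), 3 = Filter over 2, 4 = Join(1, 3)
step1 = {1: {1}, 5: {2, 3}, 4: {4}}        # optimiser fuses the filter into scan b as node 5
step2 = {1: {1}, 5: {5}, 7: {4}, 6: {4}}   # physical planning: join 4 becomes nodes 6 and 7
step3 = {8: {1, 6}, 9: {5, 7}}             # stage splitting: stages 8 and 9

traced = trace([step1, step2, step3])      # stage 8 -> {1, 4}, stage 9 -> {2, 3, 4}
stages_for_join = derived_from(traced, 4)  # both stages touch the logical join
```

In this framing the first idea is roughly the same thing done eagerly: each node would carry its already-composed set of origin ids, instead of the mapping being reconstructed from the per-step maps afterwards.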