Hi all,

While working on SPARK-24051, I realized that in the Optimizer, and everywhere else we transform a query plan, we currently lack context information about what is in scope and what is not.
Coming back to the ticket, the reported bug has two main causes:

 1) we have two aliases in different places of the plan;
 2) (the focus of this email) we apply all the rules globally over the whole plan, without any notion of the scope in which something is reachable/visible.

I will start with an easy example to explain what I mean. If we have a simple query like:

    select a, b from (
      select 1 as a, 2 as b from table1
      union
      select 3 as a, 4 as b from table2) q

we produce a tree which is logically something like:

    Project0 (a, b)
    - Union
    -- Project1 (a, b)
    --- ScanTable1
    -- Project2 (a, b)
    --- ScanTable2

So when we apply a transformation on Project1, for instance, we have no information about what is coming from ScanTable1 (or, in general, from any node in the subtree rooted at Project1). What we are missing is a stateful transform which allows the children to tell their parent, grandparents, and so on what is in their scope. This is particularly true for Attributes: in a given node we have no idea whether an Attribute comes from its subtree (and is therefore in scope) or not.

So, the point of this email is: do you think it would be useful in general to introduce a way of navigating the tree which allows the children to keep a state to be used by their parents? Or, more generally, to introduce the concept of scope (whether an attribute can be accessed by a given node of a plan)?

Thanks,
Marco
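
P.S. To make the idea a bit more concrete, here is a minimal sketch of what I mean by children reporting their scope to their parents. It is only a toy model: Plan, Scan, Project, Union and check are hypothetical names made up for this email and are not Catalyst classes, and the column "x" of the two tables is invented. Each node returns the attributes it exposes, so the parent can verify that what it references really comes from its own subtree.

```scala
// Toy model only: these classes are hypothetical and are NOT Catalyst's API.
sealed trait Plan
case class Scan(table: String, output: Set[String]) extends Plan
// references: attributes the projection reads; output: attributes it produces.
case class Project(references: Set[String], output: Set[String], child: Plan) extends Plan
case class Union(children: Seq[Plan]) extends Plan

object ScopeSketch {
  // Bottom-up walk. Returns the attributes the node exposes to its parent
  // (the "state" passed upwards) and the scope problems found in the subtree.
  def check(plan: Plan): (Set[String], List[String]) = plan match {
    case Scan(_, output) =>
      (output, Nil)
    case Project(references, output, child) =>
      val (inScope, problems) = check(child)
      // Anything referenced here that the subtree does not provide is out of scope.
      val missing = references -- inScope
      val mine =
        if (missing.isEmpty) Nil
        else List("Project references attributes not in scope: " +
          missing.toSeq.sorted.mkString(", "))
      (output, problems ++ mine)
    case Union(children) =>
      val results = children.map(check)
      // In this toy model a union exposes the attributes of its first child.
      val exposed = results.headOption.map(_._1).getOrElse(Set.empty[String])
      (exposed, results.flatMap(_._2).toList)
  }

  // The tree from the example above; the column "x" is invented.
  val project1: Plan = Project(Set.empty, Set("a", "b"), Scan("table1", Set("x")))
  val project2: Plan = Project(Set.empty, Set("a", "b"), Scan("table2", Set("x")))
  val project0: Plan = Project(Set("a", "b"), Set("a", "b"), Union(Seq(project1, project2)))

  // A broken variant of Project1 that reads "c", which table1 does not provide.
  val broken: Plan = Project(Set("c"), Set("a", "b"), Scan("table1", Set("x")))
}
```

With this, ScopeSketch.check(ScopeSketch.project0) reports no problems, while ScopeSketch.check(ScopeSketch.broken) reports that "c" is not in scope. A real version would of course have to work on expression IDs rather than names, and would need to be a transform rather than just a check.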