Ali Afroozeh created SPARK-32755:
------------------------------------

             Summary: Maintain the order of expressions in AttributeSet and 
ExpressionSet 
                 Key: SPARK-32755
                 URL: https://issues.apache.org/jira/browse/SPARK-32755
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Ali Afroozeh


Expression identity is based on the ExprId, which is an auto-incremented 
number. This means that the same query can yield a query plan with different 
expression ids in different runs. AttributeSet and ExpressionSet internally use 
a HashSet as the underlying data structure, and therefore cannot guarantee a 
fixed order of elements across runs. This can be problematic in cases where we 
would like to check for plan changes between runs.
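
As a rough illustration of the problem (not Spark code), the sketch below 
models an attribute whose equality and hash code depend only on a numeric id. 
The names Attr, HashOrderDemo, planAttributes and hashSetOrder are hypothetical 
and exist only for this example. The iteration order of the HashSet is derived 
from the hash codes, so it is tied to whichever ids happened to be handed out.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical stand-in for an attribute: identity is the numeric id only.
final class Attr(val name: String, val exprId: Long) {
  override def hashCode(): Int = java.lang.Long.hashCode(exprId)
  override def equals(other: Any): Boolean = other match {
    case a: Attr => a.exprId == exprId
    case _       => false
  }
}

object HashOrderDemo {
  // Simulates the auto-incremented id counter. `start` models how many ids
  // were already handed out before this query was planned.
  def planAttributes(start: Long): Seq[Attr] = {
    val nextId = new AtomicLong(start)
    Seq("c", "a", "b").map(n => new Attr(n, nextId.getAndIncrement()))
  }

  // The iteration order of a hash set is a function of the hash codes,
  // i.e. of the ids, not of the order in which elements were added.
  def hashSetOrder(start: Long): Seq[String] = {
    val set = mutable.HashSet.empty[Attr]
    planAttributes(start).foreach(set += _)
    set.toSeq.map(_.name)
  }
}
```

Calling HashOrderDemo.hashSetOrder with different start values always returns 
the same three names, but nothing guarantees that they come back in the same 
order.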

We make the following changes to AttributeSet and ExpressionSet to 
maintain the insertion order of the elements:
 * We change the underlying data structure of AttributeSet from HashSet to 
LinkedHashSet to maintain the insertion order.
 * ExpressionSet already uses a list to keep track of the expressions. However, 
since it extends Scala's immutable.Set class, operations such as map and 
flatMap are delegated to immutable.Set itself. This means that the result 
of these operations is no longer an instance of ExpressionSet, but rather an 
implementation picked by the parent class. We remove this inheritance 
from immutable.Set and implement the needed methods directly. ExpressionSet has 
very specific semantics, and it does not make sense for it to extend 
immutable.Set anyway.
 * We change the PlanStabilitySuite to not sort the attributes, to be able to 
catch changes in the order of expressions in different runs.
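
The first two points can be sketched roughly as follows. This is a simplified 
illustration under stated assumptions, not the actual change: OrderedExprSet is 
a hypothetical name, it compares elements with plain equality, and it leaves 
out whatever expression-specific equality the real ExpressionSet applies. It 
uses a LinkedHashSet to de-duplicate while keeping insertion order, and it does 
not extend immutable.Set, so map and flatMap are defined directly and return 
the same wrapper type.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for ExpressionSet. It does NOT extend
// immutable.Set, so map/flatMap below return OrderedExprSet, not a generic Set.
final class OrderedExprSet[A] private (private val elems: Vector[A]) {
  // Adding an element that is already present leaves the order unchanged.
  def +(a: A): OrderedExprSet[A] =
    if (elems.contains(a)) this else new OrderedExprSet(elems :+ a)

  def contains(a: A): Boolean = elems.contains(a)

  // Results are re-deduplicated, since f may map distinct inputs to equal outputs.
  def map[B](f: A => B): OrderedExprSet[B] =
    OrderedExprSet(elems.map(f))

  def flatMap[B](f: A => Iterable[B]): OrderedExprSet[B] =
    OrderedExprSet(elems.flatMap(f))

  def toSeq: Seq[A] = elems
}

object OrderedExprSet {
  def apply[A](items: Iterable[A]): OrderedExprSet[A] = {
    // LinkedHashSet drops duplicates while keeping first-insertion order.
    val seen = mutable.LinkedHashSet.empty[A]
    items.foreach(seen += _)
    new OrderedExprSet(seen.toVector)
  }
}
```

With this shape, OrderedExprSet(Seq("c", "a", "b", "a")).map(_.toUpperCase) is 
still an OrderedExprSet, and its toSeq lists the elements in the order they 
were first inserted.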



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

