Ali Afroozeh created SPARK-32755:
------------------------------------
Summary: Maintain the order of expressions in AttributeSet and
ExpressionSet
Key: SPARK-32755
URL: https://issues.apache.org/jira/browse/SPARK-32755
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Ali Afroozeh
Expression identity is based on the ExprId, which is an auto-incremented
number. This means that the same query can yield a query plan with different
expression ids in different runs. AttributeSet and ExpressionSet internally use
a HashSet as the underlying data structure, and therefore cannot guarantee a
fixed order of elements across runs. This can be problematic in cases where
we would like to check for plan changes between runs.
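
The effect described above can be sketched outside of Spark. The snippet below
is only an illustration under stated assumptions: Attr is a hypothetical
stand-in for an attribute whose identity includes an auto-incremented id, and
the startId parameter mimics the id counter starting at a different value in
each run. It is not Spark's AttributeSet.

```scala
import scala.collection.mutable

object OrderDemo {
  // Hypothetical stand-in for an attribute identified by an auto-incremented id.
  final case class Attr(name: String, exprId: Long)

  // Builds the same three "attributes", with ids starting at startId,
  // as if the id counter were at a different value in each run.
  def attrs(startId: Long): List[Attr] =
    List("a", "b", "c").zipWithIndex.map { case (n, i) => Attr(n, startId + i) }

  // Iteration order of a hash set depends on the hash codes, and therefore
  // on the ids; it may differ between two values of startId.
  def hashOrder(startId: Long): List[String] =
    (mutable.HashSet.empty[Attr] ++= attrs(startId)).toList.map(_.name)

  // A LinkedHashSet iterates in insertion order, regardless of the ids.
  def linkedOrder(startId: Long): List[String] =
    (mutable.LinkedHashSet.empty[Attr] ++= attrs(startId)).toList.map(_.name)
}
```

Comparing OrderDemo.hashOrder(0) with OrderDemo.hashOrder(1000) may or may not
show a different order, since that depends on the hash codes, whereas
OrderDemo.linkedOrder returns the names in insertion order for any startId.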
We make the following changes to AttributeSet and ExpressionSet to
maintain the insertion order of the elements:
* We change the underlying data structure of AttributeSet from HashSet to
LinkedHashSet to maintain the insertion order.
* ExpressionSet already uses a list to keep track of the expressions. However,
since it extends Scala's immutable.Set class, operations such as map and
flatMap are delegated to immutable.Set itself. This means that the result
of these operations is no longer an instance of ExpressionSet, but rather an
implementation picked up by the parent class. We therefore remove this
inheritance from immutable.Set and implement the needed methods directly.
ExpressionSet has very specific semantics, and it does not make sense for it
to extend immutable.Set anyway.
* We change the PlanStabilitySuite to not sort the attributes, to be able to
catch changes in the order of expressions in different runs.
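
As a rough illustration of the second point, the sketch below shows a set-like
class that does not extend immutable.Set and defines map itself, so the result
keeps both its type and its insertion order. OrderedSet and all of its members
are hypothetical names chosen for this example. It is a simplified stand-in,
not Spark's ExpressionSet, and it ignores the canonicalization that
ExpressionSet performs.

```scala
// Hypothetical, simplified insertion-ordered set; not Spark's ExpressionSet.
final class OrderedSet[A] private (private val elems: Vector[A]) {
  // Adds an element at the end unless it is already present.
  def +(elem: A): OrderedSet[A] =
    if (elems.contains(elem)) this else new OrderedSet[A](elems :+ elem)

  def contains(elem: A): Boolean = elems.contains(elem)

  // Defined directly, so the result is again an OrderedSet and the
  // order of first occurrence is preserved.
  def map[B](f: A => B): OrderedSet[B] =
    elems.foldLeft(OrderedSet.empty[B])((acc, e) => acc + f(e))

  def toSeq: Seq[A] = elems
  def size: Int = elems.size
}

object OrderedSet {
  def empty[A]: OrderedSet[A] = new OrderedSet[A](Vector.empty[A])
  def apply[A](xs: A*): OrderedSet[A] = xs.foldLeft(empty[A])((acc, x) => acc + x)
}
```

For example, OrderedSet(3, 1, 2).map(_ * 10).toSeq yields the elements 30, 10,
20 in that order, and the result is still an OrderedSet. The linear contains
check keeps the sketch short; a real implementation would pair the ordered
sequence with a hash-based lookup.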