[ 
https://issues.apache.org/jira/browse/SPARK-35439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-35439:
--------------------------------
    Description: 
EquivalentExpressions maintains a map of equivalent expressions. It is HashMap 
now so the insertion order is not guaranteed to be preserved later. 
Subexpression elimination relies on retrieving subexpressions from the map. If 
there is child-parent relationships among the subexpressions, we want the child 
expressions come first than parent expressions, so we can replace child 
expressions in parent expressions with subexpression evaluation.

For example, we have two different expressions Add(Literal(1), Literal(2)) and 
Add(Literal(3), add).

Case 1: child subexpr comes first. Replacing HashMap with LinkedHashMap can 
deal with it.

addExprTree(add)
addExprTree(Add(Literal(3), add))
addExprTree(Add(Literal(3), add))

Case 2: parent subexpr comes first. For this case, we need to sort equivalent 
expressions.

addExprTree(Add(Literal(3), add))  => We add `Add(Literal(3), add)` into the 
map first, then add `add` into the map
addExprTree(add)
addExprTree(Add(Literal(3), add))


  was:
EquivalentExpressions maintains a map of equivalent expressions. It is HashMap 
now so the insertion order is not guaranteed to be preserved later. 
Subexpression elimination relies on retrieving subexpressions from the map. If 
there is child-parent relationships among the subexpressions, we want the child 
expressions come first than parent expressions, so we can replace child 
expressions in parent expressions with subexpression evaluation.

Although we add expressions recursively into the map with depth-first approach, 
when we retrieve the map values, it is not guaranteed that the order is 
preserved. We should use LinkedHashMap for this usage.


> Children subexpr should come first than parent subexpr in subexpression 
> elimination
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-35439
>                 URL: https://issues.apache.org/jira/browse/SPARK-35439
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.2, 3.1.1, 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: Apache Spark
>            Priority: Minor
>
> EquivalentExpressions maintains a map of equivalent expressions. It is 
> HashMap now so the insertion order is not guaranteed to be preserved later. 
> Subexpression elimination relies on retrieving subexpressions from the map. 
> If there is child-parent relationships among the subexpressions, we want the 
> child expressions come first than parent expressions, so we can replace child 
> expressions in parent expressions with subexpression evaluation.
> For example, we have two different expressions Add(Literal(1), Literal(2)) 
> and Add(Literal(3), add).
> Case 1: child subexpr comes first. Replacing HashMap with LinkedHashMap can 
> deal with it.
> addExprTree(add)
> addExprTree(Add(Literal(3), add))
> addExprTree(Add(Literal(3), add))
> Case 2: parent subexpr comes first. For this case, we need to sort equivalent 
> expressions.
> addExprTree(Add(Literal(3), add))  => We add `Add(Literal(3), add)` into the 
> map first, then add `add` into the map
> addExprTree(add)
> addExprTree(Add(Literal(3), add))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to