Yin Huai created SPARK-10167:
--------------------------------
Summary: We need to explicitly use transformDown when rewriting
aggregation results
Key: SPARK-10167
URL: https://issues.apache.org/jira/browse/SPARK-10167
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Yin Huai
Priority: Minor
Right now, we use transformDown explicitly at
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L105
and
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L130.
We should also call transformDown explicitly at
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L300
and
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L334
(those call sites use transform, which right now means transformDown). The
reason we need transformDown is that when we rewrite final aggregate results,
we should always match aggregate functions first. If we used transformUp, we
could match a grouping expression first whenever that grouping expression
appears as a child of an aggregate function.
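To illustrate the ordering issue, here is a minimal, self-contained sketch. It
does not use Catalyst: Expr, Attr, Sum, Add, RewriteOrder and the attribute
names sum_a and group_a are all hypothetical stand-ins, and the two traversal
helpers only imitate pre-order and post-order rule application. The result
expression is sum(a) + a with a as the grouping expression.

```scala
// Toy expression tree. These types are hypothetical, not Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Sum(child: Expr) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object RewriteOrder {
  type Rule = PartialFunction[Expr, Expr]

  // Rebuild a node with f applied to each direct child.
  private def mapChildren(e: Expr, f: Expr => Expr): Expr = e match {
    case Sum(c)    => Sum(f(c))
    case Add(l, r) => Add(f(l), f(r))
    case a: Attr   => a
  }

  // Pre-order: apply the rule to the node, then recurse into the result's children.
  def transformDown(e: Expr, rule: Rule): Expr = {
    val afterRule = rule.applyOrElse(e, identity[Expr])
    mapChildren(afterRule, transformDown(_, rule))
  }

  // Post-order: rewrite the children first, then apply the rule to the node.
  def transformUp(e: Expr, rule: Rule): Expr = {
    val afterChildren = mapChildren(e, transformUp(_, rule))
    rule.applyOrElse(afterChildren, identity[Expr])
  }

  // sum(a) + a, where a is also the grouping expression.
  val resultExpr: Expr = Add(Sum(Attr("a")), Attr("a"))

  // Replace the aggregate function and the grouping expression with the
  // (hypothetical) attributes that carry their computed values.
  val rule: Rule = {
    case Sum(Attr("a")) => Attr("sum_a")
    case Attr("a")      => Attr("group_a")
  }

  def main(args: Array[String]): Unit = {
    // Aggregate function is matched before its child is touched.
    println(transformDown(resultExpr, rule)) // Add(Attr(sum_a),Attr(group_a))
    // The child a is rewritten first, so Sum(...) no longer matches.
    println(transformUp(resultExpr, rule))   // Add(Sum(Attr(group_a)),Attr(group_a))
  }
}
```

In this sketch transformUp leaves a Sum node in the final result expression,
which is the kind of bug the explicit transformDown call is meant to rule out.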
There is nothing wrong with master today. We just want to make sure we will not
have bugs if the behavior of transform ever changes from transformDown to
transformUp, which I think is very unlikely (but just in case).