[jira] [Commented] (SPARK-9241) Supporting multiple DISTINCT columns

Herman van Hovell (JIRA) Thu, 15 Oct 2015 13:59:43 -0700

    [ https://issues.apache.org/jira/browse/SPARK-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959604#comment-14959604 ]


Herman van Hovell commented on SPARK-9241:
------------------------------------------

The amount of data should grow linearly with the number of grouping sets (or am
I missing something?). For example, if we have 3 grouping sets (like in the
example), we would duplicate and project the data 3 times. That is still bad,
but it is similar to the approach in [~yhuai]'s example (while saving a join).
We could run into a problem with the {{GROUPING__ID}} bitmask field: because it
is a 32/64-bit bitmask, only 32/64 fields can be in a grouping set.
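To make the "duplicate and project" idea concrete, here is a minimal plain-Python sketch of rewriting a query like SELECT key, COUNT(DISTINCT a), COUNT(DISTINCT b) ... GROUP BY key into one expansion step followed by two aggregations. It is an illustration under stated assumptions, not Spark's planner code. The names `expand`, `count_distinct_multi` and `MAX_BITMASK_FIELDS` are hypothetical. The grouping id is assumed to be a 32-bit bitmask, which is where the field limit mentioned above comes from.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical limit: assumes the grouping id is a 32-bit int bitmask.
MAX_BITMASK_FIELDS = 32

Row = Dict[str, object]


def expand(rows: List[Row], key: str, distinct_cols: List[str]) -> List[Tuple]:
    """Emit one copy of every input row per distinct column (one grouping set each).

    Each copy keeps the grouping key, is tagged with a bitmask gid whose single set
    bit identifies the distinct column it carries, and nulls out the other ones.
    """
    if len(distinct_cols) > MAX_BITMASK_FIELDS:
        raise ValueError("too many distinct columns for a 32-bit grouping-id bitmask")
    out = []
    for row in rows:
        for i, col in enumerate(distinct_cols):
            values = tuple(row[c] if c == col else None for c in distinct_cols)
            out.append((row[key], 1 << i, values))
    # Output size is len(rows) * len(distinct_cols): linear in the number of sets.
    return out


def count_distinct_multi(rows: List[Row], key: str, distinct_cols: List[str]):
    expanded = expand(rows, key, distinct_cols)
    # First aggregation: group by (key, gid, carried values) to drop duplicates.
    deduped = set(expanded)
    # Second aggregation: group by key and count surviving rows per gid.
    result = defaultdict(lambda: [0] * len(distinct_cols))
    for k, gid, values in deduped:
        counts = result[k]  # touch the key so all-NULL groups still report zeros
        i = gid.bit_length() - 1  # position of the single set bit
        if values[i] is not None:  # COUNT(DISTINCT x) ignores NULLs
            counts[i] += 1
    return {k: tuple(v) for k, v in result.items()}
```

With rows `x:(a=1,b=10)`, `x:(1,20)`, `x:(2,20)` and `y:(3,NULL)`, the sketch yields `{"x": (2, 2), "y": (1, 0)}`, after expanding 4 input rows into 8. A single Expand-style pass avoids computing each distinct aggregate separately and joining the results, at the cost of multiplying the input by the number of grouping sets.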

> Supporting multiple DISTINCT columns
> ------------------------------------
>
>                 Key: SPARK-9241
>                 URL: https://issues.apache.org/jira/browse/SPARK-9241
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Priority: Critical
>
> Right now the new aggregation code path only supports a single distinct column 
> (you can use it in multiple aggregate functions in the query). We need to 
> support multiple distinct columns by generating a different plan for handling 
> multiple distinct columns (without changing aggregate functions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

