[jira] [Updated] (SPARK-15797) To expose groupingSets for DataFrame

Hyukjin Kwon (JIRA) Mon, 20 May 2019 21:25:35 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon updated SPARK-15797:
---------------------------------
    Labels: bulk-closed  (was: )

> To expose groupingSets for DataFrame
> ------------------------------------
>
>                 Key: SPARK-15797
>                 URL: https://issues.apache.org/jira/browse/SPARK-15797
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Priyanka Garg
>            Priority: Major
>              Labels: bulk-closed
>
> Currently, the cube and rollup functions are exposed on DataFrame, but
> grouping sets are not.
> For example,
> df.rollup($"department", $"group", $"designation").avg() produces:
> a. All combinations of department, group and designation
> b. All combinations of department and group, with designation as null
> c. All departments, with group and designation as null
> d. Department, group and designation all null (that is, an aggregate over the
> complete data)
> Along the same lines, there should be a groupingSets function in which
> custom groupings can be specified.
> For example,
> df.groupingSets(($"department", $"group", $"designation"), ($"group"),
> ($"designation"), ()).avg()
> This should produce:
> 1. All combinations of department, group and designation
> 2. All values of group, with department and designation as null
> 3. All values of designation, with department and group as null
> 4. An aggregate over the complete data, i.e. with designation, group and
> department as null.
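
For comparison, the four groupings requested above can be written with
Spark SQL's GROUP BY ... GROUPING SETS clause by registering the DataFrame as
a temporary view. The following is only a minimal sketch under assumptions,
not the API proposed in this issue: it assumes a Spark 2.x or later
SparkSession (the issue was filed against 1.5.1), and the view name
"employees", the numeric "salary" column, the sample rows and the helper name
averageByGroupingSets are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupingSetsSketch {

  // Hypothetical helper: averages an assumed "salary" column over the four
  // grouping sets listed in the issue description.
  def averageByGroupingSets(spark: SparkSession, df: DataFrame): DataFrame = {
    // The view name "employees" is an assumption made for this sketch.
    df.createOrReplaceTempView("employees")
    // "group" is a SQL keyword, so the column name is quoted with backticks.
    spark.sql(
      """SELECT department, `group`, designation, avg(salary) AS avg_salary
        |FROM employees
        |GROUP BY department, `group`, designation
        |GROUPING SETS (
        |  (department, `group`, designation),
        |  (`group`),
        |  (designation),
        |  ()
        |)""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouping-sets-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, for illustration only.
    val df = Seq(
      ("eng", "A", "dev", 100.0),
      ("eng", "A", "qa", 80.0),
      ("ops", "B", "dev", 60.0)
    ).toDF("department", "group", "designation", "salary")

    // Columns that are not part of a given grouping set come back as null.
    averageByGroupingSets(spark, df).show()

    spark.stop()
  }
}
```

A DataFrame-level groupingSets method, as requested here, would avoid the
temporary view and the SQL string, but the semantics would be the same as in
the query above.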



