[jira] [Comment Edited] (PHOENIX-1772) Add CUBE/ROLLUP/GROUPING SET operators for advanced aggregations

jayapriya surendran (JIRA) Thu, 26 Mar 2015 13:45:30 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378388#comment-14378388
 ]


jayapriya surendran edited comment on PHOENIX-1772 at 3/26/15 8:43 PM:
-----------------------------------------------------------------------

Hello everyone,
I'm Jayapriya Surendran, currently pursuing an MS in Computer Engineering at 
San Jose State University. I would like to implement this proposed idea in 
Phoenix as part of Google Summer of Code 2015. I have used these operators in 
a Data Mining and Data Warehousing course and found them really useful for many 
advanced aggregation use cases. I am familiar with the concepts behind these 
operations, although I am new to the Phoenix codebase. I'm really interested in 
distributed systems and I've learnt the basics of Hadoop and MapReduce from 
this Udacity course (https://www.udacity.com/course/ud617) offered by Cloudera. 
I've familiarized myself with Java, JUnit, Maven, IntelliJ and Git while 
implementing algorithms (https://github.com/jayapriya90/algorithms). With 
mentorship from Phoenix committers, I believe I'll be able to complete these 
features within the GSoC timeframe.

Thanks and Regards
Jayapriya Surendran


was (Author: jayapriya90):
Hello everyone,
I'm Jayapriya Surendran, currently pursuing an MS in Computer Engineering at 
San Jose State University. I would like to implement this proposed idea in 
Phoenix as part of Google Summer of Code 2015. I have used these operators in 
a Data Mining and Data Warehousing course and found them really useful for many 
advanced aggregation use cases. I am familiar with the concepts behind these 
operations, although I am new to the Phoenix codebase. With mentorship from 
Phoenix committers, I believe I'll be able to complete these features within 
the GSoC timeframe.

Thanks and Regards
Jayapriya Surendran

> Add CUBE/ROLLUP/GROUPING SET operators for advanced aggregations
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-1772
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1772
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: jayapriya surendran
>            Assignee: jayapriya surendran
>              Labels: gsoc2015, java, sql
>
> I noticed from the Phoenix language documentation ( 
> http://phoenix.apache.org/language/index.html ) that Phoenix is missing the 
> CUBE, ROLLUP and GROUPING SETS operators, which are already supported by 
> similar projects like Apache Pig and Apache Hive. Here is a brief overview of 
> my proposal (the syntax proposed below is the same as PostgreSQL's: 
> https://wiki.postgresql.org/wiki/Grouping_Sets).
> *Proposed syntax for CUBE:*
> SELECT name, place, SUM(count) FROM cars GROUP BY CUBE(name, place);
> For every input row that we process we need to emit 2^n rows, where n is the 
> number of grouping columns listed in CUBE. For the above example query 
> (n = 2), every input row yields 4 rows, one for each level of aggregation: 
> {(name, place), (name, *), (*, place), (*, *)}.
> *Proposed syntax for ROLLUP:*
> SELECT name, place, SUM(count) FROM cars GROUP BY ROLLUP(name, place);
> For every input row that we process we need to emit n+1 rows, where n is the 
> number of grouping columns listed in ROLLUP. For the above example query 
> (n = 2), every input row yields 3 rows, one for each hierarchical level of 
> aggregation: {(name, place), (name, *), (*, *)}.
> *Proposed syntax for GROUPING SETS:*
> SELECT name, place, SUM(count) FROM cars GROUP BY GROUPING SETS(name, ());
> For every input row that we process we need to emit n rows, where n is the 
> number of grouping sets listed. For the above example query (n = 2), every 
> input row yields 2 rows, one for each specified level of aggregation: 
> {(name, *), (*, *)}.
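
The per-row expansion described in the issue above can be sketched in a few lines of Python. This is only an illustration of the counting argument (2^n keys for CUBE, n+1 for ROLLUP, one per listed set for GROUPING SETS), not Phoenix code and not a proposed implementation. The function names and the sample rows are hypothetical, and "*" is used as the rolled-up marker only because the description does so; standard SQL returns NULL in those columns.

```python
from itertools import combinations

ALL = "*"  # marker for a rolled-up column, following the issue text (SQL uses NULL)


def cube_sets(cols):
    """All 2^n subsets of the grouping columns, largest first."""
    return [set(c) for r in range(len(cols), -1, -1)
            for c in combinations(cols, r)]


def rollup_sets(cols):
    """The n+1 prefixes of the grouping columns, longest first."""
    return [set(cols[:i]) for i in range(len(cols), -1, -1)]


def expand(row, cols, grouping_sets):
    """Emit one group key per grouping set, masking columns outside the set."""
    return [tuple(row[c] if c in gs else ALL for c in cols)
            for gs in grouping_sets]


def aggregate(rows, cols, grouping_sets, measure):
    """SUM(measure) per emitted group key (hypothetical helper)."""
    totals = {}
    for row in rows:
        for key in expand(row, cols, grouping_sets):
            totals[key] = totals.get(key, 0) + row[measure]
    return totals


# Hypothetical sample data for the `cars` table used in the queries above.
cols = ["name", "place"]
rows = [
    {"name": "audi", "place": "sf", "count": 2},
    {"name": "audi", "place": "la", "count": 3},
]

cube_keys = expand(rows[0], cols, cube_sets(cols))        # 4 keys per row
rollup_keys = expand(rows[0], cols, rollup_sets(cols))    # 3 keys per row
gsets_keys = expand(rows[0], cols, [{"name"}, set()])     # 2 keys per row
rollup_totals = aggregate(rows, cols, rollup_sets(cols), "count")
```

Here `expand` is the only part that differs between the three operators: each one simply supplies a different list of grouping sets, which suggests that CUBE and ROLLUP could be treated as shorthand for GROUPING SETS.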



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
