[ https://issues.apache.org/jira/browse/PHOENIX-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378388#comment-14378388 ]

jayapriya surendran edited comment on PHOENIX-1772 at 3/26/15 11:51 PM:
------------------------------------------------------------------------

Hello everyone,
I'm Jayapriya Surendran, currently pursuing an MS in Computer Engineering at San
Jose State University. I would like to implement this proposed idea in Phoenix
as part of Google Summer of Code 2015. I'm really interested in distributed
systems, and I've learnt the basics of Hadoop and MapReduce from this Udacity
course (https://www.udacity.com/course/ud617) offered by Cloudera. I've
familiarized myself with Java, JUnit, Maven, IntelliJ and Git while implementing
algorithms (https://github.com/jayapriya90/algorithms). I have used these
operators as part of a Data Mining and Data Warehousing course and found them
really useful for many advanced aggregation use cases. I am familiar with the
concepts behind these operations, although I am new to the Phoenix codebase.
With mentorship from Phoenix committers, I believe I'll be able to complete these
features within the GSoC timeframe.

Thanks and Regards
Jayapriya Surendran


> Add CUBE/ROLLUP/GROUPING SET operators for advanced aggregations
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-1772
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1772
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: jayapriya surendran
>            Assignee: jayapriya surendran
>              Labels: gsoc2015, java, sql
>
> I noticed from the Phoenix language documentation (
> http://phoenix.apache.org/language/index.html ) that Phoenix is missing
> the CUBE, ROLLUP and GROUPING SETS operators, which are already supported by
> other similar projects like Apache Pig and Apache Hive. Here is a brief
> overview of my proposal (the syntax proposed below is the same as PostgreSQL's:
> https://wiki.postgresql.org/wiki/Grouping_Sets)
> *Proposed syntax for CUBE:*
> SELECT name, place, SUM(count) FROM cars GROUP BY CUBE(name, place);
> For every row that we process, we need to emit 2^n rows, where n is the
> number of grouping columns listed in CUBE. For the above example query,
> for every row we need to emit 4 rows, one for each level of aggregation:
> {(name, place), (name, *), (*, place), (*, *)}.
> *Proposed syntax for ROLLUP:*
> SELECT name, place, SUM(count) FROM cars GROUP BY ROLLUP(name, place);
> For every row that we process, we need to emit n+1 rows, where n is the
> number of grouping columns listed in ROLLUP. For the above example query,
> for every row we need to emit 3 rows, one for each hierarchical level of
> aggregation: {(name, place), (name, *), (*, *)}.
> *Proposed syntax for GROUPING SETS:*
> SELECT name, place, SUM(count) FROM cars GROUP BY GROUPING SETS(name, ());
> For every row that we process, we need to emit n rows, where n is the
> number of grouping sets listed. For the above example query, for every
> row we need to emit 2 rows, one for each specified level of aggregation:
> {(name, *), (*, *)}.
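
The per-row expansion rules quoted above can be illustrated with a small, self-contained Java sketch. It is not Phoenix code and makes no claim about how Phoenix's aggregation layer would implement these operators; GroupingSetsSketch, cube, rollup, expand and sumByGroupingSets are hypothetical names. As an assumption of the sketch, a grouping set is represented as a list of column indices into (name, place), and the "*" marker from the proposal stands in for a rolled-up column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the per-row expansion described in the proposal.
// Not Phoenix code: the class and method names are invented for illustration.
final class GroupingSetsSketch {

    // Placeholder for a rolled-up column, following the proposal's "*" notation.
    static final String ALL = "*";

    // CUBE over n grouping columns: every subset of column indices (2^n sets),
    // ordered from the full grouping down to the grand total.
    static List<List<Integer>> cube(int n) {
        List<List<Integer>> sets = new ArrayList<>();
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            List<Integer> set = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // Column i is kept when its bit (most significant first) is set.
                if ((mask & (1 << (n - 1 - i))) != 0) {
                    set.add(i);
                }
            }
            sets.add(set);
        }
        return sets;
    }

    // ROLLUP over n grouping columns: the n+1 prefixes, longest first.
    static List<List<Integer>> rollup(int n) {
        List<List<Integer>> sets = new ArrayList<>();
        for (int len = n; len >= 0; len--) {
            List<Integer> set = new ArrayList<>();
            for (int i = 0; i < len; i++) {
                set.add(i);
            }
            sets.add(set);
        }
        return sets;
    }

    // Emit one grouping key per grouping set; columns outside the set become ALL.
    static List<String[]> expand(String[] groupingValues, List<List<Integer>> sets) {
        List<String[]> keys = new ArrayList<>();
        for (List<Integer> set : sets) {
            String[] key = new String[groupingValues.length];
            Arrays.fill(key, ALL);
            for (int i : set) {
                key[i] = groupingValues[i];
            }
            keys.add(key);
        }
        return keys;
    }

    // SUM(count) per expanded key: each input row contributes to every level.
    static Map<List<String>, Long> sumByGroupingSets(
            List<String[]> rows, List<Long> counts, List<List<Integer>> sets) {
        Map<List<String>, Long> totals = new LinkedHashMap<>();
        for (int r = 0; r < rows.size(); r++) {
            for (String[] key : expand(rows.get(r), sets)) {
                totals.merge(Arrays.asList(key), counts.get(r), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // One row of the example "cars" table, projected onto (name, place).
        String[] row = {"civic", "SJ"};
        for (String[] key : expand(row, cube(2))) {
            System.out.println(Arrays.toString(key));
        }
        // GROUPING SETS(name, ()) written as explicit index sets.
        List<List<Integer>> sets = List.of(List.of(0), List.of());
        for (String[] key : expand(row, sets)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

Running main prints the four CUBE keys [civic, SJ], [civic, *], [*, SJ] and [*, *], followed by the two GROUPING SETS keys [civic, *] and [*, *]. This expand-then-aggregate approach multiplies the input by the number of grouping sets, exactly as the proposal describes. Note that SQL engines such as PostgreSQL report rolled-up columns as NULL rather than a "*" marker and provide a GROUPING() function to distinguish them from genuine NULLs.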



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
