[ https://issues.apache.org/jira/browse/PHOENIX-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378388#comment-14378388 ]
jayapriya surendran commented on PHOENIX-1772:
----------------------------------------------
Hello everyone,
I'm Jayapriya Surendran, currently pursuing an MS in Computer Engineering at San
Jose State University. I would like to implement this proposed idea in Phoenix
as part of Google Summer of Code 2015. I have used these operators as part of a
Data Mining and Data Warehousing course and found them really useful for many
advanced aggregation use cases. I am familiar with the concepts behind these
operations, although I am new to the Phoenix codebase. With mentorship from
Phoenix committers, I believe I'll be able to complete these features within the
GSoC timeframe.
Thanks and Regards
Jayapriya Surendran
> Add CUBE/ROLLUP/GROUPING SET operators for advanced aggregations
> ----------------------------------------------------------------
>
> Key: PHOENIX-1772
> URL: https://issues.apache.org/jira/browse/PHOENIX-1772
> Project: Phoenix
> Issue Type: New Feature
> Reporter: jayapriya surendran
> Labels: gsoc2015
>
> I noticed from the Phoenix language documentation (
> http://phoenix.apache.org/language/index.html ) that Phoenix is missing the
> CUBE/ROLLUP and GROUPING SETS operators, which are already supported by
> similar projects such as Apache Pig and Apache Hive. Here is a brief overview
> of my proposal (the syntax proposed below is the same as PostgreSQL's:
> https://wiki.postgresql.org/wiki/Grouping_Sets).
> *Proposed syntax for CUBE:*
> SELECT name, place, SUM(count) FROM cars GROUP BY CUBE(name, place);
> For every input row that we process we need to emit 2^n rows, where n is the
> number of grouping columns listed in CUBE. For the above example query
> (n = 2), we need to emit 4 rows per input row, one for each level of
> aggregation: {(name, place), (name, *), (*, place), (*, *)}.
> *Proposed syntax for ROLLUP:*
> SELECT name, place, SUM(count) FROM cars GROUP BY ROLLUP(name, place);
> For every input row that we process we need to emit n+1 rows, where n is the
> number of grouping columns listed in ROLLUP. For the above example query
> (n = 2), we need to emit 3 rows per input row, one for each hierarchical
> level of aggregation: {(name, place), (name, *), (*, *)}.
> *Proposed syntax for GROUPING SETS:*
> SELECT name, place, SUM(count) FROM cars GROUP BY GROUPING SETS(name, ());
> For every input row that we process we need to emit n rows, where n is the
> number of grouping sets listed. For the above example query (n = 2), we need
> to emit 2 rows per input row, one for each specified level of aggregation:
> {(name, *), (*, *)}.
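
The per-row expansion described in the quoted proposal can be sketched in a few
lines of Python. This is only an illustration of the intended semantics, not
Phoenix code: the helper names (cube_sets, rollup_sets, expand_row, aggregate)
and the sample cars rows are hypothetical, and "*" stands in for the NULL that
SQL engines return for a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations

COLS = ("name", "place")  # grouping columns from the example queries

# Hypothetical sample rows for the `cars` table (not from the issue).
cars = [
    {"name": "civic", "place": "SJ", "count": 2},
    {"name": "civic", "place": "SF", "count": 3},
    {"name": "prius", "place": "SJ", "count": 5},
]


def cube_sets(cols):
    """CUBE: all 2^n subsets of the grouping columns, largest first."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]


def rollup_sets(cols):
    """ROLLUP: the n+1 prefixes of the grouping columns, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def expand_row(row, cols, grouping_sets):
    """Emit one group key per grouping set; columns outside the set become '*'."""
    return [tuple(row[c] if c in gs else "*" for c in cols) for gs in grouping_sets]


def aggregate(rows, cols, grouping_sets, measure):
    """SUM(measure) per emitted group key, across all grouping sets."""
    totals = defaultdict(int)
    for row in rows:
        for key in expand_row(row, cols, grouping_sets):
            totals[key] += row[measure]
    return dict(totals)


cube_result = aggregate(cars, COLS, cube_sets(COLS), "count")
rollup_result = aggregate(cars, COLS, rollup_sets(COLS), "count")
# GROUPING SETS(name, ()) lists its two sets explicitly.
sets_result = aggregate(cars, COLS, [("name",), ()], "count")
```

All three operators reduce to the same loop here; they differ only in the list
of grouping sets passed to aggregate, which suggests that CUBE and ROLLUP could
be treated as shorthand for an explicit GROUPING SETS list.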