[jira] [Commented] (PHOENIX-1772) Add CUBE/ROLLUP/GROUPING SET operators for advanced aggregations

Prasad Shivanna (JIRA) Sat, 26 Mar 2016 17:25:09 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213267#comment-15213267
 ]


Prasad Shivanna commented on PHOENIX-1772:
------------------------------------------

Hi James,

I was thinking along the same lines. I have some doubts about how the design 
philosophies of Kylin and Phoenix fit together; please correct me if I'm 
wrong. Phoenix achieves its performance by converting SQL queries into HBase 
scans and distributing them to the region servers, so most of the work is done 
server-side, while Kylin relies on MapReduce (MR) jobs to get the job done. 
Won't this become a conflict at some point if we integrate the two? Also, I'm 
pretty new to all of these projects, so I need some time to go through the 
documentation and get a handle on them, and it might take me a while to get 
started. Yes, I'm up for it and very excited about it.

Regards,
Prasad

> Add CUBE/ROLLUP/GROUPING SET operators for advanced aggregations
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-1772
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1772
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Jayapriya Surendran
>              Labels: gsoc2016, java, sql
>         Attachments: GSoCProposal.pdf
>
>
> I noticed from the Phoenix language documentation ( 
> http://phoenix.apache.org/language/index.html ) that Phoenix is missing the 
> CUBE, ROLLUP and GROUPING SETS operators, which are already supported by 
> similar projects such as Apache Pig and Apache Hive. Here is a brief overview 
> of my proposal (the syntax proposed below is the same as PostgreSQL's: 
> https://wiki.postgresql.org/wiki/Grouping_Sets ).
> *Proposed syntax for CUBE:*
> SELECT name, place, SUM(count) FROM cars GROUP BY CUBE(name, place);
> For every row that we process, we need to emit 2^n rows, where n is the 
> number of grouping columns. For the above example query (n = 2), we need to 
> emit 4 rows for every input row, one for each level of aggregation: 
> {(name, place), (name, *), (*, place), (*, *)}.
> *Proposed syntax for ROLLUP:*
> SELECT name, place, SUM(count) FROM cars GROUP BY ROLLUP(name, place);
> For every row that we process, we need to emit n+1 rows, where n is the 
> number of grouping columns. For the above example query (n = 2), we need to 
> emit 3 rows for every input row, one for each hierarchical level of 
> aggregation: {(name, place), (name, *), (*, *)}.
> *Proposed syntax for GROUPING SETS:*
> SELECT name, place, SUM(count) FROM cars GROUP BY GROUPING SETS(name, ());
> For every row that we process, we need to emit n rows, where n is the 
> number of grouping sets listed. For the above example query (n = 2), we need 
> to emit 2 rows for every input row, one for each specified level of 
> aggregation: {(name, *), (*, *)}.
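
The per-row expansion described in the quoted proposal can be sketched in a few lines of Python. This is only an illustration of the idea, not Phoenix code: the function names (cube_sets, rollup_sets, aggregate), the sample cars rows and the use of "*" for a rolled-up column are all assumptions made for the example (real SQL engines return NULL there).

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # stands in for a rolled-up column, as in the description above


def cube_sets(cols):
    """CUBE: every subset of the grouping columns, 2^n sets in total."""
    return [set(c) for r in range(len(cols), -1, -1)
            for c in combinations(cols, r)]


def rollup_sets(cols):
    """ROLLUP: every prefix of the grouping columns, n+1 sets in total."""
    return [set(cols[:i]) for i in range(len(cols), -1, -1)]


def aggregate(rows, cols, grouping_sets, measure):
    """Emit one key per (row, grouping set) pair and SUM the measure per key."""
    totals = defaultdict(int)
    for row in rows:
        for gs in grouping_sets:
            # Columns outside the current grouping set collapse to ALL.
            key = tuple(row[c] if c in gs else ALL for c in cols)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data for the "cars" table used in the proposal.
cars = [
    {"name": "audi", "place": "paris", "count": 2},
    {"name": "audi", "place": "rome", "count": 3},
    {"name": "bmw", "place": "paris", "count": 5},
]
cols = ["name", "place"]

cube = aggregate(cars, cols, cube_sets(cols), "count")
rollup = aggregate(cars, cols, rollup_sets(cols), "count")
# GROUPING SETS(name, ()) lists its two sets explicitly.
gsets = aggregate(cars, cols, [{"name"}, set()], "count")
```

All three operators reduce to the same loop and differ only in which list of grouping sets is passed in, which suggests that a single expansion step could serve CUBE, ROLLUP and GROUPING SETS alike.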



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
