[
https://issues.apache.org/jira/browse/PHOENIX-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155770#comment-15155770
]
James Taylor commented on PHOENIX-2700:
---------------------------------------
I think if sliding windows are implemented correctly, they could solve this
issue. Any sliding window implementation would need to handle the boundary case
correctly (i.e., windows that straddle parallelized scan and region
boundaries). I'm thinking something along the lines of returning the window
state to the client when a boundary is encountered (with the prior aggregated
state - the count in this case - returned as well). The final merge done by the
client would then need to take this boundary case into account. I suppose we'd
need both the initial window state and the final window state to handle the
general boundary-straddling case.
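To make that merge concrete, here's a minimal sketch of the client-side
stitching. The names are hypothetical (this is not Phoenix's API), and it
assumes each parallel scan reports the partial count for its first and last
DEDUP_KEY group (either of which may straddle a boundary) along with the
pre-aggregated sum over its interior groups:

{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical per-scan window state; not a Phoenix class. firstCount/lastCount
// are the partial counts for the first and last group touched by the scan;
// interiorDupSum is SUM(count) over fully-contained groups with count > 1.
class ScanWindowState {
    final byte[] firstKey;
    final long firstCount;
    final byte[] lastKey;
    final long lastCount;      // equals firstCount when the scan holds one group
    final long interiorDupSum;

    ScanWindowState(byte[] firstKey, long firstCount,
                    byte[] lastKey, long lastCount, long interiorDupSum) {
        this.firstKey = firstKey;
        this.firstCount = firstCount;
        this.lastKey = lastKey;
        this.lastCount = lastCount;
        this.interiorDupSum = interiorDupSum;
    }
}

final class BoundaryMerge {
    // Final client-side merge over non-empty scans ordered by row key. Partial
    // counts for a group straddling a boundary are summed before the
    // DUP_COUNT > 1 filter is applied.
    static long sumDupCounts(List<ScanWindowState> scansInRowKeyOrder) {
        long total = 0;
        byte[] pendingKey = null; // boundary group carried from the prior scan
        long pendingCount = 0;
        for (ScanWindowState s : scansInRowKeyOrder) {
            total += s.interiorDupSum;
            if (pendingKey != null && Arrays.equals(pendingKey, s.firstKey)) {
                pendingCount += s.firstCount; // group straddles the boundary
            } else {
                if (pendingCount > 1) total += pendingCount; // flush prior group
                pendingKey = s.firstKey;
                pendingCount = s.firstCount;
            }
            if (!Arrays.equals(s.firstKey, s.lastKey)) {
                // The first group is complete; flush it and carry the last one.
                if (pendingCount > 1) total += pendingCount;
                pendingKey = s.lastKey;
                pendingCount = s.lastCount;
            }
            // If firstKey == lastKey, the scan held a single group: keep carrying.
        }
        if (pendingCount > 1) total += pendingCount;
        return total;
    }
}
{code}

The key point is that a straddling group's partial counts are combined across
adjacent scans before the DUP_COUNT > 1 filter runs, so such a group is neither
dropped nor double counted.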
> Push down count(group by key) queries
> -------------------------------------
>
> Key: PHOENIX-2700
> URL: https://issues.apache.org/jira/browse/PHOENIX-2700
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> Queries that attempt to detect duplicates potentially return a lot of data to
> the client if the column being deduped is near-unique. For example:
> {code}
> SELECT SUM(DUP_COUNT)
> FROM (
>     SELECT DEDUP_KEY, COUNT(1) AS DUP_COUNT
>     FROM TABLE_TO_DEDUP
>     GROUP BY DEDUP_KEY
> ) AS DUPS
> WHERE DUP_COUNT > 1
> {code}
> If all of the following are true, then we can detect duplicates on the region
> server in our coprocessors instead of returning every unique DEDUP_KEY to the
> client for a final merge:
> - no scan boundary falls within a run of rows sharing the same DEDUP_KEY
> - the DEDUP_KEY is the leading primary key column
> - we can push the DUP_COUNT > 1 evaluation through our coprocessor
> The first requirement is the hardest, but potentially a custom split policy
> could be added (see the sketch below).
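> One possibility: HBase's KeyPrefixRegionSplitPolicy truncates region split
> points to a fixed-length row key prefix, so all rows sharing that prefix stay
> in one region. If DEDUP_KEY were a fixed-width leading column, that would
> satisfy the first requirement. A hedged sketch (the table name and prefix
> length are illustrative, not a committed design):
> {code}
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy;
>
> class DedupSplitPolicyExample {
>     // Keep HBase from splitting a region mid-group by truncating every split
>     // point to the row key prefix that holds DEDUP_KEY.
>     static HTableDescriptor withPrefixSplitPolicy() {
>         HTableDescriptor desc =
>             new HTableDescriptor(TableName.valueOf("TABLE_TO_DEDUP"));
>         desc.setValue(HTableDescriptor.SPLIT_POLICY,
>             KeyPrefixRegionSplitPolicy.class.getName());
>         // Assumption: DEDUP_KEY occupies the first 8 bytes of the row key.
>         desc.setValue(KeyPrefixRegionSplitPolicy.PREFIX_LENGTH_KEY, "8");
>         return desc;
>     }
> }
> {code}
> Note this only constrains region boundaries; Phoenix's parallel scan
> boundaries within a region would still need the same alignment.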