[ 
https://issues.apache.org/jira/browse/PHOENIX-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155425#comment-15155425
 ] 

James Taylor commented on PHOENIX-2700:
---------------------------------------

Yes, this particular query only needs to sum the duplicates. Would a sliding 
window keep the server from returning every key? The trickiest part is the 
boundary case, though, where a run of duplicate keys straddles two regions. 
And no, there is no windowing in Phoenix yet.
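
To make the boundary case concrete, here is a minimal Python sketch. It is not
Phoenix code: the function name, the sample keys, and the region boundary are
all hypothetical. It counts duplicates over a sorted key stream while keeping
only the current run in memory, then shows how a run that straddles two regions
is undercounted when each region is counted on its own.

```python
def sum_duplicate_counts(sorted_keys):
    """Return the sum of COUNT(1) over keys that occur more than once.

    Assumes keys arrive in sorted order, so only the current run is tracked.
    """
    total = 0
    run_key, run_len = None, 0
    for key in sorted_keys:
        if run_len and key == run_key:
            run_len += 1
        else:
            # The previous run has ended; count it only if it held duplicates.
            if run_len > 1:
                total += run_len
            run_key, run_len = key, 1
    if run_len > 1:
        total += run_len
    return total


keys = ["a", "b", "b", "c", "c", "c", "d"]
global_total = sum_duplicate_counts(keys)  # "b" twice plus "c" three times

# Hypothetical region boundary that falls inside the run of "c".
region_1, region_2 = keys[:4], keys[4:]
per_region_total = sum_duplicate_counts(region_1) + sum_duplicate_counts(region_2)

print(global_total, per_region_total)  # prints "5 4"
```

The first region sees a single "c" and drops it, so the per-region total misses
one row. Any server-side approach would need either to avoid such splits or to
carry the boundary run back to the client.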

> Push down count(group by key) queries
> -------------------------------------
>
>                 Key: PHOENIX-2700
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2700
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Queries that attempt to detect duplicates can return a lot of data to the 
> client if the column being deduped is nearly unique.  For example:
> {code}
> SELECT SUM(DUP_COUNT) 
> FROM ( 
>     SELECT DEDUP_KEY, COUNT(1) AS DUP_COUNT
>     FROM TABLE_TO_DEDUP
>     GROUP BY DEDUP_KEY
> )
> WHERE DUP_COUNT > 1
> {code}
> If all of the following are true, then we can detect duplicates on the region 
> server in our coprocessors instead of returning every unique DEDUP_KEY to the 
> client for a final merge:
> - no DEDUP_KEY value is split across scans, so all rows for a given key fall 
>   within a single scan
> - the DEDUP_KEY is the leading primary key column
> - we can push the DUP_COUNT > 1 evaluation through our coprocessor
> The first requirement is the hardest, but a custom split policy could 
> potentially be added to guarantee it.
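
The conditions listed in the issue can be sketched in Python under some loud
assumptions: a sorted list stands in for rows ordered by a leading DEDUP_KEY
column, `align_split` is a hypothetical stand-in for what a custom split policy
might do, and `region_partial_sum` is a hypothetical stand-in for the
coprocessor-side DUP_COUNT > 1 evaluation. None of these names are Phoenix or
HBase APIs.

```python
from itertools import groupby


def align_split(sorted_keys, split_index):
    """Move a proposed split point forward so it never lands inside a run of equal keys."""
    while (0 < split_index < len(sorted_keys)
           and sorted_keys[split_index] == sorted_keys[split_index - 1]):
        split_index += 1
    return split_index


def region_partial_sum(region_keys):
    """Server-side step: apply DUP_COUNT > 1 locally and return a single number."""
    counts = (sum(1 for _ in run) for _, run in groupby(region_keys))
    return sum(c for c in counts if c > 1)


keys = ["a", "b", "b", "c", "c", "c", "d"]
split = align_split(keys, 4)  # the proposed split at index 4 is inside the "c" run
regions = [keys[:split], keys[split:]]

# The client only merges one partial sum per region instead of every distinct key.
client_total = sum(region_partial_sum(r) for r in regions)
```

With the split moved to a key boundary, each region can filter and sum locally,
and the client-side merge reduces to adding one value per region.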



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)