James Taylor created PHOENIX-2700:
-------------------------------------

             Summary: Push down count(group by key) queries
                 Key: PHOENIX-2700
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2700
             Project: Phoenix
          Issue Type: Bug
            Reporter: James Taylor


Queries that detect duplicates can return a large amount of data to the client
when the column being deduplicated is nearly unique. For example:

{code}
SELECT SUM(DUP_COUNT) 
FROM ( 
    SELECT DEDUP_KEY, COUNT(1) AS DUP_COUNT
    FROM TABLE_TO_DEDUP
    GROUP BY DEDUP_KEY
)
WHERE DUP_COUNT > 1
{code}

If all of the following are true, we can detect duplicates on the region
server in our coprocessors instead of returning every unique DEDUP_KEY to the
client for a final merge:
- no DEDUP_KEY value is split across scans (all rows sharing a DEDUP_KEY are
  processed by the same scan)
- DEDUP_KEY is the leading primary key column
- the DUP_COUNT > 1 filter can be evaluated inside our coprocessor
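As an illustration of the second condition, here is a minimal sketch of a table
that satisfies it. The column names ROW_ID and PAYLOAD and the constraint name
PK are hypothetical, not taken from this issue. Because DEDUP_KEY leads the
primary key, rows are stored sorted by DEDUP_KEY, so rows with the same key are
contiguous within a scan. The first condition still has to hold separately,
since a scan boundary could otherwise fall inside a run of equal keys.

{code:sql}
-- Hypothetical schema: DEDUP_KEY is the leading primary key column, so rows
-- sharing a DEDUP_KEY sit next to each other in row-key order.
-- ROW_ID and PAYLOAD are illustrative columns only.
CREATE TABLE TABLE_TO_DEDUP (
    DEDUP_KEY VARCHAR NOT NULL,
    ROW_ID BIGINT NOT NULL,
    PAYLOAD VARCHAR,
    CONSTRAINT PK PRIMARY KEY (DEDUP_KEY, ROW_ID)
);
{code}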



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)