[jira] [Created] (PHOENIX-2797) Ideas to speed up MIN/MAX/DISTINCT for prefixes of the PK

Lars Hofhansl (JIRA) Thu, 24 Mar 2016 23:06:47 -0700

Lars Hofhansl created PHOENIX-2797:
--------------------------------------

             Summary: Ideas to speed up MIN/MAX/DISTINCT for prefixes of the PK
                 Key: PHOENIX-2797
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2797
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Lars Hofhansl
            Priority: Minor



All of MIN, MAX, and DISTINCT always perform a full scan, even when they apply 
to a prefix of a compound key.

For MIN and MAX we only need to find the first and last row (resp.) and we'll 
have our answer. This works for the full key or a prefix of the key.
This should work fine with or without a WHERE clause, as long as we can 
identify the first and last row.
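
A rough sketch of the MIN/MAX idea, not Phoenix code: a sorted Python list of 
(K1, K2) tuples stands in for the table in row key order, and the hypothetical 
helper min_max_k1 reads only the first and last row of the range. The optional 
lo/hi bounds (lo inclusive, hi exclusive) are an assumed stand-in for a WHERE 
clause that restricts K1 to a key range.

```python
import bisect

def min_max_k1(rows, lo=None, hi=None):
    """Return (MIN(K1), MAX(K1)) for rows sorted by (K1, K2), or None if empty.

    lo/hi are hypothetical bounds on K1 (lo inclusive, hi exclusive).
    Only the first and last row of the range are read; nothing is scanned.
    """
    # A 1-tuple (x,) sorts before every (x, k2), so bisect_left lands on
    # the first row whose K1 is >= x.
    start = 0 if lo is None else bisect.bisect_left(rows, (lo,))
    end = len(rows) if hi is None else bisect.bisect_left(rows, (hi,))
    if start >= end:
        return None
    return rows[start][0], rows[end - 1][0]

rows = [("a", 1), ("a", 2), ("b", 1), ("c", 5), ("c", 9)]
print(min_max_k1(rows))
print(min_max_k1(rows, lo="b"))
```

The two lookups cost a binary search each, regardless of how many rows lie 
between the first and the last.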

For DISTINCT we could do a skip scan to the next prefix (this only helps with 
a true prefix of a compound key).
Say the key is (K1, K2), and say that we're doing DISTINCT(K1). We can skip to 
the next value of K1 once we found a value. This should have a dramatic impact 
when the cardinality of K2 is high.
With a WHERE clause that might itself be causing a SKIP SCAN, this might be 
quite tricky. Would need to think about it.
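
A rough sketch of the DISTINCT(K1) idea without a WHERE clause, again over a 
sorted list of (K1, K2) tuples rather than a real table. The helper names 
seek_past and distinct_k1 are made up for illustration, and the binary search 
stands in for a server-side seek to the next prefix.

```python
def seek_past(rows, k1, start):
    """Return the index of the first row at or after start whose K1 > k1."""
    lo, hi = start, len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if rows[mid][0] <= k1:
            lo = mid + 1
        else:
            hi = mid
    return lo

def distinct_k1(rows):
    """Return the distinct K1 values of rows sorted by (K1, K2)."""
    result = []
    i = 0
    while i < len(rows):
        k1 = rows[i][0]
        result.append(k1)
        # Skip every remaining row that shares this K1 in one seek.
        i = seek_past(rows, k1, i + 1)
    return result

# Three K1 values, each with 1000 K2 values.
rows = [(k1, k2) for k1 in "abc" for k2 in range(1000)]
print(distinct_k1(rows))
```

Here the loop body runs once per distinct K1 value instead of once per row, 
which is where the gain comes from when the cardinality of K2 is high.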

Both of these statements hold equally when querying against an index.

Anyway... Just filing this as an idea for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
