[jira] [Updated] (PHOENIX-2797) Ideas to speed up MIN/MAX/DISTINCT for prefixes of the PK

Lars Hofhansl (JIRA) Thu, 24 Mar 2016 23:08:49 -0700

     [ https://issues.apache.org/jira/browse/PHOENIX-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Lars Hofhansl updated PHOENIX-2797:
-----------------------------------
    Description: 
All of MIN, MAX, and DISTINCT always perform a full scan, even when they are on 
a prefix of a compound key.

For MIN and MAX one only needs to find the first and last row (resp.) and we'll
have our answer. This works for the full key or a prefix of the key.
This should work fine with or without a WHERE clause, as long as we can
identify the first and last row.
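
As a rough illustration of the MIN/MAX idea (not Phoenix code), here is a
minimal Python sketch. It assumes the table can be modeled as a list of
(K1, K2) tuples kept in key order, the way row keys are sorted in HBase, and
that K2 is numeric so that infinity can act as an upper sentinel. The names
ROWS and min_max_k1 are made up for this example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for a table whose rows are stored sorted by the
# compound key (K1, K2).
ROWS = sorted((k1, k2) for k1 in (3, 7, 9) for k2 in range(1000))


def min_max_k1(rows, lo=None, hi=None):
    """Return (MIN(K1), MAX(K1)) by locating only the first and last row.

    lo and hi model an optional inclusive range predicate on K1
    (WHERE K1 >= lo AND K1 <= hi); None means unbounded on that side.
    Returns None when no row matches.
    """
    # First row with K1 >= lo. The 1-tuple (lo,) sorts before every (lo, k2).
    start = 0 if lo is None else bisect_left(rows, (lo,))
    # One past the last row with K1 <= hi (assumes K2 is numeric).
    end = len(rows) if hi is None else bisect_right(rows, (hi, float("inf")))
    if start >= end:
        return None
    # Because rows are sorted by (K1, K2), the boundary rows carry the answer.
    return rows[start][0], rows[end - 1][0]
```

For example, min_max_k1(ROWS, lo=4) locates two boundary rows by binary search
instead of reading all 3000 rows. A predicate that is not a range on the key
prefix would not map onto this sketch.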

For DISTINCT we could do a skip scan to the next prefix (only helps with a true 
prefix of a compound key).
Say the key is (K1, K2), and say further that we're doing DISTINCT(K1). We can
skip to the next value of K1 once we have found a value. This should have a
dramatic impact when the cardinality of K2 is high.
With a WHERE clause that might itself be causing a SKIP SCAN, this might be 
quite tricky. Would need to think about it.
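
The skip to the next prefix can be sketched the same way. Again this is only
a Python model under stated assumptions, not Phoenix or HBase code: rows are
a sorted list of (K1, K2) tuples, K2 is numeric, and a binary search stands
in for a server-side seek. The function name distinct_k1 is hypothetical, and
the WHERE-clause case mentioned above is not handled.

```python
from bisect import bisect_right


def distinct_k1(rows):
    """Return (distinct K1 values in key order, number of seeks performed).

    rows must be sorted by the compound key (K1, K2).
    """
    values = []
    seeks = 0
    pos = 0
    while pos < len(rows):
        k1 = rows[pos][0]
        values.append(k1)
        # Jump past every remaining row that shares this K1 prefix instead
        # of reading them one by one (assumes K2 is numeric).
        pos = bisect_right(rows, (k1, float("inf")), lo=pos)
        seeks += 1
    return values, seeks


# Three distinct K1 values, each with 1000 K2 values.
rows = sorted((k1, k2) for k1 in (3, 7, 9) for k2 in range(1000))
values, seeks = distinct_k1(rows)
```

In this model the number of seeks equals the number of distinct K1 values,
independent of how many K2 values sit under each one, which is where the gain
for a high-cardinality K2 would come from.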

Both of these statements hold equally when querying against an index.

Anyway... Just filing this as an idea for now.


  was:
All of MIN, MAX, and DISTINCT always perform a full scan, even when they for a 
prefix of a compound key.

For MIN and MAX only need to find the first and last row (resp) and we'll have 
our answer. This works for the full key or a prefix of the key.
This should work find with or without a WHERE clause, as long as we can 
identify the first and last.

For DISTINCT we could a skip scan to the next prefix (only help with a true 
prefix of a compound key).
Say the key is (K1, K2), and say that we're doing DISTINCT(K1). We can skip to 
the next value of K1 once we found a value. This should have a dramatic impact 
when the cardinality of K2 is high.
With a WHERE clause that might itself be causing a SKIP SCAN, this might be 
quite tricky. Would need to think about it.

Both of these statement hold equally when querying against an index.

Anyway... Just filing this as an idea for now.


> Ideas to speed up MIN/MAX/DISTINCT for prefixes of the PK
> ---------------------------------------------------------
>
>                 Key: PHOENIX-2797
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2797
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Minor
>
> All of MIN, MAX, and DISTINCT always perform a full scan, even when they are 
> on a prefix of a compound key.
> For MIN and MAX one only needs to find the first and last row (resp.) and
> we'll have our answer. This works for the full key or a prefix of the key.
> This should work fine with or without a WHERE clause, as long as we can
> identify the first and last row.
> For DISTINCT we could do a skip scan to the next prefix (only helps with a 
> true prefix of a compound key).
> Say the key is (K1, K2), and say further that we're doing DISTINCT(K1). We
> can skip to the next value of K1 once we have found a value. This should
> have a dramatic impact when the cardinality of K2 is high.
> With a WHERE clause that might itself be causing a SKIP SCAN, this might be 
> quite tricky. Would need to think about it.
> Both of these statements hold equally when querying against an index.
> Anyway... Just filing this as an idea for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
