[ https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated PHOENIX-258:
----------------------------------
Attachment: DistinctFixedPrefixFilter.java
Some more experiments. I added a simple Filter to HBase that only lets distinct
prefixes pass.
Then I loaded some data with a 15-char prefix. I started with 16 distinct
prefixes and then added more and more rows for each.
With just about 1k values per prefix the filter is already faster: 40ms without it vs 23ms with it.
With 16k values, it's 380ms vs 23ms.
With 64k values it's 1.4s vs 23ms.
With 512k values it's 6.5s vs 26ms.
And so on. The Skip-Scan time is more or less constant, since what counts is the
number of seeks, not the number of rows per prefix.
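
To illustrate why the cost tracks the number of distinct prefixes rather than the row count, here is a rough, self-contained sketch. It is not the attached DistinctFixedPrefixFilter.java and uses no HBase API: a TreeSet stands in for the sorted row keys, and TreeSet.ceiling() stands in for a seek. All class and method names are hypothetical, and it assumes every key is at least as long as the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class DistinctPrefixSketch {
    /** Number of simulated seeks issued by the last distinctPrefixes() call. */
    public static int seeks;

    /** Smallest string that sorts after every string starting with prefix, or null if none. */
    public static String nextPrefix(String prefix) {
        char[] c = prefix.toCharArray();
        int i = c.length - 1;
        while (i >= 0 && c[i] == Character.MAX_VALUE) i--; // skip chars that cannot be incremented
        if (i < 0) return null;
        c[i]++;
        return new String(c, 0, i + 1);
    }

    /** Emits each distinct fixed-length prefix once, jumping over all other rows. */
    public static List<String> distinctPrefixes(TreeSet<String> keys, int prefixLen) {
        seeks = 0;
        List<String> out = new ArrayList<>();
        String cur = keys.isEmpty() ? null : keys.first();
        while (cur != null) {
            String prefix = cur.substring(0, prefixLen); // assumes key length >= prefixLen
            out.add(prefix);
            String target = nextPrefix(prefix);
            if (target == null) break;
            seeks++;                    // one seek per distinct prefix
            cur = keys.ceiling(target); // first key at or after the next possible prefix
        }
        return out;
    }

    /** Builds keys of the form 15-char prefix + 6-digit row suffix. */
    public static TreeSet<String> buildKeys(int prefixes, int rowsPerPrefix) {
        TreeSet<String> keys = new TreeSet<>();
        for (int p = 0; p < prefixes; p++) {
            String prefix = String.format("prefix-%08d", p); // 7 + 8 = 15 chars
            for (int r = 0; r < rowsPerPrefix; r++) {
                keys.add(prefix + String.format("%06d", r));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {1_000, 16_000}) {
            TreeSet<String> keys = buildKeys(16, rows);
            List<String> distinct = distinctPrefixes(keys, 15);
            System.out.println(keys.size() + " rows -> " + distinct.size()
                    + " prefixes, " + seeks + " seeks");
        }
    }
}
```

In this toy model the seek count stays the same when the rows per prefix grow, which matches the flat timings above; a real scan would also pay the cost of each seek, which the sketch does not try to model.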
I need to test more of course, with more prefixes, etc.
The filter is a hack of course: the prefix length is hardcoded, it assumes a fixed-length prefix, etc.
Now the task is to integrate this with the Phoenix Schema.
But definitely worth doing.
> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
> Key: PHOENIX-258
> URL: https://issues.apache.org/jira/browse/PHOENIX-258
> Project: Phoenix
> Issue Type: Task
> Reporter: ryang-sfdc
> Labels: gsoc2016
> Fix For: 4.8.0
>
> Attachments: 258.txt, DistinctFixedPrefixFilter.java
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],[" SERVER
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]
>
> We should skip scan.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)