[jira] [Commented] (PHOENIX-258) Use skip scan when SELECT DISTINCT on leading row key column(s)

James Taylor (JIRA) Mon, 30 May 2016 09:23:45 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306800#comment-15306800
 ]


James Taylor commented on PHOENIX-258:
--------------------------------------

This is looking very good, [~lhofhansl]. A couple of minor items:

The number of distinct columns needs to be calculated by OrderPreservingTracker 
instead of assuming that it's the number of group by expressions, as that won't 
be correct in all cases (for RVCs and for group by expressions that reference 
the same pk column multiple times). It's a pretty straightforward change, as 
the value is already being calculated, it's just not remembered and carried 
through to the GroupBy object. Here's what needs to be done:
- In OrderPreservingTracker, add an orderPreservingColumnCount member variable 
and accessor.
- In OrderPreservingTracker.isOrderPreserving(), before the return, set 
orderPreservingColumnCount = prevPos + prevSlotSpan.
- In GroupByCompiler.GroupBy, add an orderPreservingColumnCount member variable 
and accessor.
- In GroupByBuilder add a setOrderPreservingColumnCount() method.
- Instead of returning {{this}} in GroupBy.compile(), return a new GroupBy 
instance that's a copy of the old one, except call 
builder.setOrderPreservingColumnCount(tracker.getOrderPreservingColumnCount()) 
to transfer over the column count (maybe make a GroupByBuilder(GroupBy) 
constructor). This will carry forward the orderPreservingColumnCount and make 
it accessible during compilation.
- In BaseResultIterators, instead of {{int cols = 
plan.getGroupBy().getKeyExpressions().size()}} do {{int cols = 
plan.getGroupBy().getOrderPreservingColumnCount()}}
- Add DistinctPrefixFilter.getOrderPreservingColumnCount(), and add asserts in 
planner tests that the value was calculated correctly (since this is calculated 
at compile time, you don't need an IT test for this). A good couple of tests 
would be:
{code}
SELECT prefix11 FROM t GROUP BY prefix1, TRUNC(prefix1); // 
orderPreservingColumnCount should be 1
SELECT (prefix11, prefix2) FROM t GROUP BY (prefix11, prefix2); // 
orderPreservingColumnCount should be 2
{code}

For the previousKey logic in DistinctPrefixFilter, you need to add trailing 
0xFF bytes to ensure you're not skipping back too far. You can just grab 
BaseScannerRegionObserver.getReversedRow(), rename it to getClosestRowBefore() 
and move it to ByteUtil, as this does the right thing. FWIW, HBase has similar 
code in ClientScanner.createClosestRowBefore(). It's ugly and unfortunate, but 
I'm sure the reverse scan implementors had a good reason to not make the start 
row exclusive for reverse scans.

If there are still issues with a group by RVC, please file a separate JIRA.

> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-258
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: ryang-sfdc
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 258-WIP.txt, 258-v1.txt, 258-v10.txt, 258-v2.txt, 
> 258-v3.txt, 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 258-v8.txt, 
> 258-v9.txt, 258.txt, DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary 
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],["    SERVER 
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]          
>    
> We should skip scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-258) Use skip scan when SELECT DISTINCT on leading row key column(s)

Reply via email to