[
https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306800#comment-15306800
]
James Taylor commented on PHOENIX-258:
--------------------------------------
This is looking very good, [~lhofhansl]. A couple of minor items:
The number of distinct columns needs to be calculated by OrderPreservingTracker
instead of assuming that it's the number of group by expressions, as that won't
be correct in all cases (for RVCs and for group by expressions that reference
the same pk column multiple times). It's a pretty straightforward change, as
the value is already being calculated, it's just not remembered and carried
through to the GroupBy object. Here's what needs to be done:
- In OrderPreservingTracker, add an orderPreservingColumnCount member variable
and accessor.
- In OrderPreservingTracker.isOrderPreserving(), before the return, set
orderPreservingColumnCount = prevPos + prevSlotSpan.
- In GroupByCompiler.GroupBy, add an orderPreservingColumnCount member variable
and accessor.
- In GroupByBuilder add a setOrderPreservingColumnCount() method.
- Instead of returning {{this}} in GroupBy.compile(), return a new GroupBy
instance that's a copy of the old one, except call
builder.setOrderPreservingColumnCount(tracker.getOrderPreservingColumnCount())
to transfer over the column count (maybe make a GroupByBuilder(GroupBy)
constructor). This will carry forward the orderPreservingColumnCount and make
it accessible during compilation.
- In BaseResultIterators, instead of {{int cols =
plan.getGroupBy().getKeyExpressions().size()}} do {{int cols =
plan.getGroupBy().getOrderPreservingColumnCount()}}
- Add DistinctPrefixFilter.getOrderPreservingColumnCount(), and add asserts in
planner tests that the value was calculated correctly (since this is calculated
at compile time, you don't need an IT test for this). A good couple of tests
would be:
{code}
SELECT prefix11 FROM t GROUP BY prefix1, TRUNC(prefix1); //
orderPreservingColumnCount should be 1
SELECT (prefix11, prefix2) FROM t GROUP BY (prefix11, prefix2); //
orderPreservingColumnCount should be 2
{code}
For the previousKey logic in DistinctPrefixFilter, you need to add trailing
0xFF bytes to ensure you're not skipping back too far. You can just grab
BaseScannerRegionObserver.getReversedRow(), rename it to getClosestRowBefore()
and move it to ByteUtil, as this does the right thing. FWIW, HBase has similar
code in ClientScanner.createClosestRowBefore(). It's ugly and unfortunate, but
I'm sure the reverse scan implementors had a good reason to not make the start
row exclusive for reverse scans.
If there are still issues with a group by RVC, please file a separate JIRA.
> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
> Key: PHOENIX-258
> URL: https://issues.apache.org/jira/browse/PHOENIX-258
> Project: Phoenix
> Issue Type: Task
> Reporter: ryang-sfdc
> Assignee: Lars Hofhansl
> Fix For: 4.8.0
>
> Attachments: 258-WIP.txt, 258-v1.txt, 258-v10.txt, 258-v2.txt,
> 258-v3.txt, 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 258-v8.txt,
> 258-v9.txt, 258.txt, DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],[" SERVER
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]
>
> We should skip scan.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)