[jira] [Resolved] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

James Taylor (JIRA) Sat, 05 Nov 2016 22:40:39 -0700

     [ 
https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


James Taylor resolved PHOENIX-3439.
-----------------------------------
    Resolution: Fixed

[~lhofhansl] - my original fix wasn't complete. I've ported my new fix over to 
the 4.8 branches. We should probably roll a new RC that includes this. FWIW, 
this JIRA is not a regression, but it's an important optimization that prevents 
a full table scan from happening.

> Query using an RVC based on the base table PK is incorrectly using an index 
> and doing a full scan instead of a point query
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3439
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3439
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.1
>            Reporter: Jan Fernando
>            Assignee: James Taylor
>             Fix For: 4.9.0, 4.8.2
>
>         Attachments: PHOENIX-3439.patch, PHOENIX-3439_addendum1.patch
>
>
> We use Phoenix RVCs to support paginated queries. This performance of this 
> functionality relies on Phoenix predictably generating scans against a table 
> or index with a PK that matches the RVC specified for each page.
> What we do is that on the initial query we use 
> PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and 
> persist that and use those to generate RVCs for paginated queries.
> We have discovered that for queries where:
> a) the user doesn't specify an ORDER BY
> b) for tables where secondary indexes are present
> Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but 
> then subsequent queries using the RVCs to paginate execute against a 
> secondary index doing a full scan.
> We have a table with a secondary index where this is an issue. The base table 
> has a PK of PKCOL1, PKCOL2, PKCOL3 and 
> PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, 
> PKCOL2, PKCOL4.
> Here's what happens:
> Here is our query we run to get the Query plan from which we generate the 
> RVCs to be used for paging:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> LIMIT 501;
> I get the following explain:
> CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE 
> ['00Dxx0000001gFA']
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 
> from MY_TABLES.MY_TABLE
> However when I generate the RVC query to page through the data:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> (pkcol1, pkcol2, pkcol3,pkcol4) > 
> ('001','001xx000003DHml',to_date('2015-10-21 09 
> (tel:2015102109):50:55.0'),'017xx0000022FuI')
> LIMIT 501;
> I get the follow explain plan:
> CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN 
> RANGE SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx0000001gFA','001'] - 
> ['00Dxx0000001gFA',*]
> SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), 
> TO_CHAR('001xx000003DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000', 
> TO_CHAR('017xx0000022FuI'))
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> We expected that the second query with RVCs above would execute against the 
> base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the 
> index PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

Reply via email to