[
https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Taylor resolved PHOENIX-3439.
-----------------------------------
Resolution: Fixed
[~lhofhansl] - my original fix wasn't complete. I've ported my new fix over to
the 4.8 branches. We should probably roll a new RC that includes this. FWIW,
this JIRA is not a regression, but it's an important optimization that prevents
a full table scan from happening.
> Query using an RVC based on the base table PK is incorrectly using an index
> and doing a full scan instead of a point query
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3439
> URL: https://issues.apache.org/jira/browse/PHOENIX-3439
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.1
> Reporter: Jan Fernando
> Assignee: James Taylor
> Fix For: 4.9.0, 4.8.2
>
> Attachments: PHOENIX-3439.patch, PHOENIX-3439_addendum1.patch
>
>
> We use Phoenix RVCs to support paginated queries. This performance of this
> functionality relies on Phoenix predictably generating scans against a table
> or index with a PK that matches the RVC specified for each page.
> What we do is that on the initial query we use
> PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and
> persist that and use those to generate RVCs for paginated queries.
> We have discovered that for queries where:
> a) the user doesn't specify an ORDER BY
> b) for tables where secondary indexes are present
> Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but
> then subsequent queries using the RVCs to paginate execute against a
> secondary index doing a full scan.
> We have a table with a secondary index where this is an issue. The base table
> has a PK of PKCOL1, PKCOL2, PKCOL3 and
> PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3,
> PKCOL2, PKCOL4.
> Here's what happens:
> Here is our query we run to get the Query plan from which we generate the
> RVCs to be used for paging:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> LIMIT 501;
> I get the following explain:
> CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE
> ['00Dxx0000001gFA']
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4
> from MY_TABLES.MY_TABLE
> However when I generate the RVC query to page through the data:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> (pkcol1, pkcol2, pkcol3,pkcol4) >
> ('001','001xx000003DHml',to_date('2015-10-21 09
> (tel:2015102109):50:55.0'),'017xx0000022FuI')
> LIMIT 501;
> I get the follow explain plan:
> CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN
> RANGE SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx0000001gFA','001'] -
> ['00Dxx0000001gFA',*]
> SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'),
> TO_CHAR('001xx000003DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000',
> TO_CHAR('017xx0000022FuI'))
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> We expected that the second query with RVCs above would execute against the
> base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the
> index PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)