Jan Fernando created PHOENIX-3439:
-------------------------------------

             Summary: Query using an RVC based on the base table PK is 
incorrectly using an index and doing a full scan instead of a point query
                 Key: PHOENIX-3439
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3439
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.8.1
            Reporter: Jan Fernando


We use Phoenix RVCs to support paginated queries. This performance of this 
functionality relies on Phoenix predictably generating scans against a table or 
index with a PK that matches the RVC specified for each page.

What we do is that on the initial query we use 
PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and 
persist that and use those to generate RVCs for paginated queries.

We have discovered that for queries where:
a) the user doesn't specify an ORDER BY
b) for tables where secondary indexes are present

Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but 
then subsequent queries using the RVCs to paginate execute against a secondary 
index doing a full scan.

We have a table with a secondary index where this is an issue. The base table 
has a PK of PKCOL1, PKCOL2, PKCOL3 and 
PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, 
PKCOL2, PKCOL4.

Here's what happens:

Here is our query we run to get the Query plan from which we generate the RVCs 
to be used for paging:

EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
FROM MY_TABLES."MYTABLE"
LIMIT 501;

I get the following explain:
CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE 
['00Dxx0000001gFA']
SERVER 501 ROW LIMIT
CLIENT 501 ROW LIMIT

Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 from 
MY_TABLES.MY_TABLE

However when I generate the RVC query to page through the data:

EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
FROM MY_TABLES."MYTABLE"
(pkcol1, pkcol2, pkcol3,pkcol4) > ('001','001xx000003DHml',to_date('2015-10-21 
09 (tel:2015102109):50:55.0'),'017xx0000022FuI')
LIMIT 501;

I get the follow explain plan:
CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN RANGE 
SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx0000001gFA','001'] - 
['00Dxx0000001gFA',*]
SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), 
TO_CHAR('001xx000003DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000', 
TO_CHAR('017xx0000022FuI'))
SERVER 501 ROW LIMIT
CLIENT 501 ROW LIMIT

We expected that the second query with RVCs above would execute against the 
base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the index 
PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to