[jira] [Comment Edited] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

Jan Fernando (JIRA) Sat, 05 Nov 2016 15:22:38 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15640687#comment-15640687
 ]


Jan Fernando edited comment on PHOENIX-3439 at 11/5/16 10:21 PM:
-----------------------------------------------------------------

[[email protected]] [~samarthjain] I test this out with the Phoenix 4.9 
jar and it doesn't seems to be completely fixed. It looks like the fix works 
for standard tables but not for queries against multi-tenant tables.

1. Create the following tables and index
CREATE TABLE IF NOT EXISTS MY_TABLES.MY_TABLE (ORGANIZATION_ID CHAR(15) NOT 
NULL, PKCOL1 CHAR(15) NOT NULL,PKCOL2 CHAR(15) NOT NULL,PKCOL3 CHAR(15) NOT 
NULL,PKCOL4 CHAR(15) NOT NULL,COL1 CHAR(15),COL2 CHAR(15)CONSTRAINT PK PRIMARY 
KEY (ORGANIZATION_ID,PKCOL1,PKCOL2,PKCOL3,PKCOL4)) MULTI_TENANT=true

CREATE INDEX IF NOT EXISTS MY_TABLE_INDEX 
ON MY_TABLES.MY_TABLE (PKCOL1, PKCOL3, PKCOL2, PKCOL4)
INCLUDE (COL1, COL2);

2. Create a MT view as follows:
CREATE VIEW IF NOT EXISTS MY_TABLE_MT_VIEW AS SELECT * FROM MY_TABLES.MY_TABLE;

3. Insert a record into the MT view:
upsert into my_table_mt_view (pkcol1, pkcol2, pkcol3, pkcol4) values ('1', '2', 
'3', '4');

4. Run explain:
explain select * from my_table_mt_view where (pkcol1, pkcol2, pkcol3, pkcol4) > 
('0', '0', '0', '0');
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER 
MY_TABLES.MY_TABLE_INDEX ['00Dxx0000001gFA','0              '] - 
['00Dxx0000001gFA',*] |
|     SERVER FILTER BY ("PKCOL1", "PKCOL2", "PKCOL3", "PKCOL4") > 
(TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0')) |
+------------------------------------------+

As you can see the query optimizer still wants to use the secondary index. I 
don't see this behavior when using a non-MT connection and non-multi-tenant 
table.

For the standard table using a non MT connection I see:

0: jdbc:phoenix:localhost>  explain select * from MY_TABLES.MY_TABLE  where 
(organization_id, pkcol1, pkcol2, pkcol3, pkcol4) > ('00Dxx0000001gFA', '1', 
'2', '3', '4');
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER MY_TABLES.MY_TABLE |
+------------------------------------------+



was (Author: jfernando_sfdc):
[[email protected]] [~samarthjain] I test this out with the Phoenix 4.9 
jar and it doesn't seems to be completely fixed. It looks like the fix works 
for standard tables but not for queries against multi-tenant tables.

1. Create the following tables and index
CREATE TABLE IF NOT EXISTS MY_TABLES.MY_TABLE (ORGANIZATION_ID CHAR(15) NOT 
NULL, PKCOL1 CHAR(15) NOT NULL,PKCOL2 CHAR(15) NOT NULL,PKCOL3 CHAR(15) NOT 
NULL,PKCOL4 CHAR(15) NOT NULL,COL1 CHAR(15),COL2 CHAR(15)CONSTRAINT PK PRIMARY 
KEY (ORGANIZATION_ID,PKCOL1,PKCOL2,PKCOL3,PKCOL4)) MULTI_TENANT=true

CREATE INDEX IF NOT EXISTS MY_TABLE_INDEX 
ON MY_TABLES.MY_TABLE (PKCOL1, PKCOL3, PKCOL2, PKCOL4)
INCLUDE (COL1, COL2);

2. Create a MT view as follows:
CREATE VIEW IF NOT EXISTS MY_TABLE_MT_VIEW AS SELECT * FROM MY_TABLES.MY_TABLE;

3. Insert a record into the MT view:
upsert into my_table_mt_view (pkcol1, pkcol2, pkcol3, pkcol4) values ('1', '2', 
'3', '4');

4. Run explain:
explain select * from my_table_mt_view where (pkcol1, pkcol2, pkcol3, pkcol4) > 
('0', '0', '0', '0');
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER 
MY_TABLES.MY_TABLE_INDEX ['00Dxx0000001gFA','0              '] - 
['00Dxx0000001gFA',*] |
|     SERVER FILTER BY ("PKCOL1", "PKCOL2", "PKCOL3", "PKCOL4") > 
(TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0')) |
+------------------------------------------+

As you can see the query optimizer still wants to use the secondary index. I 
don't 

For the standard table using a non MT connection I see:

0: jdbc:phoenix:localhost>  explain select * from MY_TABLES.MY_TABLE  where 
(organization_id, pkcol1, pkcol2, pkcol3, pkcol4) > ('00Dxx0000001gFA', '1', 
'2', '3', '4');
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER MY_TABLES.MY_TABLE |
+------------------------------------------+


> Query using an RVC based on the base table PK is incorrectly using an index 
> and doing a full scan instead of a point query
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3439
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3439
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.1
>            Reporter: Jan Fernando
>            Assignee: James Taylor
>             Fix For: 4.9.0, 4.8.2
>
>         Attachments: PHOENIX-3439.patch
>
>
> We use Phoenix RVCs to support paginated queries. This performance of this 
> functionality relies on Phoenix predictably generating scans against a table 
> or index with a PK that matches the RVC specified for each page.
> What we do is that on the initial query we use 
> PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and 
> persist that and use those to generate RVCs for paginated queries.
> We have discovered that for queries where:
> a) the user doesn't specify an ORDER BY
> b) for tables where secondary indexes are present
> Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but 
> then subsequent queries using the RVCs to paginate execute against a 
> secondary index doing a full scan.
> We have a table with a secondary index where this is an issue. The base table 
> has a PK of PKCOL1, PKCOL2, PKCOL3 and 
> PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, 
> PKCOL2, PKCOL4.
> Here's what happens:
> Here is our query we run to get the Query plan from which we generate the 
> RVCs to be used for paging:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> LIMIT 501;
> I get the following explain:
> CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE 
> ['00Dxx0000001gFA']
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 
> from MY_TABLES.MY_TABLE
> However when I generate the RVC query to page through the data:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> (pkcol1, pkcol2, pkcol3,pkcol4) > 
> ('001','001xx000003DHml',to_date('2015-10-21 09 
> (tel:2015102109):50:55.0'),'017xx0000022FuI')
> LIMIT 501;
> I get the follow explain plan:
> CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN 
> RANGE SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx0000001gFA','001'] - 
> ['00Dxx0000001gFA',*]
> SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), 
> TO_CHAR('001xx000003DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000', 
> TO_CHAR('017xx0000022FuI'))
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> We expected that the second query with RVCs above would execute against the 
> base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the 
> index PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

Reply via email to