[jira] [Comment Edited] (PHOENIX-5096) Local index region pruning is not working as expected.

Lars Hofhansl (JIRA) Thu, 10 Jan 2019 13:39:25 -0800


    [ 
https://issues.apache.org/jira/browse/PHOENIX-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739782#comment-16739782
 ]


Lars Hofhansl edited comment on PHOENIX-5096 at 1/10/19 9:38 PM:
-----------------------------------------------------------------

Ok, so let's try:

create table test (pk1 integer not null, pk2 integer not null, pk3 integer not 
null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 *on test(pk1, v1)*;

(Same table, but the index defined on (pk1, v1)

{code}
explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                         PLAN                                  
       | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER 
TEST [1]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY                                           
       | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
       | 314572800       | 5518821        | 1547154736193  |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.113 seconds)
{code}

Again, 4 regions.

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                PLAN                           
                     | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER 
TEST [1,*] - [1,10000]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY                                           
                     | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
                     | 314572800       | 5518821        | 1547154736193  |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.072 seconds)
{code}

Again, all pks starting with pk1 < 10000 are in the same region.

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                         PLAN                                  
       | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER 
TEST [1]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1                  
       | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
       | 314572800       | 5518821        | 1547154736193  |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.088 seconds)
{code}

Can no longer use the index with just v1. It's doing a full scan!

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and 
pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                PLAN                           
                     | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER 
TEST [1,*] - [1,10000]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1                  
                     | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
                     | 314572800       | 5518821        | 1547154736193  |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.077 seconds)
{code}

Now it's doing a full scan over the first region of the main table, still not 
using the local index. (notice the filter...)

What I expect is that this uses the local index, but only only consults the 
regions that can possibly have any data based on the selection on the pk prefix 
(pk1 here)

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and 
pk1 = 1;
+--------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                            PLAN                               
             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER 
TEST [1,1,0.1]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY                                           
             | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
             | 314572800       | 5518821        | 1547154736193  |
+--------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.052 seconds)
{code}

Only this worked (and again, only if pk1 is included in the index)


was (Author: lhofhansl):
Ok, so let's try:

create table test (pk1 integer not null, pk2 integer not null, pk3 integer not 
null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 *on test(pk1, v1)*;

(Same table, but the index defined on (pk1, v1)

{code}
explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                         PLAN                                  
       | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER 
TEST [1]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY                                           
       | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
       | 314572800       | 5518821        | 1547154736193  |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.113 seconds)
{code}

Again, 4 regions.

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                PLAN                           
                     | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER 
TEST [1,*] - [1,10000]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY                                           
                     | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
                     | 314572800       | 5518821        | 1547154736193  |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.072 seconds)
{code}

Again, all pks starting with pk1 < 10000 are in the same region.

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                         PLAN                                  
       | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER 
TEST [1]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1                  
       | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
       | 314572800       | 5518821        | 1547154736193  |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.088 seconds)
{code}

Can no longer use the index with just v1. It's doing a full scan!

{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and 
pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                PLAN                           
                     | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER 
TEST [1,*] - [1,10000]  | 314572800       | 5518821        | 1547154736193  |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1                  
                     | 314572800       | 5518821        | 1547154736193  |
|     SERVER AGGREGATE INTO SINGLE ROW                                          
                     | 314572800       | 5518821        | 1547154736193  |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.077 seconds)
{code}

Now it's doing a full scan over the first region of the main table, still not 
using the local index. (notice the filter...)

What I expect is that this uses the local index, but only only consults the 
regions that can possibly have any data based on the selection on the pk prefix 
(pk1 here)


> Local index region pruning is not working as expected.
> ------------------------------------------------------
>
>                 Key: PHOENIX-5096
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5096
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> The pruning of local indexes should do the following:
> * Use the local index
> * Reduce the number of regions based on filters on the table's primary key 
> (i.e. WHERE conditions including prefixes of the primary key)
> Instead it looks like in order for this to work the needed PK column need to 
> be included in the local index itself, changing the sort order of the local 
> index.
> I'll provide some examples in the comments.
> [~giacomotaylor], [~tdsilva], [~vincentpoon], [~elserj]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (PHOENIX-5096) Local index region pruning is not working as expected.

Reply via email to