[
https://issues.apache.org/jira/browse/PHOENIX-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739782#comment-16739782
]
Lars Hofhansl edited comment on PHOENIX-5096 at 1/10/19 9:38 PM:
-----------------------------------------------------------------
Ok, so let's try:
create table test (pk1 integer not null, pk2 integer not null, pk3 integer not
null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 *on test(pk1, v1)*;
(Same table, but the index defined on (pk1, v1)
{code}
explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.113 seconds)
{code}
Again, 4 regions.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.072 seconds)
{code}
Again, all pks starting with pk1 < 10000 are in the same region.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.088 seconds)
{code}
Can no longer use the index with just v1. It's doing a full scan!
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and
pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.077 seconds)
{code}
Now it's doing a full scan over the first region of the main table, still not
using the local index. (notice the filter...)
What I expect is that this uses the local index, but only only consults the
regions that can possibly have any data based on the selection on the pk prefix
(pk1 here)
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and
pk1 = 1;
+--------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,1,0.1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.052 seconds)
{code}
Only this worked (and again, only if pk1 is included in the index)
was (Author: lhofhansl):
Ok, so let's try:
create table test (pk1 integer not null, pk2 integer not null, pk3 integer not
null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 *on test(pk1, v1)*;
(Same table, but the index defined on (pk1, v1)
{code}
explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.113 seconds)
{code}
Again, 4 regions.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.072 seconds)
{code}
Again, all pks starting with pk1 < 10000 are in the same region.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.088 seconds)
{code}
Can no longer use the index with just v1. It's doing a full scan!
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and
pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.077 seconds)
{code}
Now it's doing a full scan over the first region of the main table, still not
using the local index. (notice the filter...)
What I expect is that this uses the local index, but only only consults the
regions that can possibly have any data based on the selection on the pk prefix
(pk1 here)
> Local index region pruning is not working as expected.
> ------------------------------------------------------
>
> Key: PHOENIX-5096
> URL: https://issues.apache.org/jira/browse/PHOENIX-5096
> Project: Phoenix
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Priority: Major
>
> The pruning of local indexes should do the following:
> * Use the local index
> * Reduce the number of regions based on filters on the table's primary key
> (i.e. WHERE conditions including prefixes of the primary key)
> Instead it looks like in order for this to work the needed PK column need to
> be included in the local index itself, changing the sort order of the local
> index.
> I'll provide some examples in the comments.
> [~giacomotaylor], [~tdsilva], [~vincentpoon], [~elserj]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)