[
https://issues.apache.org/jira/browse/PHOENIX-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739782#comment-16739782
]
Lars Hofhansl edited comment on PHOENIX-5096 at 1/10/19 9:20 PM:
-----------------------------------------------------------------
Ok, so let's try:
create table test (pk1 integer not null, pk2 integer not null, pk3 integer not
null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 *on test(pk1, v1)*;
(Same table, but the index defined on (pk1, v1)
{code}
explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.113 seconds)
{code}
Again, 4 regions.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.072 seconds)
{code}
Again, all pks starting with pk1 < 10000 are in the same region.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.088 seconds)
{code}
Can no longer use the index with just v1. It's doing a full scan!
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and
pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.077 seconds)
{code}
Now it's doing a full scan over the first region of the main table, still not
using the local index. (notice the filter...)
What I expect is that this uses the local index, but only only consults the
regions that can possibly have any data based on the selection on the pk prefix
(pk1 here)
was (Author: lhofhansl):
Ok, so let's try:
create table test (pk1 integer not null, pk2 integer not null, pk3 integer not
null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 *on test(pk1, v1)*;
(Same table, but the index defined on (pk1, v1)
{code}
explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.113 seconds)
{code}
Again, 4 regions.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.072 seconds)
{code}
Again, all pks starting with pk1 < 10000 are in the same region.
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER
TEST [1] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.088 seconds)
{code}
Can no longer use the index with just v1. It's doing a full scan!
{code}
0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and
pk1 < 10000;
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN
| EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER
TEST [1,*] - [1,10000] | 314572800 | 5518821 | 1547154736193 |
| SERVER FILTER BY FIRST KEY ONLY AND TO_FLOAT("V1") = 0.1
| 314572800 | 5518821 | 1547154736193 |
| SERVER AGGREGATE INTO SINGLE ROW
| 314572800 | 5518821 | 1547154736193 |
+----------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.077 seconds)
{code}
Now it's doing a full scan over the first region, still not using the local
index. (notice the filter...)
What I expect is that this uses the local index, but only only consults the
regions that can possibly have any data based on the selection on the pk prefix
(pk1 here)
> Local index region pruning is not working as expected.
> ------------------------------------------------------
>
> Key: PHOENIX-5096
> URL: https://issues.apache.org/jira/browse/PHOENIX-5096
> Project: Phoenix
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Priority: Major
>
> The pruning of local indexes should do the following:
> * Use the local index
> * Reduce the number of regions based on filters on the table's primary (i.e.
> WHERE conditions including prefixes of the primary key)
> Instead it looks like in order for this to work the needed PK column need to
> be included in the local index itself, changing the sort order of the local
> index.
> I'll provide some examples in the comments.
> [~giacomotaylor], [~tdsilva], [~vincentpoon], [~elserj]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)