[jira] [Updated] (PHOENIX-6916) Cannot handle ranges where start is a prefix of end for desc columns

Istvan Toth (Jira) Sun, 23 Apr 2023 22:18:04 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Istvan Toth updated PHOENIX-6916:
---------------------------------
    Description: 
For some non-empty timestamp ranges, Phoenix generates a query plan with a 
degenerate scan if DESC index is created on the timestamp field:
{noformat}
create table data (id varchar primary key, ts timestamp);
create index data_idx on data(ts desc);

0: jdbc:phoenix:> explain select id, ts from data where ts >= TIMESTAMP 
'2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 13:40:00'; 
+-------------------------------+----------------+---------------+-------------+
|             PLAN              | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-------------------------------+----------------+---------------+-------------+
| DEGENERATE SCAN OVER DATA_IDX | null           | null          | null        |
+-------------------------------+----------------+---------------+-------------+
1 row selected (0.012 seconds)

0: jdbc:phoenix:> explain select id, ts from data where ts >= TIMESTAMP 
'2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 13:50:00';
+----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
|                                                      PLAN                     
                                 | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS 
|
+----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER DATA_IDX 
[~1,677,160,200,000] - [~1,677,159,000,000] | null           | null          | 
null        |
|     SERVER FILTER BY FIRST KEY ONLY                                           
                                 | null           | null          | null        
|
+----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
2 rows selected (0.009 seconds)

0: jdbc:phoenix:localhost:59231> explain select /*+NO_INDEX*/ id, ts from data 
where ts >= TIMESTAMP '2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 
13:40:00';
+---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
|                                                     PLAN                      
                                | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER DATA                 
                                | null           | null          | null        |
|     SERVER FILTER BY (TS >= TIMESTAMP '2023-02-23 13:30:00.000' AND TS < 
TIMESTAMP '2023-02-23 13:40:00.000') | null           | null          | null    
    |
+---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
{noformat}
Actually, the problem is much more basic:
{noformat}
create table ascpk (k varchar primary key); 
create table descpk (k varchar primary key desc);
upsert into ascpk values ('a');
upsert into ascpk values ('aa');
upsert into ascpk values ('aaa');
upsert into ascpk values ('aaab');
//Same for descpk 
select * from ascpk;
+------+
|  K   |
+------+
| a    |
| aa   |
| aaa  |
| aaab |
+------+
4 rows selected (0.035 seconds)
select * from descpk;
+------+
|  K   |
+------+
| aaab |
| aaa  |
| aa   |
| a    |
+------+
select * from ascpk where k between 'a' and 'aaa';
+-----+
|  K  |
+-----+
| a   |
| aa  |
| aaa |
+-----+  
3 rows selected (0.026 seconds) select * from descpk where k between 'a' and 
'aaa';
+---+
| K |
+---+
+---+
No rows selected (0.022 seconds) {noformat}
The inversion logic for descending keys completely breaks down when we try to 
use it for keyranges where start key is a prefix of the end key.

  was:
For some non-empty timestamp ranges, Phoenix generates a query plan with a 
degenerate scan if DESC index is created on the timestamp field:
{noformat}
create table data (id varchar primary key, ts timestamp);
create index data_idx on data(ts desc);

0: jdbc:phoenix:> explain select id, ts from data where ts >= TIMESTAMP 
'2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 13:40:00'; 
+-------------------------------+----------------+---------------+-------------+
|             PLAN              | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-------------------------------+----------------+---------------+-------------+
| DEGENERATE SCAN OVER DATA_IDX | null           | null          | null        |
+-------------------------------+----------------+---------------+-------------+
1 row selected (0.012 seconds)

0: jdbc:phoenix:> explain select id, ts from data where ts >= TIMESTAMP 
'2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 13:50:00';
+----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
|                                                      PLAN                     
                                 | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS 
|
+----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER DATA_IDX 
[~1,677,160,200,000] - [~1,677,159,000,000] | null           | null          | 
null        |
|     SERVER FILTER BY FIRST KEY ONLY                                           
                                 | null           | null          | null        
|
+----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
2 rows selected (0.009 seconds)

0: jdbc:phoenix:localhost:59231> explain select /*+NO_INDEX*/ id, ts from data 
where ts >= TIMESTAMP '2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 
13:40:00';
+---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
|                                                     PLAN                      
                                | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER DATA                 
                                | null           | null          | null        |
|     SERVER FILTER BY (TS >= TIMESTAMP '2023-02-23 13:30:00.000' AND TS < 
TIMESTAMP '2023-02-23 13:40:00.000') | null           | null          | null    
    |
+---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
{noformat}
Actually, the problem is much more basic:
{noformat}
create table ascpk (k varchar primary key); 
create table descpk (k varchar primary key desc);
upsert into ascpk values ('a');
upsert into ascpk values ('aa');
upsert into ascpk values ('aaa');
upsert into ascpk values ('aaab');
//Same for descpk 
select * from ascpk;
+------+
|  K   |
+------+
| a    |
| aa   |
| aaa  |
| aaab |
+------+
4 rows selected (0.035 seconds)
select * from descpk;
+------+
|  K   |
+------+
| aaab |
| aaa  |
| aa   |
| a    |
+------+
select * from ascpk where k between 'a' and 'aaa';
+-----+
|  K  |
+-----+
| a   |
| aa  |
| aaa |
+-----+  
3 rows selected (0.026 seconds) select * from descpk where k between 'a' and 
'aaa';
+---+
| K |
+---+
+---+
No rows selected (0.022 seconds) {noformat}
The inversion logic for descending keys completely breaks down when we try to 
use it for keyranges where end key is a prefix of the start key.


> Cannot handle ranges where start is a prefix of end for desc columns
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-6916
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6916
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Critical
>
> For some non-empty timestamp ranges, Phoenix generates a query plan with a 
> degenerate scan if DESC index is created on the timestamp field:
> {noformat}
> create table data (id varchar primary key, ts timestamp);
> create index data_idx on data(ts desc);
> 0: jdbc:phoenix:> explain select id, ts from data where ts >= TIMESTAMP 
> '2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 13:40:00'; 
> +-------------------------------+----------------+---------------+-------------+
> |             PLAN              | EST_BYTES_READ | EST_ROWS_READ | 
> EST_INFO_TS |
> +-------------------------------+----------------+---------------+-------------+
> | DEGENERATE SCAN OVER DATA_IDX | null           | null          | null       
>  |
> +-------------------------------+----------------+---------------+-------------+
> 1 row selected (0.012 seconds)
> 0: jdbc:phoenix:> explain select id, ts from data where ts >= TIMESTAMP 
> '2023-02-23 13:30:00'  and ts < TIMESTAMP '2023-02-23 13:50:00';
> +----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
> |                                                      PLAN                   
>                                    | EST_BYTES_READ | EST_ROWS_READ | 
> EST_INFO_TS |
> +----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER DATA_IDX 
> [~1,677,160,200,000] - [~1,677,159,000,000] | null           | null          
> | null        |
> |     SERVER FILTER BY FIRST KEY ONLY                                         
>                                    | null           | null          | null    
>     |
> +----------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
> 2 rows selected (0.009 seconds)
> 0: jdbc:phoenix:localhost:59231> explain select /*+NO_INDEX*/ id, ts from 
> data where ts >= TIMESTAMP '2023-02-23 13:30:00'  and ts < TIMESTAMP 
> '2023-02-23 13:40:00';
> +---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
> |                                                     PLAN                    
>                                   | EST_BYTES_READ | EST_ROWS_READ | 
> EST_INFO_TS |
> +---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER DATA               
>                                   | null           | null          | null     
>    |
> |     SERVER FILTER BY (TS >= TIMESTAMP '2023-02-23 13:30:00.000' AND TS < 
> TIMESTAMP '2023-02-23 13:40:00.000') | null           | null          | null  
>       |
> +---------------------------------------------------------------------------------------------------------------+----------------+---------------+-------------+
> {noformat}
> Actually, the problem is much more basic:
> {noformat}
> create table ascpk (k varchar primary key); 
> create table descpk (k varchar primary key desc);
> upsert into ascpk values ('a');
> upsert into ascpk values ('aa');
> upsert into ascpk values ('aaa');
> upsert into ascpk values ('aaab');
> //Same for descpk 
> select * from ascpk;
> +------+
> |  K   |
> +------+
> | a    |
> | aa   |
> | aaa  |
> | aaab |
> +------+
> 4 rows selected (0.035 seconds)
> select * from descpk;
> +------+
> |  K   |
> +------+
> | aaab |
> | aaa  |
> | aa   |
> | a    |
> +------+
> select * from ascpk where k between 'a' and 'aaa';
> +-----+
> |  K  |
> +-----+
> | a   |
> | aa  |
> | aaa |
> +-----+  
> 3 rows selected (0.026 seconds) select * from descpk where k between 'a' and 
> 'aaa';
> +---+
> | K |
> +---+
> +---+
> No rows selected (0.022 seconds) {noformat}
> The inversion logic for descending keys completely breaks down when we try to 
> use it for keyranges where start key is a prefix of the end key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (PHOENIX-6916) Cannot handle ranges where start is a prefix of end for desc columns

Reply via email to