[jira] [Updated] (PHOENIX-5200) LIMIT+OFFSET works incorrectly when querying by index fields only and using DISTINCT

Artur Tamazian (JIRA) Fri, 15 Mar 2019 12:41:17 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Artur Tamazian updated PHOENIX-5200:
------------------------------------
    Description: 
Example:
{code:java}
drop table test

create table test (
pk integer not null primary key,
index_column1 integer,
index_column2 integer,
col3 integer
);

create index if not exists test_index on WATSON_TEXT_MESSAGE.test 
(index_column1, index_column2) include (col3);

upsert into test (pk, index_column1, index_column2, col3) values (1, 1, 1, 1);
upsert into test (pk, index_column1, index_column2, col3) values (2, 2, 2, 2);
upsert into test (pk, index_column1, index_column2, col3) values (3, 3, 3, 3);
upsert into test (pk, index_column1, index_column2, col3) values (4, 4, 4, 4);

-- this case works correctly

-- index is used but filtering by non indexed column too
-- returns two values: 2 and 3

select distinct index_column1 from test where index_column1 > 0 and 
index_column2 > 0 and col3 > 0 limit 2 offset 1


-- this query should return the same, but it doesn't. The only difference is: 
non indexed column is not used in where clause
-- returns one value: 2

select distinct index_column1 from test where index_column1 > 0 and 
index_column2 > 0 limit 2 offset 1

-- The way to make the above work is to use limit = desired limit + offset, but 
that needs to be fixed{code}
If index is removed both queries return correct results.

Execution plans show what the problem is. Correct query plan:
{code:java}
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST
SERVER FILTER BY (INDEX_COLUMN1 > 0 AND INDEX_COLUMN2 > 0 AND COL3 > 0)
SERVER AGGREGATE INTO DISTINCT ROWS BY [INDEX_COLUMN1] LIMIT 3 GROUPS
CLIENT MERGE SORT
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT

{code}
Incorrect query plan:
{code:java}
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER    TEST_INDEX [0] - [*]
SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("INDEX_COLUMN2") > 0
SERVER DISTINCT PREFIX FILTER OVER ["INDEX_COLUMN1"]
SERVER 2 ROW LIMIT
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["INDEX_COLUMN1"] LIMIT 3 GROUPS
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT

{code}

  was:
Example:

{code}

drop table test

create table test (
pk integer not null primary key,
index_column1 integer,
index_column2 integer,
col3 integer
);

create index if not exists test_index on WATSON_TEXT_MESSAGE.test 
(index_column1, index_column2) include (col3);

upsert into test (pk, index_column1, index_column2, col3) values (1, 1, 1, 1);
upsert into test (pk, index_column1, index_column2, col3) values (2, 2, 2, 2);
upsert into test (pk, index_column1, index_column2, col3) values (3, 3, 3, 3);
upsert into test (pk, index_column1, index_column2, col3) values (4, 4, 4, 4);

-- this case works correctly

-- index is used but filtering by non indexed column too
-- returns two values: 2 and 3

select distinct index_column1 from test where index_column1 > 0 and 
index_column2 > 0 and col3 > 0 limit 2 offset 1


-- this query should return the same, but it doesn't. The only difference is: 
non indexed column is not used in where clause

-- returns one value: 2

select distinct index_column1 from test where index_column1 > 0 and 
index_column2 > 0 limit 2 offset 1

{code}



If index is removed both queries return correct results.

Execution plans show what the problem is. Correct query plan:



{code}

CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST
SERVER FILTER BY (INDEX_COLUMN1 > 0 AND INDEX_COLUMN2 > 0 AND COL3 > 0)
SERVER AGGREGATE INTO DISTINCT ROWS BY [INDEX_COLUMN1] LIMIT 3 GROUPS
CLIENT MERGE SORT
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT

{code}



Incorrect query plan:

{code}

CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER    TEST_INDEX [0] - [*]
SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("INDEX_COLUMN2") > 0
SERVER DISTINCT PREFIX FILTER OVER ["INDEX_COLUMN1"]
SERVER 2 ROW LIMIT
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["INDEX_COLUMN1"] LIMIT 3 GROUPS
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT

{code}




> LIMIT+OFFSET works incorrectly when querying by index fields only and using 
> DISTINCT
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5200
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5200
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.13.2-cdh
>            Reporter: Artur Tamazian
>            Priority: Major
>
> Example:
> {code:java}
> drop table test
> create table test (
> pk integer not null primary key,
> index_column1 integer,
> index_column2 integer,
> col3 integer
> );
> create index if not exists test_index on WATSON_TEXT_MESSAGE.test 
> (index_column1, index_column2) include (col3);
> upsert into test (pk, index_column1, index_column2, col3) values (1, 1, 1, 1);
> upsert into test (pk, index_column1, index_column2, col3) values (2, 2, 2, 2);
> upsert into test (pk, index_column1, index_column2, col3) values (3, 3, 3, 3);
> upsert into test (pk, index_column1, index_column2, col3) values (4, 4, 4, 4);
> -- this case works correctly
> -- index is used but filtering by non indexed column too
> -- returns two values: 2 and 3
> select distinct index_column1 from test where index_column1 > 0 and 
> index_column2 > 0 and col3 > 0 limit 2 offset 1
> -- this query should return the same, but it doesn't. The only difference is: 
> non indexed column is not used in where clause
> -- returns one value: 2
> select distinct index_column1 from test where index_column1 > 0 and 
> index_column2 > 0 limit 2 offset 1
> -- The way to make the above work is to use limit = desired limit + offset, 
> but that needs to be fixed{code}
> If index is removed both queries return correct results.
> Execution plans show what the problem is. Correct query plan:
> {code:java}
> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST
> SERVER FILTER BY (INDEX_COLUMN1 > 0 AND INDEX_COLUMN2 > 0 AND COL3 > 0)
> SERVER AGGREGATE INTO DISTINCT ROWS BY [INDEX_COLUMN1] LIMIT 3 GROUPS
> CLIENT MERGE SORT
> CLIENT OFFSET 1
> CLIENT 2 ROW LIMIT
> {code}
> Incorrect query plan:
> {code:java}
> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER    TEST_INDEX [0] - [*]
> SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("INDEX_COLUMN2") > 0
> SERVER DISTINCT PREFIX FILTER OVER ["INDEX_COLUMN1"]
> SERVER 2 ROW LIMIT
> SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["INDEX_COLUMN1"] LIMIT 3 
> GROUPS
> CLIENT OFFSET 1
> CLIENT 2 ROW LIMIT
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (PHOENIX-5200) LIMIT+OFFSET works incorrectly when querying by index fields only and using DISTINCT

Reply via email to