[
https://issues.apache.org/jira/browse/PHOENIX-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Artur Tamazian updated PHOENIX-5200:
------------------------------------
Description:
Example:
{code:java}
drop table test
create table test (
pk integer not null primary key,
index_column1 integer,
index_column2 integer,
col3 integer
);
create index if not exists test_index on WATSON_TEXT_MESSAGE.test
(index_column1, index_column2) include (col3);
upsert into test (pk, index_column1, index_column2, col3) values (1, 1, 1, 1);
upsert into test (pk, index_column1, index_column2, col3) values (2, 2, 2, 2);
upsert into test (pk, index_column1, index_column2, col3) values (3, 3, 3, 3);
upsert into test (pk, index_column1, index_column2, col3) values (4, 4, 4, 4);
-- this case works correctly
-- index is used but filtering by non indexed column too
-- returns two values: 2 and 3
select distinct index_column1 from test where index_column1 > 0 and
index_column2 > 0 and col3 > 0 limit 2 offset 1
-- this query should return the same, but it doesn't. The only difference is:
non indexed column is not used in where clause
-- returns one value: 2
select distinct index_column1 from test where index_column1 > 0 and
index_column2 > 0 limit 2 offset 1
-- The way to make the above work is to use limit = desired limit + offset, but
that needs to be fixed{code}
If index is removed both queries return correct results.
Execution plans show what the problem is. Correct query plan:
{code:java}
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST
SERVER FILTER BY (INDEX_COLUMN1 > 0 AND INDEX_COLUMN2 > 0 AND COL3 > 0)
SERVER AGGREGATE INTO DISTINCT ROWS BY [INDEX_COLUMN1] LIMIT 3 GROUPS
CLIENT MERGE SORT
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT
{code}
Incorrect query plan:
{code:java}
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST_INDEX [0] - [*]
SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("INDEX_COLUMN2") > 0
SERVER DISTINCT PREFIX FILTER OVER ["INDEX_COLUMN1"]
SERVER 2 ROW LIMIT
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["INDEX_COLUMN1"] LIMIT 3 GROUPS
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT
{code}
was:
Example:
{code}
drop table test
create table test (
pk integer not null primary key,
index_column1 integer,
index_column2 integer,
col3 integer
);
create index if not exists test_index on WATSON_TEXT_MESSAGE.test
(index_column1, index_column2) include (col3);
upsert into test (pk, index_column1, index_column2, col3) values (1, 1, 1, 1);
upsert into test (pk, index_column1, index_column2, col3) values (2, 2, 2, 2);
upsert into test (pk, index_column1, index_column2, col3) values (3, 3, 3, 3);
upsert into test (pk, index_column1, index_column2, col3) values (4, 4, 4, 4);
-- this case works correctly
-- index is used but filtering by non indexed column too
-- returns two values: 2 and 3
select distinct index_column1 from test where index_column1 > 0 and
index_column2 > 0 and col3 > 0 limit 2 offset 1
-- this query should return the same, but it doesn't. The only difference is:
non indexed column is not used in where clause
-- returns one value: 2
select distinct index_column1 from test where index_column1 > 0 and
index_column2 > 0 limit 2 offset 1
{code}
If index is removed both queries return correct results.
Execution plans show what the problem is. Correct query plan:
{code}
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST
SERVER FILTER BY (INDEX_COLUMN1 > 0 AND INDEX_COLUMN2 > 0 AND COL3 > 0)
SERVER AGGREGATE INTO DISTINCT ROWS BY [INDEX_COLUMN1] LIMIT 3 GROUPS
CLIENT MERGE SORT
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT
{code}
Incorrect query plan:
{code}
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST_INDEX [0] - [*]
SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("INDEX_COLUMN2") > 0
SERVER DISTINCT PREFIX FILTER OVER ["INDEX_COLUMN1"]
SERVER 2 ROW LIMIT
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["INDEX_COLUMN1"] LIMIT 3 GROUPS
CLIENT OFFSET 1
CLIENT 2 ROW LIMIT
{code}
> LIMIT+OFFSET works incorrectly when querying by index fields only and using
> DISTINCT
> ------------------------------------------------------------------------------------
>
> Key: PHOENIX-5200
> URL: https://issues.apache.org/jira/browse/PHOENIX-5200
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.13.2-cdh
> Reporter: Artur Tamazian
> Priority: Major
>
> Example:
> {code:java}
> drop table test
> create table test (
> pk integer not null primary key,
> index_column1 integer,
> index_column2 integer,
> col3 integer
> );
> create index if not exists test_index on WATSON_TEXT_MESSAGE.test
> (index_column1, index_column2) include (col3);
> upsert into test (pk, index_column1, index_column2, col3) values (1, 1, 1, 1);
> upsert into test (pk, index_column1, index_column2, col3) values (2, 2, 2, 2);
> upsert into test (pk, index_column1, index_column2, col3) values (3, 3, 3, 3);
> upsert into test (pk, index_column1, index_column2, col3) values (4, 4, 4, 4);
> -- this case works correctly
> -- index is used but filtering by non indexed column too
> -- returns two values: 2 and 3
> select distinct index_column1 from test where index_column1 > 0 and
> index_column2 > 0 and col3 > 0 limit 2 offset 1
> -- this query should return the same, but it doesn't. The only difference is:
> non indexed column is not used in where clause
> -- returns one value: 2
> select distinct index_column1 from test where index_column1 > 0 and
> index_column2 > 0 limit 2 offset 1
> -- The way to make the above work is to use limit = desired limit + offset,
> but that needs to be fixed{code}
> If index is removed both queries return correct results.
> Execution plans show what the problem is. Correct query plan:
> {code:java}
> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST
> SERVER FILTER BY (INDEX_COLUMN1 > 0 AND INDEX_COLUMN2 > 0 AND COL3 > 0)
> SERVER AGGREGATE INTO DISTINCT ROWS BY [INDEX_COLUMN1] LIMIT 3 GROUPS
> CLIENT MERGE SORT
> CLIENT OFFSET 1
> CLIENT 2 ROW LIMIT
> {code}
> Incorrect query plan:
> {code:java}
> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST_INDEX [0] - [*]
> SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("INDEX_COLUMN2") > 0
> SERVER DISTINCT PREFIX FILTER OVER ["INDEX_COLUMN1"]
> SERVER 2 ROW LIMIT
> SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["INDEX_COLUMN1"] LIMIT 3
> GROUPS
> CLIENT OFFSET 1
> CLIENT 2 ROW LIMIT
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)