Joel Palmert created PHOENIX-3438:
-------------------------------------
Summary: Phoenix should be able to optimize LIMUT query with IN
clause
Key: PHOENIX-3438
URL: https://issues.apache.org/jira/browse/PHOENIX-3438
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.8.0
Reporter: Joel Palmert
This is taken from a real production issue we're having but has been simplified
here. Steps to repro:
CREATE TABLE IF NOT EXISTS TEST.TEST (
ORGANIZATION_ID CHAR(15) NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID CHAR(15)
CONSTRAINT TEST_PK PRIMARY KEY (
ORGANIZATION_ID,
SCORE DESC
)
) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=43200
EXPLAIN
SELECT entity_id, MAX(score) max_score
FROM test.test
WHERE organization_id IN ('org1','org3')
GROUP BY entity_id
ORDER BY max_score DESC
LIMIT 1
CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEST ['org1
'] - ['org2 ']
SERVER AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
CLIENT MERGE SORT
CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]
Notice that Phoenix gets and sends all the rows for the two organizations to
the client. Since the ORDER BY is in PK order I would expect it to just
1. Get the first row for each organization (in index order) rather than
scanning all the rows.
2. Merge them (only 2 rows in this case)
3. Limit 1
Consider the following query and plan for what I would expect to see:
EXPLAIN
SELECT entity_id, MAX(score) max_score
FROM(
SELECT entity_id, score
FROM(
SELECT entity_id, score
FROM test.test
WHERE organization_id = 'org1'
ORDER BY score DESC
LIMIT 1)
UNION ALL
SELECT entity_id, score
FROM(
SELECT entity_id, score
FROM test.test
WHERE organization_id = 'org2'
ORDER BY score DESC
LIMIT 1))
GROUP BY entity_id
ORDER BY max_score DESC
LIMIT 1
UNION ALL OVER 2 QUERIES
CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org1 ']
SERVER 1 ROW LIMIT
CLIENT 1 ROW LIMIT
CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org2 ']
SERVER 1 ROW LIMIT
CLIENT 1 ROW LIMIT
CLIENT SORTED BY [ENTITY_ID]
CLIENT AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)