Joel Palmert created PHOENIX-3438:
-------------------------------------

             Summary: Phoenix should be able to optimize LIMUT query with IN 
clause
                 Key: PHOENIX-3438
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3438
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.8.0
            Reporter: Joel Palmert


This is taken from a real production issue we're having but has been simplified 
here. Steps to repro:

CREATE TABLE IF NOT EXISTS TEST.TEST (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID CHAR(15)
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        SCORE DESC
    )
) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=43200

EXPLAIN
SELECT entity_id, MAX(score) max_score
FROM test.test
WHERE organization_id IN ('org1','org3')
GROUP BY entity_id
ORDER BY max_score DESC
LIMIT 1

CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEST ['org1         
  '] - ['org2           ']
    SERVER AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
CLIENT MERGE SORT
CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]

Notice that Phoenix gets and sends all the rows for the two organizations to 
the client. Since the ORDER BY is in PK order I would expect it to just 
1. Get the first row for each organization (in index order) rather than 
scanning all the rows.
2. Merge them (only 2 rows in this case)
3. Limit 1

Consider the following query and plan for what I would expect to see:
EXPLAIN
SELECT entity_id, MAX(score) max_score
FROM(
        SELECT entity_id, score
        FROM(
                SELECT entity_id, score
                FROM test.test
                WHERE organization_id = 'org1'
                ORDER BY score DESC
                LIMIT 1)
        UNION ALL
        SELECT entity_id, score
                FROM(
                SELECT entity_id, score
                FROM test.test
                WHERE organization_id = 'org2'
                ORDER BY score DESC
                LIMIT 1))
GROUP BY entity_id
ORDER BY max_score DESC
LIMIT 1

UNION ALL OVER 2 QUERIES
    CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org1           ']
        SERVER 1 ROW LIMIT
    CLIENT 1 ROW LIMIT
    CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org2           ']
        SERVER 1 ROW LIMIT
    CLIENT 1 ROW LIMIT
CLIENT SORTED BY [ENTITY_ID]
CLIENT AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to