Viraj Jasani created PHOENIX-7489:
-------------------------------------

             Summary: Add all partition ids internally if CDC query only includes timestamp range
                 Key: PHOENIX-7489
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7489
             Project: Phoenix
          Issue Type: Sub-task
            Reporter: Viraj Jasani


Since PHOENIX-7425 introduced a partitioned CDC Index to eliminate salting, it
is important to include PARTITION_ID() in addition to PHOENIX_ROW_TIMESTAMP()
in the WHERE clause of a CDC query. Before PHOENIX-7425, providing only
PHOENIX_ROW_TIMESTAMP() was sufficient because the row timestamp was the rowkey
prefix of the CDC Index table. That is no longer the case: the partition id now
precedes the timestamp in the rowkey.

If the user provides only PHOENIX_ROW_TIMESTAMP() in the WHERE clause, the query
results in a full table scan over the CDC Index. Providing both PARTITION_ID()
and PHOENIX_ROW_TIMESTAMP() results in a range scan.
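To illustrate why, here is a minimal sketch that models CDC Index keys as
sorted (partition_id, timestamp, data_row_key) tuples. The tuple layout, the
sample data and the helper is_contiguous() are hypothetical simplifications,
not Phoenix code; Python's lexicographic tuple ordering stands in for the byte
ordering of a composite rowkey. A single scan range can only return rows that
form one contiguous run in key order.

```python
from typing import Callable, List, Tuple

# Hypothetical, simplified CDC Index key: (partition_id, timestamp_ms, data_row_key).
IndexKey = Tuple[str, int, str]


def is_contiguous(keys: List[IndexKey], matches: Callable[[IndexKey], bool]) -> bool:
    """True if the rows satisfying `matches` form one run in sorted key order,
    i.e. a single [start, stop) scan could return exactly those rows."""
    positions = [i for i, k in enumerate(keys) if matches(k)]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


keys = sorted([
    ("p1", 100, "a"), ("p1", 200, "b"), ("p1", 300, "c"),
    ("p2", 100, "d"), ("p2", 200, "e"), ("p2", 300, "f"),
])


def ts_only(k: IndexKey) -> bool:
    # Timestamp-only predicate: PHOENIX_ROW_TIMESTAMP() in [150, 250).
    return 150 <= k[1] < 250


def pid_and_ts(k: IndexKey) -> bool:
    # Partition id plus timestamp predicate.
    return k[0] == "p1" and 150 <= k[1] < 250


print(is_contiguous(keys, ts_only))     # False: matches are interleaved with non-matches
print(is_contiguous(keys, pid_and_ts))  # True: one contiguous key range
```

With the partition id leading the key, timestamp-only matches are scattered
across every partition, so the scan cannot be bounded and must read everything.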

Clients may not know all of the unique partition ids present in the CDC Index.
Hence, even when a client provides only a timestamp range in the CDC query,
Phoenix should internally retrieve the list of partition ids and use it
alongside the timestamp range, so that the query is served by an efficient
range scan.
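The proposed rewrite can be sketched as follows, under the same simplified key
model: a timestamp-only predicate is expanded into one [start, stop) key range
per partition id, and each range is located by binary search instead of
reading every row. The helper names expand_ranges() and range_scan() are
hypothetical, and the partition ids are simply passed in as a list here;
how Phoenix would actually retrieve them is left to the implementation.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple

# Hypothetical, simplified CDC Index key: (partition_id, timestamp_ms, data_row_key).
IndexKey = Tuple[str, int, str]
# Half-open key range on the (partition_id, timestamp_ms) prefix.
KeyRange = Tuple[Tuple[str, int], Tuple[str, int]]


def expand_ranges(partition_ids: Iterable[str], ts_start: int, ts_stop: int) -> List[KeyRange]:
    """Turn a timestamp-only predicate into one key range per distinct partition id."""
    return [((pid, ts_start), (pid, ts_stop)) for pid in sorted(set(partition_ids))]


def range_scan(keys: List[IndexKey], ranges: List[KeyRange]) -> List[IndexKey]:
    """Seek to each range in the sorted keys rather than scanning them all.
    A 2-tuple prefix sorts before every 3-tuple key that extends it, so
    bisect_left on (pid, ts) finds the first key with that pid and ts >= bound."""
    out: List[IndexKey] = []
    for start, stop in ranges:
        lo = bisect_left(keys, start)
        hi = bisect_left(keys, stop)
        out.extend(keys[lo:hi])
    return out


keys = sorted([
    ("p1", 100, "a"), ("p1", 200, "b"), ("p1", 300, "c"),
    ("p2", 100, "d"), ("p2", 200, "e"), ("p2", 300, "f"),
])
ranges = expand_ranges(["p2", "p1", "p2"], 150, 250)
print(range_scan(keys, ranges))  # [('p1', 200, 'b'), ('p2', 200, 'e')]
```

Conceptually this resembles a multi-range (skip) scan over PARTITION_ID() IN
(...) combined with the timestamp bounds; it is an illustration of the idea,
not Phoenix's implementation.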



--
This message was sent by Atlassian Jira
(v8.20.10#820010)