[ https://issues.apache.org/jira/browse/PHOENIX-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani updated PHOENIX-7489:
----------------------------------
    Summary: Add all partition ids internally to optimize full scan CDC queries 
 (was: Add all partition ids internally if CDC query only includes timestamp 
range)

> Add all partition ids internally to optimize full scan CDC queries
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-7489
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7489
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>
> Since PHOENIX-7425 introduced a partitioned CDC Index to eliminate salting, it
> is important to include PARTITION_ID() in addition to PHOENIX_ROW_TIMESTAMP()
> in the WHERE clause of a CDC query. Before PHOENIX-7425, providing only
> PHOENIX_ROW_TIMESTAMP() was sufficient because it was the rowkey prefix of the
> CDC Index table. That is no longer the case.
> If the user provides only PHOENIX_ROW_TIMESTAMP() in the WHERE clause, the
> query results in a full table scan over the CDC Index. Providing both
> PARTITION_ID() and PHOENIX_ROW_TIMESTAMP() results in a range scan instead.
> Not all clients may be aware of every unique partition id present in the CDC
> Index. Hence, even if a client provides only a timestamp range in the CDC
> query, the list of partition ids should be retrieved internally and used
> alongside the timestamp range so that the query still runs as efficient
> range scans.
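A toy Python model (not Phoenix code) can make the scan difference concrete. It assumes, as the description implies, that the partitioned CDC Index row key is ordered by partition id first and then by row timestamp; real keys are byte arrays, and the partition ids p0, p1, p2, the timestamps, and the function names full_scan and partition_range_scans are all hypothetical. The sketch also does not model how Phoenix would look up the partition ids internally; it simply lists them.

```python
from bisect import bisect_left

# Toy model of a partitioned CDC index (assumption, not Phoenix code):
# each row key is (partition_id, row_timestamp_ms) and keys are kept in
# sorted order, the way HBase keeps row keys sorted. Real keys are bytes.
Key = tuple[str, int]


def full_scan(keys: list[Key], start_ms: int, end_ms: int) -> tuple[list[Key], int]:
    """Timestamp-only filter: the timestamp is not the key prefix, so every
    row has to be read and checked. Returns (matches, rows_examined)."""
    matches = [k for k in keys if start_ms <= k[1] < end_ms]
    return matches, len(keys)


def partition_range_scans(
    keys: list[Key], partition_ids: list[str], start_ms: int, end_ms: int
) -> tuple[list[Key], int]:
    """One range scan per partition id over [(pid, start), (pid, end)).
    Binary search locates each scan's start and stop keys, so only rows
    inside the ranges are read. Returns (matches, rows_examined)."""
    matches: list[Key] = []
    examined = 0
    for pid in partition_ids:
        lo = bisect_left(keys, (pid, start_ms))  # start row (inclusive)
        hi = bisect_left(keys, (pid, end_ms))    # stop row (exclusive)
        matches.extend(keys[lo:hi])
        examined += hi - lo
    return matches, examined


# Hypothetical data: three made-up partition ids with a few change rows each.
keys = sorted([
    ("p0", 100), ("p0", 250), ("p0", 900),
    ("p1", 120), ("p1", 300),
    ("p2", 50), ("p2", 260), ("p2", 700),
])
# The client supplied only the timestamp range [200, 400); the partition ids
# are filled in "internally" (here just listed, since the lookup is not modeled).
partition_ids = ["p0", "p1", "p2"]

full, full_examined = full_scan(keys, 200, 400)
ranged, ranged_examined = partition_range_scans(keys, partition_ids, 200, 400)
print(ranged == full, full_examined, ranged_examined)  # True 8 3
```

Both approaches return the same three rows, but the timestamp-only filter reads all 8 keys while the per-partition range scans read only the 3 that fall in range. On a real CDC Index the gap between those two numbers grows with the table size, which is why expanding a timestamp-only query with every partition id pays off.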



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
