[PR] PHOENIX-7489 Add all partition ids internally to optimize full CDC Index scan queries [phoenix]

via GitHub Fri, 31 Jan 2025 18:00:45 -0800


virajjasani opened a new pull request, #2070:
URL: https://github.com/apache/phoenix/pull/2070


   Jira: PHOENIX-7489
   
   Since [PHOENIX-7425](https://issues.apache.org/jira/browse/PHOENIX-7425) 
introduced the partitioned CDC Index to eliminate salting, a CDC query should 
include PARTITION_ID() in addition to PHOENIX_ROW_TIMESTAMP() in its WHERE 
clause. Before 
[PHOENIX-7425](https://issues.apache.org/jira/browse/PHOENIX-7425), providing 
only PHOENIX_ROW_TIMESTAMP() was sufficient because it was the rowkey prefix of 
the CDC Index table. That is no longer the case.
   
   If the user provides only PHOENIX_ROW_TIMESTAMP() in the WHERE clause, the 
query results in a full table scan over the CDC Index. Providing both 
PARTITION_ID() and PHOENIX_ROW_TIMESTAMP() results in a range scan.
   
   Not all clients may be aware of every unique partition id present in the 
CDC Index. Hence, even if a client provides only the timestamp range in the 
CDC query, the list of partition ids should be retrieved internally and used 
alongside the timestamp range so that the query still runs as an efficient 
range scan.
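
   As a rough illustration of the query shape described above, the sketch 
below builds a CDC query whose WHERE clause constrains both PARTITION_ID() and 
PHOENIX_ROW_TIMESTAMP(). It is not the code in this PR: the class 
`CdcQuerySketch`, the helper `buildCdcQuery`, and the CDC object name `MY_CDC` 
are hypothetical, and the sketch assumes the partition ids have already been 
retrieved by some other means and are bound as query parameters.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CdcQuerySketch {

    /**
     * Hypothetical helper: returns CDC query text that filters on both the
     * partition id and the row timestamp, using one bind placeholder per
     * partition id plus two placeholders for the timestamp range.
     */
    public static String buildCdcQuery(String cdcName, List<String> partitionIds) {
        if (partitionIds.isEmpty()) {
            throw new IllegalArgumentException("at least one partition id is required");
        }
        // One "?" per partition id, e.g. "?, ?, ?" for three ids.
        String placeholders =
                String.join(", ", Collections.nCopies(partitionIds.size(), "?"));
        return "SELECT * FROM " + cdcName
                + " WHERE PARTITION_ID() IN (" + placeholders + ")"
                + " AND PHOENIX_ROW_TIMESTAMP() >= ?"
                + " AND PHOENIX_ROW_TIMESTAMP() < ?";
    }

    public static void main(String[] args) {
        // "p1" and "p2" are placeholder partition ids used for illustration only.
        System.out.println(buildCdcQuery("MY_CDC", Arrays.asList("p1", "p2")));
    }
}
```

   A client that supplies only the timestamp range would send the same query 
without the PARTITION_ID() predicate. The intent of this change is that the 
partition id list is then added internally rather than by the client.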


