[ https://issues.apache.org/jira/browse/CASSANDRA-18167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030131#comment-18030131 ]

Caleb Rackliffe commented on CASSANDRA-18167:
---------------------------------------------

In some ways, CASSANDRA-19497 has already solved this problem, as it batches
reads within a partition. After CASSANDRA-18673, we also optimize a bit for the
case where the schema has no clustering keys. What remains here? If there
really isn't much, we could close this. My guess is that if something is left,
it would be identifying very small partitions (say, fewer than 100 rows) and
not storing row ID <-> clustering key mapping information for them, since the
row ID <-> partition key mapping is probably good enough. In other words,
extend some of the optimizations in CASSANDRA-18673 to apply based on a
partition size threshold, rather than only on whether the schema has
clustering keys.
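A minimal sketch of that threshold idea, in Java since that is Cassandra's
language. Everything here is hypothetical: `SmallPartitionPolicy`,
`storeClusteringMapping`, and the 100-row default (the figure suggested above)
are illustrative names and values, not part of SAI. It only shows the
per-partition decision an index writer might make: skip the row ID <->
clustering key mapping when the schema has no clustering keys (roughly the
case CASSANDRA-18673 already handles) or when the partition falls below the
threshold.

```java
/**
 * Hypothetical sketch: decides, per partition, whether an index writer should
 * store the row ID <-> clustering key mapping. Not an existing SAI class.
 */
public class SmallPartitionPolicy {
    // Hypothetical default, matching the "say, 100 rows" guess in the comment.
    static final int DEFAULT_THRESHOLD_ROWS = 100;

    private final int thresholdRows;

    public SmallPartitionPolicy(int thresholdRows) {
        this.thresholdRows = thresholdRows;
    }

    /**
     * Returns true when row-awareness (the clustering mapping) is worth keeping.
     * Small partitions rely on the row ID <-> partition key mapping alone.
     */
    public boolean storeClusteringMapping(int partitionRowCount, boolean hasClusteringColumns) {
        // Without clustering columns there is nothing to map beyond the partition key.
        if (!hasClusteringColumns) {
            return false;
        }
        // Keep the mapping only for partitions at or above the threshold.
        return partitionRowCount >= thresholdRows;
    }

    public static void main(String[] args) {
        SmallPartitionPolicy policy = new SmallPartitionPolicy(DEFAULT_THRESHOLD_ROWS);
        int[] rowCounts = {1, 42, 99, 100, 5000};
        for (int rows : rowCounts) {
            System.out.println(rows + " rows -> store clustering mapping: "
                    + policy.storeClusteringMapping(rows, true));
        }
    }
}
```

A fixed row-count cutoff is the simplest possible rule; a real version would
probably need the threshold tuned (or made configurable) against benchmarks.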

> Bypass row-awareness for small partitions
> -----------------------------------------
>
>                 Key: CASSANDRA-18167
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18167
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/SAI
>            Reporter: Mike Adamson
>            Priority: Normal
>              Labels: SAI
>
> SAI supports row-awareness in that it indexes both the partition key and the
> clustering key of a row. This improves query performance significantly for
> wide partitions with many rows, but it can hurt performance for small
> partitions, where it could make sense to bypass row-awareness and post-filter
> the results (read the whole partition) or batch rows for a single partition.
> However this is achieved, the index would need an idea of the size of the
> partition being read and of whether reading the whole partition is likely to
> improve read performance.
> SAI is aware of partition sizes during indexing, so one option would be to
> feed these sizes into a histogram in the index metadata and apply a set of
> rules to this metadata to decide whether we should attempt any optimisation.
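One way the histogram idea could look, again as a hypothetical Java sketch:
`PartitionSizeHistogram`, its power-of-ten bucket bounds, and the
`preferWholePartitionReads` rule are assumptions for illustration, not
existing SAI metadata. The index would record each partition's row count while
indexing, and at query time a simple rule checks whether enough partitions are
small to make reading whole partitions and post-filtering worthwhile.

```java
import java.util.Arrays;

/**
 * Hypothetical partition-size histogram that could live in index metadata.
 * Bucket i counts partitions with BOUNDS[i-1] <= rows < BOUNDS[i];
 * the last bucket holds everything >= the largest bound.
 */
public class PartitionSizeHistogram {
    // Hypothetical exclusive upper bounds, in rows.
    private static final long[] BOUNDS = {10, 100, 1_000, 10_000};

    private final long[] counts = new long[BOUNDS.length + 1];
    private long total;

    /** Called once per partition as it is indexed. */
    public void record(long partitionRows) {
        int bucket = 0;
        while (bucket < BOUNDS.length && partitionRows >= BOUNDS[bucket]) {
            bucket++;
        }
        counts[bucket]++;
        total++;
    }

    /** Fraction of indexed partitions with fewer than {@code bound} rows; bound must be a bucket bound. */
    public double fractionBelow(long bound) {
        int idx = Arrays.binarySearch(BOUNDS, bound);
        if (idx < 0) {
            throw new IllegalArgumentException("not a bucket bound: " + bound);
        }
        if (total == 0) {
            return 0.0;
        }
        long below = 0;
        for (int i = 0; i <= idx; i++) {
            below += counts[i];
        }
        return (double) below / total;
    }

    /** Example rule: bypass row-awareness when most partitions are small. */
    public boolean preferWholePartitionReads(long smallBound, double minSmallFraction) {
        return fractionBelow(smallBound) >= minSmallFraction;
    }

    public static void main(String[] args) {
        PartitionSizeHistogram h = new PartitionSizeHistogram();
        long[] sizes = {3, 7, 12, 50, 80, 95, 40, 20, 5, 20_000};
        for (long s : sizes) {
            h.record(s);
        }
        System.out.println("fraction < 100 rows: " + h.fractionBelow(100));
        System.out.println("prefer whole-partition reads: " + h.preferWholePartitionReads(100, 0.8));
    }
}
```

Fixed buckets keep the metadata small, and per-SSTable histograms could simply
be summed bucket by bucket. The 0.8 fraction in `main` is an arbitrary example;
any real rule would need benchmarking.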



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
