[jira] [Commented] (CASSANDRA-19497) ResultRetriever should batch clusterings/rows during SAI post-filtering reads

Caleb Rackliffe (Jira) Wed, 06 Nov 2024 13:12:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896093#comment-17896093
 ]


Caleb Rackliffe commented on CASSANDRA-19497:
---------------------------------------------

Alright, so I went back and created a thing I'm calling 
{{InsertionOrderedNavigableSet}}. It's a very limited implementation of 
{{NavigableSet}} that assumes elements are added in order and explodes if they 
aren't. This means insertion is constant time, and we avoid the {{TreeSet}} 
building overhead from the first cut of the patch. I did a final run of my little 
test:

|branch|p50 (nanos)|p99 (nanos)|
|trunk|9,031,679|9,058,303|
|patch w/ TreeSet|1,302,847|1,320,255|
|patch w/ InsertionOrderedNavigableSet|955,423|971,231|

So we're approaching a 10x improvement in latency for the CPU-bound part of a 
partition-restricted query with 1000 results on a partition of 10,000 rows.
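
To illustrate the idea behind the in-order set described above, here is a minimal sketch. It is not the code from the patch: the class name {{AppendOnlySortedSet}} and its methods are hypothetical, and it covers only a few {{NavigableSet}}-style operations rather than the full interface. It assumes an array-backed list, so appends are amortized constant time and lookups use binary search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: a sorted set that only accepts elements in ascending order. */
final class AppendOnlySortedSet<T> {
    private final List<T> elements = new ArrayList<>();
    private final Comparator<? super T> comparator;

    AppendOnlySortedSet(Comparator<? super T> comparator) {
        this.comparator = comparator;
    }

    /** Amortized O(1): only the current last element is compared. */
    boolean add(T element) {
        if (!elements.isEmpty()) {
            int cmp = comparator.compare(element, elements.get(elements.size() - 1));
            if (cmp == 0)
                return false; // same as the last element, so nothing to add
            if (cmp < 0)
                throw new IllegalArgumentException("Out-of-order insertion: " + element);
        }
        elements.add(element);
        return true;
    }

    T first() {
        if (elements.isEmpty())
            throw new NoSuchElementException();
        return elements.get(0);
    }

    T last() {
        if (elements.isEmpty())
            throw new NoSuchElementException();
        return elements.get(elements.size() - 1);
    }

    /** Least element >= target, or null if there is none. O(log n). */
    T ceiling(T target) {
        int idx = Collections.binarySearch(elements, target, comparator);
        if (idx < 0)
            idx = -idx - 1; // convert to the insertion point
        return idx < elements.size() ? elements.get(idx) : null;
    }

    /** Greatest element <= target, or null if there is none. O(log n). */
    T floor(T target) {
        int idx = Collections.binarySearch(elements, target, comparator);
        if (idx < 0)
            idx = -idx - 2; // element just before the insertion point
        return idx >= 0 ? elements.get(idx) : null;
    }

    boolean contains(T target) {
        return Collections.binarySearch(elements, target, comparator) >= 0;
    }

    int size() {
        return elements.size();
    }
}
```

Because the caller guarantees ordering, there is no tree to build or rebalance, which is the overhead a {{TreeSet}} would pay on every insert.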

> ResultRetriever should batch clusterings/rows during SAI post-filtering reads
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19497
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/SAI
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>         Attachments: alloc-trunk.html, cpu-batch-100-19497.png, 
> cpu-trunk-19497.png, cpu-trunk.html, heap-flamegraph.html, 
> wall-no-parked-threads.html
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> SAI currently creates and executes a {{SinglePartitionReadCommand}} for every 
> {{PrimaryKey}} the index produces to read the corresponding row for 
> post-filtering. Informed by the limits present in the read command itself, it 
> should be possible to batch those reads w/ a {{ClusteringIndexNamesFilter}} 
> in many fewer {{SinglePartitionReadCommands}}. When we have a handful of 
> matches in a large partition, this seems like it would involve many fewer seeks, 
> less unnecessary object creation, etc.
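
The batching described in the issue can be sketched roughly as follows. This is an illustration only and does not use Cassandra's classes: {{ClusteringBatcher}}, {{Key}}, {{Batch}}, and the {{maxBatchSize}} parameter are all hypothetical stand-ins. Each {{Batch}} represents one partition read that names several clusterings, in place of one read per key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of grouping index matches into per-partition batches. */
final class ClusteringBatcher {
    /** Stand-in for an index match: a partition key plus a clustering. */
    static final class Key {
        final String partition;
        final int clustering;

        Key(String partition, int clustering) {
            this.partition = partition;
            this.clustering = clustering;
        }
    }

    /** Stand-in for a single partition read that names several clusterings. */
    static final class Batch {
        final String partition;
        final List<Integer> clusterings = new ArrayList<>();

        Batch(String partition) {
            this.partition = partition;
        }
    }

    /**
     * Groups consecutive keys that share a partition, starting a new batch
     * when the partition changes or the current batch reaches maxBatchSize.
     */
    static List<Batch> batch(List<Key> keysInIndexOrder, int maxBatchSize) {
        if (maxBatchSize < 1)
            throw new IllegalArgumentException("maxBatchSize must be positive");

        List<Batch> batches = new ArrayList<>();
        Batch current = null;
        for (Key key : keysInIndexOrder) {
            boolean startNew = current == null
                               || !current.partition.equals(key.partition)
                               || current.clusterings.size() >= maxBatchSize;
            if (startNew) {
                current = new Batch(key.partition);
                batches.add(current);
            }
            current.clusterings.add(key.clustering);
        }
        return batches;
    }
}
```

With four matches, three in one partition and one in another, and a batch size of 2, this produces three batches instead of four single-key reads. The saving grows with the number of matches per partition.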



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
