[jira] [Commented] (CASSANDRA-18515) Optimize Initial Concurrency Selection for Range Read Algorithm During SAI Queries

Jira Thu, 06 Jul 2023 09:03:09 -0700


    [https://issues.apache.org/jira/browse/CASSANDRA-18515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740679#comment-17740679]


Andres de la Peña commented on CASSANDRA-18515:
-----------------------------------------------

I think there isn't a run for j17, and the run for j11 doesn't include repeated
runs of the new {{ConcurrencyFactorTest}}. I'm starting new runs for both:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2463]|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3008/workflows/737510be-0260-4e6b-9b31-335e4631099f] [j17|https://app.circleci.com/pipelines/github/adelapena/cassandra/3008/workflows/32b31597-53e2-4551-8faa-7cd1809d79bf]|

+1, assuming those runs don't find any new failures.

[~maedhroz] are you going to review this one?

> Optimize Initial Concurrency Selection for Range Read Algorithm During SAI Queries
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18515
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/2i Index
>            Reporter: Mike Adamson
>            Assignee: Mike Adamson
>            Priority: Normal
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The range read algorithm relies on the Index API’s notion of estimated result
> rows to decide how many replicas to contact in parallel during its first
> round of requests. The more results expected from a replica for a token
> range, the fewer replicas the range read will initially try to contact. Like
> SASI, SAI sets that estimate to a huge negative number to make sure it’s
> selected over other indexes, which in turn forces the concurrency factor
> down to its minimum of 1. The actual formula looks like this:
> {code:java}
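> // resultsPerRange, from SAI, is a giant negative number, so the division
> // yields a tiny negative value, Math.ceil() rounds it to 0, and
> // Math.max(1, ...) clamps the factor to 1
> concurrencyFactor = Math.max(1, Math.min(ranges.rangeCount(), (int) Math.ceil(command.limits().count() / resultsPerRange)));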
> {code}
> Although that concurrency factor is updated as actual results stream in,
> always sending only a single range request to a single replica in the first
> round of every SAI query is not ideal. For example, assume I have a 3-node
> cluster and a keyspace at RF=1, with 10 rows spread across the 3 nodes,
> without vnodes. Issuing a query that matches all 10 rows with a LIMIT of 10
> will send a range request to each of the 3 nodes in 2 or 3 serial rounds
> from the coordinator.
> This can be fixed by allowing indexes to bypass the initial concurrency
> calculation, which lets SAI queries contact the entire ring in a single round
> of queries or, at worst, in the minimum number of rounds allowed by the
> existing maximum number of ranges per round.
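A minimal Java sketch of the arithmetic in the description, assuming SAI's estimate behaves like {{(float) Long.MIN_VALUE}} (the description only says "a giant negative number"). {{InitialConcurrency}}, the {{skipEstimate}} flag, and the {{maxConcurrency}} cap (a stand-in for the existing maximum ranges per round) are hypothetical names for illustration, not Cassandra's API or the patch in the PR.

{code:java}
// Hypothetical sketch: names and the skipEstimate flag are illustrative only.
final class InitialConcurrency {
    private InitialConcurrency() {}

    // The quoted formula: divide the LIMIT by the index's estimated rows per
    // range, round up, and clamp the result to [1, rangeCount].
    static int estimatedConcurrency(int rangeCount, int limit, float resultsPerRange) {
        return Math.max(1, Math.min(rangeCount, (int) Math.ceil(limit / resultsPerRange)));
    }

    // Assumed shape of the proposed bypass: when the index opts out of the
    // estimate, start with every range, capped by the per-round maximum.
    static int initialConcurrency(int rangeCount, int limit, float resultsPerRange,
                                  int maxConcurrency, boolean skipEstimate) {
        if (skipEstimate) {
            return Math.max(1, Math.min(rangeCount, maxConcurrency));
        }
        return estimatedConcurrency(rangeCount, limit, resultsPerRange);
    }

    // Minimum number of rounds needed to cover all ranges at a fixed concurrency
    // (integer ceiling division).
    static int minimumRounds(int rangeCount, int concurrency) {
        return (rangeCount + concurrency - 1) / concurrency;
    }
}
{code}

With 3 ranges and LIMIT 10, the estimated path returns 1 for a giant negative estimate while the bypass path returns 3. With 256 ranges and an assumed cap of 80, the bypass still needs {{minimumRounds(256, 80)}} = 4 rounds, which is the "at worst" case the description mentions.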
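A rough simulation of the 3-node example, assuming the coordinator re-estimates the concurrency factor after each round from the average rows returned per range so far (the description only says the factor "is updated as actual results stream in", so this update rule is an assumption). {{RangeReadRounds}} and {{countRounds}} are hypothetical names; each array element is the number of matching rows in one node's token range.

{code:java}
// Hypothetical simulation; the re-estimation rule is an assumption, not Cassandra's code.
final class RangeReadRounds {
    private RangeReadRounds() {}

    // Returns how many serial rounds are needed to reach `limit` rows or
    // exhaust all ranges, starting from `initialConcurrency`.
    static int countRounds(int[] rowsPerRange, int limit, int initialConcurrency, int maxConcurrency) {
        int totalRanges = rowsPerRange.length;
        int rangesQueried = 0;
        int rowsReturned = 0;
        int concurrency = initialConcurrency;
        int rounds = 0;

        while (rangesQueried < totalRanges && rowsReturned < limit) {
            // One round: query the next `concurrency` ranges in parallel.
            int batchEnd = Math.min(totalRanges, rangesQueried + concurrency);
            for (int i = rangesQueried; i < batchEnd; i++) {
                rowsReturned += rowsPerRange[i];
            }
            rangesQueried = batchEnd;
            rounds++;

            // Never ask for more ranges than remain or than the cap allows.
            int cap = Math.max(1, Math.min(maxConcurrency, totalRanges - rangesQueried));
            if (rowsReturned == 0) {
                // Nothing matched yet: fan out as widely as allowed.
                concurrency = cap;
            } else {
                // Guess how many more ranges are needed from the average so far.
                float avgRowsPerRange = (float) rowsReturned / rangesQueried;
                int remainingRows = limit - rowsReturned;
                concurrency = Math.max(1, Math.min(cap, Math.round(remainingRows / avgRowsPerRange)));
            }
        }
        return rounds;
    }
}
{code}

With the 10 rows split 3/3/4 and LIMIT 10, starting at concurrency 1 takes 2 rounds, and a skewed 8/1/1 split takes 3, consistent with the "2 or 3" in the description. Starting at concurrency 3, as the proposed bypass would, takes 1.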


