[ 
https://issues.apache.org/jira/browse/CASSANDRA-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-17424:
----------------------------------------
    Attachment:     (was: 
bugreport-blackjack-QODS30.163-7-27-2022-03-13-13-48-43.png)

> Performance and Semantic Concerns w/ Metrics for Local vs. Remote Requests in 
> StorageProxy
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17424
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17424
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.x
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> In CASSANDRA-10023, we added two new metrics to both {{ClientRequestMetrics}} 
> and {{ClientWriteRequestMetrics}} to represent requests where the driver 
> either does or does not make a correct token-aware choice of coordinator. 
> (Auditing driver behavior is listed as the primary goal of that Jira.)
> There are, however, a few concerns we should address before this is released 
> in 4.1:
> 1.) With paging enabled and a LIMIT < fetch size, {{IN}} queries can hit 
> {{fetchRows()}} multiple times, so the number of local + remote requests 
> isn’t the same as the number of queries marked in {{ClientRequestMetrics}} in 
> {{readRegular()}}.
> 2.) {{IN}} queries will potentially mark a bunch of “remote” requests even if 
> one key in the {{IN}} set is “local”.
> 3.) Something similar happens with mutations. If {{StorageProxy#mutate()}} 
> receives multiple mutations, we’ll mark against one of these new metrics in 
> {{ClientWriteRequestMetrics}} for each mutation, while 
> {{ClientWriteRequestMetrics}} will only register the actual client request 
> once.
> For cases 2 and 3, we may mark both local and remote requests for the same 
> overall client request, which introduces ambiguity if these are intended to 
> help audit driver coordinator selection behavior. There are a few options:
> a.) We can accept the ambiguity, but then we haven’t really accomplished the 
> goal of CASSANDRA-10023 for some request types.
> b.) We can simply not record any of these metrics for requests where multiple 
> partitions/tokens are involved.
> c.) We can be lenient, marking requests as “local” if any of the 
> partitions/tokens involved in the client request are, in fact, local.
> Option “c” feels like the one that preserves as much functionality as 
> possible without being ambiguous, but problem #2 above is still tricky, given 
> the way {{IN}} and {{GROUP BY}} queries behave with paging. (Perhaps 
> ambiguity in that case is acceptable?)
> In addition to the general ambiguity around the above…
> 4.) There is excessive object creation involved (on a hot path) in our 
> determination of whether a request is local or remote. We should be able to 
> mitigate this by getting rid of 
> {{AbstractReadExecutor#getContactedReplicas()}} and relying on 
> {{ReplicaPlan#lookup()}} rather than creating strings. (Even for writes, we 
> should be able to push down marking into {{performWrite()}}, where the write 
> {{ReplicaPlan}} is already available.)
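
Option "c" above could look roughly like the following. This is only a minimal sketch, not Cassandra code: `LocalityMetrics`, `markOnce`, and the two counters are hypothetical names invented for illustration, and endpoints are modelled as a generic type `E` rather than any real Cassandra class. It marks exactly one counter per client request, and it tests membership on endpoint objects directly instead of building strings.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the local/remote request counters; not a Cassandra class. */
final class LocalityMetrics {
    final LongAdder localRequests = new LongAdder();
    final LongAdder remoteRequests = new LongAdder();

    /**
     * Marks one counter per client request, however many partitions it touches.
     * The request counts as "local" if the coordinator is a replica for at
     * least one of the partitions involved (the lenient rule of option "c").
     *
     * @param localEndpoint        the coordinator's own endpoint (assumed type E)
     * @param replicasPerPartition one replica set per partition in the request
     */
    <E> void markOnce(E localEndpoint, List<? extends Set<E>> replicasPerPartition) {
        if (replicasPerPartition.isEmpty()) {
            return; // nothing to classify, so mark nothing
        }
        boolean local = false;
        for (Set<E> replicas : replicasPerPartition) {
            // Membership test on the endpoint object itself; no string is created.
            if (replicas.contains(localEndpoint)) {
                local = true;
                break;
            }
        }
        (local ? localRequests : remoteRequests).increment();
    }
}
```

With this shape, a multi-partition {{IN}} query or a multi-mutation write would call `markOnce` a single time with all of its replica sets, so local + remote marks would line up with the per-request counts for cases 2 and 3. It does not by itself address case 1, where paging causes the read path to be entered more than once for the same query.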


