[
https://issues.apache.org/jira/browse/CASSANDRA-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcus Eriksson updated CASSANDRA-17424:
----------------------------------------
Reviewers: Jon Meredith, Jon Meredith, Marcus Eriksson (was: Jon Meredith,
Jon Meredith)
> Performance and Semantic Concerns w/ Metrics for Local vs. Remote Requests in
> StorageProxy
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17424
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17424
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.x
>
> Attachments:
> bugreport-blackjack-QODS30.163-7-27-2022-03-13-13-48-43.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In CASSANDRA-10023, we added two new metrics to both {{ClientRequestMetrics}}
> and {{ClientWriteRequestMetrics}} to represent requests where the driver
> either does or does not make a correct token-aware choice of coordinator.
> (Auditing driver behavior is listed as the primary goal of that Jira.)
> There are, however, a few concerns we should address before this is released
> in 4.1:
> 1.) With paging enabled and a LIMIT < fetch size, {{IN}} queries can hit
> {{fetchRows()}} multiple times, so the number of local + remote requests
> isn’t the same as the number of queries marked in {{ClientRequestMetrics}} in
> {{readRegular()}}.
> 2.) {{IN}} queries will potentially mark a bunch of “remote” requests even if
> one key in the {{IN}} set is “local”.
> 3.) Something similar happens with mutations. If {{StorageProxy#mutate()}}
> receives multiple mutations, we’ll mark against one of these new metrics in
> {{ClientWriteRequestMetrics}} for each mutation, while
> {{ClientWriteRequestMetrics}} will only register the actual client request
> once.
> For cases 2 and 3, we may mark both local and remote requests for the same
> overall client request, which introduces ambiguity if these are intended to
> help audit driver coordinator selection behavior. There are a few options:
> a.) We can accept the ambiguity, but then we haven’t really accomplished the
> goal of CASSANDRA-10023 for some request types.
> b.) We can simply not record any of these metrics for requests where multiple
> partitions/tokens are involved.
> c.) We can be lenient, marking requests as “local” if any of the
> partitions/tokens involved in the client request are, in fact, local.
> “c” feels like the option that preserves as much functionality as possible
> without being ambiguous, but problem #2 above is still tricky, given the way
> {{IN}} and {{GROUP BY}} queries behave w/ paging. (Perhaps ambiguity in that case is
> acceptable?)
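
To make option "c" concrete, here is a minimal sketch of marking exactly once per client request, counting the request as "local" if the coordinator is a replica for at least one of the partitions involved. The class {{CoordinatorLocalityMetrics}}, its counters, and {{markOnce()}} are hypothetical names for illustration only; they are not the existing {{ClientRequestMetrics}} API, and the endpoint type is left generic.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the local/remote counters; not Cassandra's real metrics classes. */
final class CoordinatorLocalityMetrics {
    final LongAdder localRequests = new LongAdder();
    final LongAdder remoteRequests = new LongAdder();

    /**
     * Option "c": mark exactly once per client request, as local if the
     * coordinator is a replica for at least one partition in the request.
     *
     * @param replicasPerPartition one set of replica endpoints per partition/token
     * @param localEndpoint        the coordinator's own endpoint
     */
    <E> void markOnce(Collection<? extends Set<E>> replicasPerPartition, E localEndpoint) {
        boolean anyLocal = false;
        for (Set<E> replicas : replicasPerPartition) {
            if (replicas.contains(localEndpoint)) {
                anyLocal = true;
                break; // one local partition is enough under the lenient rule
            }
        }
        // A request with no partitions at all falls through to "remote" here;
        // that is an arbitrary choice made by this sketch.
        if (anyLocal) {
            localRequests.increment();
        } else {
            remoteRequests.increment();
        }
    }
}
```

With this shape, a multi-partition {{IN}} query or a batch of mutations contributes one mark in total, so local + remote stays equal to the number of client requests. It does not by itself resolve the paging case in #1, where a single query can reach the marking code more than once.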
> In addition to the general ambiguity around the above…
> 4.) There is excessive object creation involved (on a hot path) in our
> determination of whether a request is local or remote. We should be able to
> mitigate this by getting rid of
> {{AbstractReadExecutor#getContactedReplicas()}} and relying on
> {{ReplicaPlan#lookup()}} rather than creating strings. (Even for writes, we
> should be able to push down marking into {{performWrite()}}, where the write
> {{ReplicaPlan}} is already available.)
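
As a rough illustration of point 4, the sketch below decides local vs. remote with a single lookup against the endpoints a plan already holds, rather than materializing a list of contacted replicas or any string form of them per request. {{ReplicaPlanSketch}} and {{LocalityCheck}} are hypothetical, simplified stand-ins; the real {{ReplicaPlan#lookup()}} signature and return type may differ.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for a replica plan; the real ReplicaPlan API may differ. */
final class ReplicaPlanSketch {
    private final Set<InetSocketAddress> contacts;

    ReplicaPlanSketch(Collection<InetSocketAddress> contacts) {
        // Built once when the plan is created, not on every metrics check.
        this.contacts = new HashSet<>(contacts);
    }

    /** Returns the endpoint if the plan contacts it, otherwise null. */
    InetSocketAddress lookup(InetSocketAddress endpoint) {
        return contacts.contains(endpoint) ? endpoint : null;
    }
}

final class LocalityCheck {
    /** True if the coordinator itself is among the replicas the plan contacts. */
    static boolean isLocal(ReplicaPlanSketch plan, InetSocketAddress self) {
        return plan.lookup(self) != null;
    }
}
```

The same check could in principle be applied wherever a plan is already in hand, which is the idea behind pushing the write-side marking down to where the write plan is available.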