[
https://issues.apache.org/jira/browse/CASSANDRA-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673519#comment-16673519
]
Sam Tunnicliffe commented on CASSANDRA-14865:
---------------------------------------------
{quote}We are discarding the possibility of blocking read repairs because we
consistently get the same results when doing the query several times
{quote}
The read-repair messages here are slightly misleading. These are reported when
a response is passed to the read callback after enough responses to satisfy the
request's consistency level have already been received. In this case the reads
are being done at LOCAL_ONE, so whenever a second response arrives it goes down
this path. This triggers a comparison between all the received responses, but
they don't appear to mismatch, as that would also be recorded in the trace. So
no actual repair is happening, at least in these traces.
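If it helps to confirm this locally, a minimal cqlsh sketch along these lines
(the table name and key are made up; the point is just to capture a trace of an
authenticated LOCAL_ONE read) should produce the same kind of trace entries:
{code}
-- Hypothetical table/key; run from cqlsh as an authenticated, non-superuser
-- role so the auth reads are exercised.
CONSISTENCY LOCAL_ONE;
TRACING ON;
SELECT * FROM ks.table1 WHERE pk = 1;
-- Any digest-comparison / read-repair entries will appear in the trace output;
-- at LOCAL_ONE they don't indicate that a blocking repair took place.
{code}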
{quote}I have seen AlwaysSpeculatingReadExecutor mentioned
{quote}
AlwaysSpeculatingReadExecutor is not involved here; if it were, the trace would
not include the "speculating read retry..." entries. Rather, the speculative
retry policies for the roles & role_permissions tables are set to the 99th
percentile (as noted on CASSANDRA-11340, this is the default and cannot be
altered). This causes the SpeculatingReadExecutor to kick in and send an
additional request.
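If you want to double-check which policy is in effect, the schema tables expose
it; a minimal sketch, assuming cqlsh against a 3.11 node:
{code}
-- Inspect the effective speculative_retry for the system_auth tables.
SELECT table_name, speculative_retry
FROM system_schema.tables
WHERE keyspace_name = 'system_auth';
-- roles / role_permissions should report the 99th percentile policy; as noted
-- above, for these tables it cannot be changed with ALTER TABLE.
{code}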
{quote}We would expect calls to roles, within sequential read sessions, because
the cache could have turned over, but not so many calls within the same tracing
session.
{quote}
The multiple reads in a single request are probably explained by two factors:
* The roles cache in 3.x is pretty naive in that it only caches role
membership info. The first step of the authorization process (with
CassandraAuthorizer) is to check the superuser status, which unfortunately is
not cached but has to be read from the roles table. So any authorization
request that can't be satisfied from the permissions cache is going to trigger
a read from system_auth.roles. This is fixed in trunk by CASSANDRA-14497.
* Permissions and the resources they relate to are defined hierarchically,
e.g. keyspace -> table. When performing authorization, the chain of resources
is traversed from the bottom up until either the required permission is found
or the top level is reached. So if a role has permissions granted at the
keyspace level (e.g. GRANT SELECT ON KEYSPACE ks TO bob), then a read against
ks.table1 will first check permissions granted directly on the table, then on
the keyspace. The caching is aligned with the requested permissions, rather
than with what is directly granted. So in this example, the permissions would
be cached for that role/table combination, not the role/keyspace, and this
chain traversal only happens when a cache entry is first loaded (see the CQL
sketch below).
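To make that concrete, a rough CQL sketch (the role, keyspace and table names
are made up, and the first query only approximates the internal superuser
lookup):
{code}
-- 1. Superuser check: in 3.x this is effectively a read of system_auth.roles
--    for the authenticated role whenever the permissions cache can't satisfy
--    the request (this is what CASSANDRA-14497 addresses in trunk).
SELECT is_superuser FROM system_auth.roles WHERE role = 'bob';

-- 2. Hierarchical permissions: a grant at the keyspace level...
GRANT SELECT ON KEYSPACE ks TO bob;
-- ...means a SELECT on ks.table1 checks permissions on the table first, then
-- on the keyspace; the result is then cached against the role/table pair.
{code}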
I would advise you to first bump the validity period (the default of 2000ms is
pretty low). Also, specifying a separate update_interval will improve the
performance of the cache compared to the out-of-the-box setup. When an entry is
older than the validity period, it is expired from the cache, which forces a
synchronous reload from disk the next time it is queried. The update_interval
sets a threshold after which a cache entry becomes eligible for an async
refresh; while that refresh is happening, the previous value is returned, so it
is transparent to callers. Setting an update_interval < validity still allows
unread entries to eventually be evicted, but doesn't penalise reads on
infrequently accessed entries.
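Concretely, the relevant cassandra.yaml settings would look something like the
following (illustrative values only, not a recommendation; tune them to how
quickly you need role/permission changes to become visible):
{code}
# Illustrative cassandra.yaml excerpt. The defaults are 2000ms for validity,
# with each update_interval defaulting to its validity, i.e. synchronous
# reloads on expiry.
roles_validity_in_ms: 120000
roles_update_interval_in_ms: 30000
permissions_validity_in_ms: 120000
permissions_update_interval_in_ms: 30000
{code}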
> Cascading calls to read retries, system_auth, and read repairs
> --------------------------------------------------------------
>
> Key: CASSANDRA-14865
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14865
> Project: Cassandra
> Issue Type: Bug
> Reporter: Pedro Gordo
> Priority: Major
> Fix For: 3.11.1
>
> Attachments: cn_dc.txt, nec_dc.txt, wec_dc.txt
>
>
> Roles validity and permission cache values are the default ones. Same thing
> for the read-repair chance.
> We have a cluster with 3 data centers. We have noticed that in 2 of the data
> centers (NEC and CN) we have multiple calls to speculative read retries
> (rapid read protection), roles (instead of using cached values within the
> same tracing session), and multiple read repair messages.
> We would expect calls to roles, within sequential read sessions, because the
> cache could have turned over, but not so many calls within the same tracing
> session. Same thing for read-retries and read repair messages. We are
> discarding the possibility of blocking read repairs because we consistently
> get the same results when doing the query several times.
> It feels like something is cascading calls to these mechanisms regardless of
> conditions that would prevent them from being called (cached roles values for
> instance).
> I have attached tracing files from the 3 data centers. Please let me know if
> more info is needed.