[
https://issues.apache.org/jira/browse/CASSANDRA-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266988#comment-14266988
]
Vishy Kasar commented on CASSANDRA-8194:
----------------------------------------
Sam,
CassandraAuthorizer executes the SELECT permissions query at
ConsistencyLevel.LOCAL_ONE. It is quite possible for the query to land on a
node that is busy with a Java GC pause and to time out. In our production use
case, we see these timeouts routinely even though our replication factor for
system_auth is 10. The next user request will trigger a reload, with the query
hopefully landing on a less busy node and succeeding. For this reason, I prefer
the behavior in 8194-V2.patch, where the cache continues to serve the stale
entry until the SELECT succeeds. Otherwise, clients will be confused as to why
they are getting an AUTH-related failure when no AUTH changes have occurred in
the cluster.
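To make the stale-entry behavior concrete, here is a minimal sketch. It is not the code from 8194-V2.patch: the class name StaleOnFailureCache, the loader function (standing in for the SELECT against system_auth), and the validityMillis parameter are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: keep serving the last good value when a reload fails. */
class StaleOnFailureCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // assumption: stands in for the permissions SELECT
    private final long validityMillis;     // assumption: plays the role of permissions_validity_in_ms

    StaleOnFailureCache(Function<K, V> loader, long validityMillis) {
        this.loader = loader;
        this.validityMillis = validityMillis;
    }

    V get(K key) {
        Entry<V> current = entries.get(key);
        long now = System.currentTimeMillis();
        if (current != null && now - current.loadedAtMillis < validityMillis) {
            return current.value; // entry is still fresh
        }
        try {
            V loaded = loader.apply(key);
            entries.put(key, new Entry<>(loaded, now));
            return loaded;
        } catch (RuntimeException e) {
            if (current != null) {
                // Reload failed (for example, a read timeout): serve the stale
                // entry. It stays expired, so the next request retries the load.
                return current.value;
            }
            throw e; // nothing cached yet, so the failure has to surface
        }
    }
}
```

Note that this sketch still reloads on the request thread; it only changes what happens when the reload fails. Guava's refreshAfterWrite has a similar property: if a refresh throws, the old value is kept.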
Another option is to do a one-time retry of the SELECT against a different node
and throw if that fails as well. That would mostly eliminate the case of a
single busy node causing auth failures.
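A rough sketch of the one-time retry idea follows. The helper RetryOnce.readWithOneRetry and the way a node is passed in are hypothetical; how the second attempt would actually be routed to a different replica inside Cassandra is not shown and is an open assumption here.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: try a read once, then retry exactly once on another node. */
final class RetryOnce {
    private RetryOnce() {}

    static <N, R> R readWithOneRetry(List<N> nodes, Function<N, R> read) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes to read from");
        }
        try {
            return read.apply(nodes.get(0));
        } catch (RuntimeException first) {
            if (nodes.size() < 2) {
                throw first; // no alternative node to retry against
            }
            try {
                // Single retry against a different node.
                return read.apply(nodes.get(1));
            } catch (RuntimeException second) {
                if (second != first) {
                    second.addSuppressed(first); // keep the first failure for diagnosis
                }
                throw second; // both attempts failed: give up
            }
        }
    }
}
```

With this shape, a single slow node costs one extra read instead of an auth failure, while a genuinely unavailable system_auth still fails after two attempts.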
Let me know what you think of this.
I agree with your point about using a dedicated executor here. I had not
realized that StorageService.tasks runs on a single thread.
> Reading from Auth table should not be in the request path
> ---------------------------------------------------------
>
> Key: CASSANDRA-8194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8194
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Vishy Kasar
> Assignee: Vishy Kasar
> Priority: Minor
> Fix For: 2.0.12, 3.0
>
> Attachments: 8194-V2.patch, 8194-V3.txt, 8194.patch, CacheTest2.java
>
>
> We use PasswordAuthenticator and PasswordAuthorizer. system_auth has an RF
> of 10 per DC over 2 DCs. permissions_validity_in_ms is set to 5 minutes.
> We still have a few thousand requests failing each day with the trace below.
> The reason is that a request reading the cache finds that the cached entry
> has expired and issues a blocking request to refresh it.
> We should have the cache refreshed periodically, and only in the background.
> The user request should simply look at the cache and not try to refresh it.
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
> at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
> at org.apache.cassandra.service.ClientState.authorize(ClientState.java:292)
> at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:172)
> at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:165)
> at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:149)
> at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:75)
> at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:102)
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:113)
> at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1735)
> at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4162)
> at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4150)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> at org.apache.cassandra.auth.Auth.selectUser(Auth.java:256)
> at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84)
> at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50)
> at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:68)
> at org.apache.cassandra.service.ClientState$1.load(ClientState.java:278)
> at org.apache.cassandra.service.ClientState$1.load(ClientState.java:275)
> at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
> at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
> ... 19 more
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:105)
> at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:943)
> at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:828)
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:140)
> at org.apache.cassandra.auth.Auth.selectUser(Auth.java:245)
> ... 28 more
> ERROR [Thrift:17232] 2014-10-24 05:06:51,004 CustomTThreadPoolServer.java (line 224) Error occurred during processing of message.
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
> at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
> at org.apache.cassandra.service.ClientState.authorize(ClientState.java:292)
> at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:172)
> at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:165)
> at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:149)
> at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:116)
> at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:102)
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:113)
> at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1735)
> at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4162)
> at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4150)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> at org.apache.cassandra.auth.Auth.selectUser(Auth.java:256)
> at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84)
> at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50)
> at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:68)
> at org.apache.cassandra.service.ClientState$1.load(ClientState.java:278)
> at org.apache.cassandra.service.ClientState$1.load(ClientState.java:275)
> at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
> at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
> ... 19 more
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:105)
> at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:943)
> at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:828)
> at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:140)
> at org.apache.cassandra.auth.Auth.selectUser(Auth.java:245)
> ... 28 more