Re: system_auth replication factor in Cassandra 2.1
Regardless, if you are not modifying users frequently (with five users you most likely are not), make sure to turn the permissions cache way up. In 2.1 that is just permissions_validity_in_ms (default is 2000, i.e. 2 seconds). Feel free to set it to a day or some such. The corresponding async update parameter (permissions_update_interval_in_ms) can be set to a slightly smaller value. If you really need to drop the cache (to revoke a user, for example), you can call the "invalidate" operation on the "org.apache.cassandra.auth:type=PermissionsCache" mbean on each node. In later versions, you would have to do the same with:

- roles_validity_in_ms
- credentials_validity_in_ms

and their corresponding 'interval' parameters.
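For reference, the relevant cassandra.yaml settings might look like this (one day with a background refresh at half that; the exact values are just an illustration):

    permissions_validity_in_ms: 86400000         # cache entries valid for 24h
    permissions_update_interval_in_ms: 43200000  # async refresh attempted every 12h

And dropping the cache over JMX, e.g. with jmxterm (any JMX client works; the default JMX port 7199 is assumed here):

    $ java -jar jmxterm.jar -l localhost:7199
    $> bean org.apache.cassandra.auth:type=PermissionsCache
    $> run invalidate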
Re: system_auth replication factor in Cassandra 2.1
For that many nodes combined with vnodes you probably want an RF much lower than N per datacenter; 5 or 7 would be reasonable. The only downside is that auth queries may take slightly longer, since they will often have to go to other nodes to be resolved, but in practice this is likely not a big deal as the data will be cached anyway.
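For example, to go to RF=5 in each DC (the datacenter names below are placeholders; use the ones reported by nodetool status):

    ALTER KEYSPACE system_auth
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 5, 'AWS_DC': 5};

followed by a repair of system_auth so the chosen replicas actually hold the data.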
Re: system_auth replication factor in Cassandra 2.1
    Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 |    178
    Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 |    186
    Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 |    191
    Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 |    194
    Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 |    198
    Scanned 5 rows and matched 5 [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 |    224
    Enqueuing response to /xx.xx.xx.113 [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 |    240
    Sending REQUEST_RESPONSE message to /xx.xx.xx.113 [MessagingService-Outgoing-/xx.xx.xx.113] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 |    302
    Enqueuing request to /xx.xx.xx.116 [SharedPool-Worker-2] | 2017-08-30 10:51:25.014000 | xx.xx.xx.113 | 601103
    Submitted 1 concurrent range requests covering 63681 ranges [SharedPool-Worker-2] | 2017-08-30 10:51:25.014000 | xx.xx.xx.113 | 601120
    Sending PAGED_RANGE message to /xx.xx.xx.116 [MessagingService-Outgoing-/xx.xx.xx.116] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601190
    REQUEST_RESPONSE message received from /xx.xx.xx.116 [MessagingService-Incoming-/xx.xx.xx.116] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601771
    Processing response from /xx.xx.xx.116 [SharedPool-Worker-1] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601824
    Request complete | 2017-08-30 10:51:25.014874 | xx.xx.xx.113 | 601874

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:42 AM
To: User <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

On Wed, Aug 30, 2017 at 6:40 PM, Chuck Reynolds <creyno...@ancestry.com> wrote:

>> How many users do you have (or expect to be found in system_auth.users)?
>
> 5 users.
>
>> What are the current RF for system_auth and consistency level you are using in cqlsh?
>
> 135 in one DC and 227 in the other DC. Consistency level one.

Still very surprising...

>> Did you try to obtain a trace of a timing-out query (with TRACING ON)?
>
> Tracing timeout even though I increased it to 120 seconds.

Even if cqlsh doesn't print the trace because of the timeout, you should still be able to find something in system_traces.

--
Alex
Re: system_auth replication factor in Cassandra 2.1
On Wed, Aug 30, 2017 at 6:40 PM, Chuck Reynolds wrote:

>> How many users do you have (or expect to be found in system_auth.users)?
>
> 5 users.
>
>> What are the current RF for system_auth and consistency level you are using in cqlsh?
>
> 135 in one DC and 227 in the other DC. Consistency level one.

Still very surprising...

>> Did you try to obtain a trace of a timing-out query (with TRACING ON)?
>
> Tracing timeout even though I increased it to 120 seconds.

Even if cqlsh doesn't print the trace because of the timeout, you should still be able to find something in system_traces.

--
Alex
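Concretely, something along these lines in cqlsh (the session_id below is a placeholder for whatever the sessions query returns):

    TRACING ON;
    SELECT * FROM system_auth.users;   -- the query that times out

    SELECT session_id, duration, request FROM system_traces.sessions;

    SELECT activity, source, source_elapsed
      FROM system_traces.events
     WHERE session_id = <session_id from the previous query>;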
Re: system_auth replication factor in Cassandra 2.1
> How many users do you have (or expect to be found in system_auth.users)?

5 users.

> What are the current RF for system_auth and consistency level you are using in cqlsh?

135 in one DC and 227 in the other DC. Consistency level one.

> Did you try to obtain a trace of a timing-out query (with TRACING ON)?

Tracing timeout even though I increased it to 120 seconds.

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:19 AM
To: User <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

On Wed, Aug 30, 2017 at 5:50 PM, Chuck Reynolds <creyno...@ancestry.com> wrote:

>> So I've read that if you're using authentication in Cassandra 2.1 your replication factor should match the number of nodes in your datacenter.
>>
>> Is that true?
>>
>> I have a two-datacenter cluster, 135 nodes in datacenter 1 & 227 nodes in an AWS datacenter.
>>
>> Why do I want to replicate the system_auth table that many times?
>>
>> What are the benefits and disadvantages of matching the number of nodes as opposed to the standard replication factor of 3?
>>
>> The reason I'm asking the question is because it seems like I'm getting a lot of authentication errors now, and they seem to happen more under load.
>>
>> Also, querying the system_auth table from cqlsh to get the users seems to time out now.
>
> This is surprising. How many users do you have (or expect to be found in system_auth.users)?
>
> What are the current RF for system_auth and consistency level you are using in cqlsh?
>
> Did you try to obtain a trace of a timing-out query (with TRACING ON)?
>
> Regards,
> --
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 127-59-707
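For anyone reproducing this on 2.1: the cqlsh client-side timeout can be raised in ~/.cassandra/cqlshrc (the exact option name is worth double-checking against your cqlsh version):

    [connection]
    client_timeout = 120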
Re: system_auth replication factor in Cassandra 2.1
On Wed, Aug 30, 2017 at 6:20 PM, Chuck Reynolds wrote:

> So I tried to run a repair with the following on one of the servers:
>
>     nodetool repair system_auth -pr -local
>
> After two hours it hadn't finished. I had to kill the repair because of another issue and haven't tried again.
>
> Why would such a small table take so long to repair?

It could be the overhead of that many nodes having to communicate with each other (times the number of vnodes). Even on a small cluster (3-5 nodes) I think it takes a few minutes to run a repair on a small/empty keyspace.

> Also what would happen if I set the RF back to a lower number like 5?

You should still run a repair afterwards, but I would expect it to finish in a reasonable time.

--
Alex
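A sketch of what that could look like across the cluster (the host list file and ssh access are assumptions about your setup):

    # after reducing the RF, e.g. to 5 per DC as discussed above:
    for h in $(cat cassandra-hosts.txt); do
        ssh "$h" nodetool repair -pr -local system_auth
    done

    # afterwards, nodetool cleanup on each node should reclaim the
    # copies those nodes no longer own (worth verifying on 2.1)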
Re: system_auth replication factor in Cassandra 2.1
So I tried to run a repair with the following on one of the servers:

    nodetool repair system_auth -pr -local

After two hours it hadn't finished. I had to kill the repair because of another issue and haven't tried again.

Why would such a small table take so long to repair?

Also what would happen if I set the RF back to a lower number like 5?

Thanks

From: <li...@beobal.com> on behalf of Sam Tunnicliffe <s...@beobal.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:10 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

> It's a better rule of thumb to use an RF of 3 to 5 per DC, and this is what the docs now suggest: http://cassandra.apache.org/doc/latest/operating/security.html#authentication
>
> Out of the box, the system_auth keyspace is set up with SimpleStrategy and RF=1 so that it works on any new system, including dev & test clusters, but obviously that's no use for a production system.
>
> Regarding the increased rate of authentication errors: did you run repair after changing the RF? Auth queries are done at CL.LOCAL_ONE, so if you haven't repaired, the data for the user logging in will probably not be where it should be. The exception to this is the default "cassandra" user: queries for that user are done at CL.QUORUM, which will indeed lead to timeouts and authentication errors with a very high RF. It's recommended to only use that default user to bootstrap the setup of your own users & superusers; the link above also has info on this.
>
> Thanks,
> Sam
>
> On 30 August 2017 at 16:50, Chuck Reynolds <creyno...@ancestry.com> wrote:
>
>> So I've read that if you're using authentication in Cassandra 2.1 your replication factor should match the number of nodes in your datacenter.
>>
>> Is that true?
>>
>> I have a two-datacenter cluster, 135 nodes in datacenter 1 & 227 nodes in an AWS datacenter.
>>
>> Why do I want to replicate the system_auth table that many times?
>>
>> What are the benefits and disadvantages of matching the number of nodes as opposed to the standard replication factor of 3?
>>
>> The reason I'm asking the question is because it seems like I'm getting a lot of authentication errors now, and they seem to happen more under load.
>>
>> Also, querying the system_auth table from cqlsh to get the users seems to time out now.
>>
>> Any help would be greatly appreciated.
>>
>> Thanks
Re: system_auth replication factor in Cassandra 2.1
On Wed, Aug 30, 2017 at 5:50 PM, Chuck Reynolds wrote:

> So I've read that if you're using authentication in Cassandra 2.1 your replication factor should match the number of nodes in your datacenter.
>
> Is that true?
>
> I have a two-datacenter cluster, 135 nodes in datacenter 1 & 227 nodes in an AWS datacenter.
>
> Why do I want to replicate the system_auth table that many times?
>
> What are the benefits and disadvantages of matching the number of nodes as opposed to the standard replication factor of 3?
>
> The reason I'm asking the question is because it seems like I'm getting a lot of authentication errors now, and they seem to happen more under load.
>
> Also, querying the system_auth table from cqlsh to get the users seems to time out now.

This is surprising. How many users do you have (or expect to be found in system_auth.users)?

What are the current RF for system_auth and consistency level you are using in cqlsh?

Did you try to obtain a trace of a timing-out query (with TRACING ON)?

Regards,
--
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 127-59-707
Re: system_auth replication factor in Cassandra 2.1
It's a better rule of thumb to use an RF of 3 to 5 per DC, and this is what the docs now suggest: http://cassandra.apache.org/doc/latest/operating/security.html#authentication

Out of the box, the system_auth keyspace is set up with SimpleStrategy and RF=1 so that it works on any new system, including dev & test clusters, but obviously that's no use for a production system.

Regarding the increased rate of authentication errors: did you run repair after changing the RF? Auth queries are done at CL.LOCAL_ONE, so if you haven't repaired, the data for the user logging in will probably not be where it should be. The exception to this is the default "cassandra" user: queries for that user are done at CL.QUORUM, which will indeed lead to timeouts and authentication errors with a very high RF. It's recommended to only use that default user to bootstrap the setup of your own users & superusers; the link above also has info on this.

Thanks,
Sam

On 30 August 2017 at 16:50, Chuck Reynolds wrote:

> So I've read that if you're using authentication in Cassandra 2.1 your replication factor should match the number of nodes in your datacenter.
>
> Is that true?
>
> I have a two-datacenter cluster, 135 nodes in datacenter 1 & 227 nodes in an AWS datacenter.
>
> Why do I want to replicate the system_auth table that many times?
>
> What are the benefits and disadvantages of matching the number of nodes as opposed to the standard replication factor of 3?
>
> The reason I'm asking the question is because it seems like I'm getting a lot of authentication errors now, and they seem to happen more under load.
>
> Also, querying the system_auth table from cqlsh to get the users seems to time out now.
>
> Any help would be greatly appreciated.
>
> Thanks
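To spell out the bootstrap step with the 2.1 user syntax (the user name and passwords below are placeholders):

    -- logged in once as the default cassandra superuser:
    CREATE USER admin WITH PASSWORD 'use-a-real-password' SUPERUSER;

    -- then log in as the new superuser and demote the default account:
    ALTER USER cassandra WITH PASSWORD 'long-random-password' NOSUPERUSER;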
RE: system_auth replication factor in Cassandra 2.1
I recently came across an issue whereby my user keyspace was replicated by 3 (I have 3 nodes) but system_auth was left at the default of 1, and we also use authentication. I then lost 2 of my nodes, and because the auth data wasn't replicated I couldn't log in. Once I resolved the issue and got the nodes back up, I could log back in. I too asked the community what was going on, and I was pointed to this: http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/sec/secConfSysAuthKeyspRepl.html

It clearly states the following:

    Attention: To prevent a potential problem logging into a secure cluster, set the replication factor of the system_auth and dse_security keyspaces to a value that is greater than 1. In a multi-node cluster, using the default of 1 prevents logging into any node when the node that stores the user data is down.

From: Chuck Reynolds [mailto:creyno...@ancestry.com]
Sent: 30 August 2017 16:51
To: user@cassandra.apache.org
Subject: system_auth replication factor in Cassandra 2.1

> So I've read that if you're using authentication in Cassandra 2.1 your replication factor should match the number of nodes in your datacenter.
>
> Is that true?
>
> I have a two-datacenter cluster, 135 nodes in datacenter 1 & 227 nodes in an AWS datacenter.
>
> Why do I want to replicate the system_auth table that many times?
>
> What are the benefits and disadvantages of matching the number of nodes as opposed to the standard replication factor of 3?
>
> The reason I'm asking the question is because it seems like I'm getting a lot of authentication errors now, and they seem to happen more under load.
>
> Also, querying the system_auth table from cqlsh to get the users seems to time out now.
>
> Any help would be greatly appreciated.
>
> Thanks
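To spell out the fix for a small single-DC cluster like mine (SimpleStrategy here is an assumption; with multiple DCs you would use NetworkTopologyStrategy and per-DC counts), followed by a repair of system_auth on each node:

    ALTER KEYSPACE system_auth
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};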