Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Nate McCall
Regardless, if you are not modifying users frequently (with five you most
likely are not), make sure to turn the permission cache validity way up.

In 2.1 that is just: permissions_validity_in_ms (default is 2000 or 2
seconds). Feel free to set it to 1 day or some such. The corresponding
async update parameter (permissions_update_interval_in_ms) can be set to a
slightly smaller value. If you really need to, you can drop the cache via
the "invalidate" operation on the
"org.apache.cassandra.auth:type=PermissionsCache" MBean (on each node), for
example when revoking a user.

In later versions, you would have to do the same with:
- roles_validity_in_ms
- credentials_validity_in_ms
and their corresponding 'interval' parameters.
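
As a rough illustration only (the values below are made up; 86400000 ms is one
day), the corresponding cassandra.yaml settings on a 2.1 node could look like:

    # cassandra.yaml -- illustrative values, not a recommendation
    permissions_validity_in_ms: 86400000         # serve cached auth lookups for up to 1 day
    permissions_update_interval_in_ms: 82800000  # trigger an async refresh slightly earlier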


Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Erick Ramirez
> Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 178
> Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 186
> Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 191
> Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 194
> Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 198
> Scanned 5 rows and matched 5 [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 224
> Enqueuing response to /xx.xx.xx.113 [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 240
> Sending REQUEST_RESPONSE message to /xx.xx.xx.113 [MessagingService-Outgoing-/xx.xx.xx.113] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 302
> Enqueuing request to /xx.xx.xx.116 [SharedPool-Worker-2] | 2017-08-30 10:51:25.014000 | xx.xx.xx.113 | 601103
> Submitted 1 concurrent range requests covering 63681 ranges [SharedPool-Worker-2] | 2017-08-30 10:51:25.014000 | xx.xx.xx.113 | 601120
> Sending PAGED_RANGE message to /xx.xx.xx.116 [MessagingService-Outgoing-/xx.xx.xx.116] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601190
> REQUEST_RESPONSE message received from /xx.xx.xx.116 [MessagingService-Incoming-/xx.xx.xx.116] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601771
> Processing response from /xx.xx.xx.116 [SharedPool-Worker-1] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601824
> Request complete | 2017-08-30 10:51:25.014874 | xx.xx.xx.113 | 601874
>
> From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, August 30, 2017 at 10:42 AM
> To: User <user@cassandra.apache.org>
> Subject: Re: system_auth replication factor in Cassandra 2.1
>
>
>
> On Wed, Aug 30, 2017 at 6:40 PM, Chuck Reynolds <creyno...@ancestry.com>
> wrote:
>
> How many users do you have (or expect to be found in system_auth.users)?
>
>   5 users.
>
> What are the current RF for system_auth and consistency level you are
> using in cqlsh?
>
>  135 in one DC and 227 in the other DC.  Consistency level one
>
>
>
> Still very surprising...
>
>
>
> Did you try to obtain a trace of a timing-out query (with TRACING ON)?
>
> Tracing timed out even though I increased it to 120 seconds.
>
>
>
> Even if cqlsh doesn't print the trace because of the timeout, you should
> still be able to find something in system_traces.
>
>
>
> --
>
> Alex
>
>
>


Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread kurt greaves
For that many nodes mixed with vnodes you probably want a lower RF than N
per datacenter. 5 or 7 would be reasonable. The only downside is that auth
queries may take slightly longer as they will often have to go to other
nodes to be resolved, but in practice this is likely not a big deal as the
data will be cached anyway.
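
A change along those lines is a single CQL statement; a sketch, assuming the
two datacenters are named 'DC1' and 'AWS' (substitute the real names reported
by nodetool status), to be followed by a repair of system_auth on each node so
the new replicas actually receive the data:

    ALTER KEYSPACE system_auth WITH replication = {
        'class': 'NetworkTopologyStrategy', 'DC1': 5, 'AWS': 5
    };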


Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Chuck Reynolds
Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 178
Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 186
Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 191
Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.003000 | xx.xx.xx.116 | 194
Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 198
Scanned 5 rows and matched 5 [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 224
Enqueuing response to /xx.xx.xx.113 [SharedPool-Worker-1] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 240
Sending REQUEST_RESPONSE message to /xx.xx.xx.113 [MessagingService-Outgoing-/xx.xx.xx.113] | 2017-08-30 10:51:25.004000 | xx.xx.xx.116 | 302
Enqueuing request to /xx.xx.xx.116 [SharedPool-Worker-2] | 2017-08-30 10:51:25.014000 | xx.xx.xx.113 | 601103
Submitted 1 concurrent range requests covering 63681 ranges [SharedPool-Worker-2] | 2017-08-30 10:51:25.014000 | xx.xx.xx.113 | 601120
Sending PAGED_RANGE message to /xx.xx.xx.116 [MessagingService-Outgoing-/xx.xx.xx.116] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601190
REQUEST_RESPONSE message received from /xx.xx.xx.116 [MessagingService-Incoming-/xx.xx.xx.116] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601771
Processing response from /xx.xx.xx.116 [SharedPool-Worker-1] | 2017-08-30 10:51:25.015000 | xx.xx.xx.113 | 601824
Request complete | 2017-08-30 10:51:25.014874 | xx.xx.xx.113 | 601874


From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:42 AM
To: User <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

On Wed, Aug 30, 2017 at 6:40 PM, Chuck Reynolds <creyno...@ancestry.com> wrote:
How many users do you have (or expect to be found in system_auth.users)?
  5 users.
What are the current RF for system_auth and consistency level you are using in 
cqlsh?
 135 in one DC and 227 in the other DC.  Consistency level one

Still very surprising...

Did you try to obtain a trace of a timing-out query (with TRACING ON)?
Tracing timed out even though I increased it to 120 seconds.

Even if cqlsh doesn't print the trace because of the timeout, you should still 
be able to find something in system_traces.

--
Alex



Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Oleksandr Shulgin
On Wed, Aug 30, 2017 at 6:40 PM, Chuck Reynolds 
wrote:

> How many users do you have (or expect to be found in system_auth.users)?
>
>   5 users.
>
> What are the current RF for system_auth and consistency level you are
> using in cqlsh?
>
>  135 in one DC and 227 in the other DC.  Consistency level one
>

Still very surprising...

Did you try to obtain a trace of a timing-out query (with TRACING ON)?
>
> Tracing timed out even though I increased it to 120 seconds.
>

Even if cqlsh doesn't print the trace because of the timeout, you should
still be able to find something in system_traces.

--
Alex
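
For reference, the raw trace data lives in the system_traces keyspace and can
be queried directly; a minimal sketch (the session id in the second query is a
placeholder for one returned by the first):

    SELECT session_id, started_at, coordinator, duration, request
    FROM system_traces.sessions LIMIT 10;

    SELECT activity, source, source_elapsed
    FROM system_traces.events
    WHERE session_id = 550e8400-e29b-41d4-a716-446655440000;  -- placeholder id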


Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Chuck Reynolds
How many users do you have (or expect to be found in system_auth.users)?
  5 users.
What are the current RF for system_auth and consistency level you are using in 
cqlsh?
 135 in one DC and 227 in the other DC.  Consistency level one
Did you try to obtain a trace of a timing-out query (with TRACING ON)?
Tracing timed out even though I increased it to 120 seconds.

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:19 AM
To: User <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

On Wed, Aug 30, 2017 at 5:50 PM, Chuck Reynolds <creyno...@ancestry.com> wrote:
So I’ve read that if you’re using authentication in Cassandra 2.1, your 
replication factor should match the number of nodes in your datacenter.

Is that true?

I have a two-datacenter cluster: 135 nodes in datacenter 1 & 227 nodes in an AWS 
datacenter.

Why do I want to replicate the system_auth table that many times?

What are the benefits and disadvantages of matching the number of nodes as 
opposed to the standard replication factor of 3?


The reason I’m asking the question is because it seems like I’m getting a lot 
of authentication errors now and they seem to happen more under load.

Also, querying the system_auth table from cqlsh to get the users seems to now 
time out.

This is surprising.

How many users do you have (or expect to be found in system_auth.users)?   What 
are the current RF for system_auth and consistency level you are using in 
cqlsh?  Did you try to obtain a trace of a timing-out query (with TRACING ON)?

Regards,
--
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 
127-59-707



Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Oleksandr Shulgin
On Wed, Aug 30, 2017 at 6:20 PM, Chuck Reynolds 
wrote:

> So I tried to run a repair with the following on one of the servers.
>
> nodetool repair system_auth -pr -local
>
>
>
> After two hours it hadn’t finished.  I had to kill the repair because of
> another issue and haven’t tried again.
>
>
>
> *Why would such a small table take so long to repair?*
>

It could be the overhead of that many nodes having to communicate with each
other (times the number of vnodes).  Even on a small cluster (3-5 nodes) I
think it takes a few minutes to run a repair on a small/empty keyspace.

*Also what would happen if I set the RF back to a lower number like 5?*
>

You should still run a repair afterwards, but I would expect it to finish
in a reasonable time.

--
Alex


Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Chuck Reynolds
So I tried to run a repair with the following on one of the servers.
nodetool repair system_auth -pr -local

After two hours it hadn’t finished.  I had to kill the repair because of 
another issue and haven’t tried again.

Why would such a small table take so long to repair?

Also what would happen if I set the RF back to a lower number like 5?


Thanks
From: <li...@beobal.com> on behalf of Sam Tunnicliffe <s...@beobal.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, August 30, 2017 at 10:10 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: system_auth replication factor in Cassandra 2.1

It's a better rule of thumb to use an RF of 3 to 5 per DC and this is what the 
docs now suggest: 
http://cassandra.apache.org/doc/latest/operating/security.html#authentication
Out of the box, the system_auth keyspace is set up with SimpleStrategy and RF=1 
so that it works on any new system including dev & test clusters, but obviously 
that's no use for a production system.

Regarding the increased rate of authentication errors: did you run repair after 
changing the RF? Auth queries are done at CL.LOCAL_ONE, so if you haven't 
repaired, the data for the user logging in will probably not be where it should 
be. The exception to this is the default "cassandra" user: queries for that 
user are done at CL.QUORUM, which will indeed lead to timeouts and 
authentication errors with a very high RF. It's recommended to only use that 
default user to bootstrap the setup of your own users & superusers; the link 
above also has info on this.

Thanks,
Sam


On 30 August 2017 at 16:50, Chuck Reynolds <creyno...@ancestry.com> wrote:
So I’ve read that if you’re using authentication in Cassandra 2.1, your 
replication factor should match the number of nodes in your datacenter.

Is that true?

I have a two-datacenter cluster: 135 nodes in datacenter 1 & 227 nodes in an AWS 
datacenter.

Why do I want to replicate the system_auth table that many times?

What are the benefits and disadvantages of matching the number of nodes as 
opposed to the standard replication factor of 3?


The reason I’m asking the question is because it seems like I’m getting a lot 
of authentication errors now and they seem to happen more under load.

Also, querying the system_auth table from cqlsh to get the users seems to now 
time out.


Any help would be greatly appreciated.

Thanks



Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Oleksandr Shulgin
On Wed, Aug 30, 2017 at 5:50 PM, Chuck Reynolds 
wrote:

> So I’ve read that if you’re using authentication in Cassandra 2.1, your
> replication factor should match the number of nodes in your datacenter.
>
>
>
> *Is that true?*
>
>
>
> I have a two-datacenter cluster: 135 nodes in datacenter 1 & 227 nodes in an
> AWS datacenter.
>
>
>
> *Why do I want to replicate the system_auth table that many times?*
>
>
>
> *What are the benefits and disadvantages of matching the number of nodes
> as opposed to the standard replication factor of 3? *
>
>
>
>
>
> The reason I’m asking the question is because it seems like I’m getting a
> lot of authentication errors now and they seem to happen more under load.
>
>
>
> Also, querying the system_auth table from cqlsh to get the users seems to
> now time out.
>

This is surprising.

How many users do you have (or expect to be found in system_auth.users)?
What are the current RF for system_auth and consistency level you are using
in cqlsh?  Did you try to obtain a trace of a timing-out query (with
TRACING ON)?

Regards,
-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707
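
For completeness, capturing such a trace in cqlsh is just the following (a
sketch; the SELECT stands in for whichever auth-related query is timing out,
and the CONSISTENCY line, which mimics the level used by auth lookups, is
optional):

    TRACING ON;
    CONSISTENCY LOCAL_ONE;
    SELECT * FROM system_auth.users;
    TRACING OFF;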


Re: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Sam Tunnicliffe
It's a better rule of thumb to use an RF of 3 to 5 per DC and this is what
the docs now suggest:
http://cassandra.apache.org/doc/latest/operating/security.html#authentication

Out of the box, the system_auth keyspace is set up with SimpleStrategy and
RF=1 so that it works on any new system including dev & test clusters, but
obviously that's no use for a production system.

Regarding the increased rate of authentication errors: did you run repair
after changing the RF? Auth queries are done at CL.LOCAL_ONE, so if you
haven't repaired, the data for the user logging in will probably not be
where it should be. The exception to this is the default "cassandra" user:
queries for that user are done at CL.QUORUM, which will indeed lead to
timeouts and authentication errors with a very high RF. It's recommended to
only use that default user to bootstrap the setup of your own users &
superusers; the link above also has info on this.

Thanks,
Sam
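
In 2.1 (pre-roles) CQL, bootstrapping a dedicated superuser and demoting the
default account might look roughly like this (the user name and passwords are
placeholders):

    -- run once while logged in as the default 'cassandra' superuser
    CREATE USER dba WITH PASSWORD 'choose-a-strong-password' SUPERUSER;

    -- then, logged in as the new superuser, neutralize the default account
    ALTER USER cassandra WITH PASSWORD 'a-long-random-password' NOSUPERUSER;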


On 30 August 2017 at 16:50, Chuck Reynolds  wrote:

> So I’ve read that if you’re using authentication in Cassandra 2.1, your
> replication factor should match the number of nodes in your datacenter.
>
>
>
> *Is that true?*
>
>
>
> I have a two-datacenter cluster: 135 nodes in datacenter 1 & 227 nodes in an
> AWS datacenter.
>
>
>
> *Why do I want to replicate the system_auth table that many times?*
>
>
>
> *What are the benefits and disadvantages of matching the number of nodes
> as opposed to the standard replication factor of 3? *
>
>
>
>
>
> The reason I’m asking the question is because it seems like I’m getting a
> lot of authentication errors now and they seem to happen more under load.
>
>
>
> Also, querying the system_auth table from cqlsh to get the users seems to
> now time out.
>
>
>
>
>
> Any help would be greatly appreciated.
>
>
>
> Thanks
>


RE: system_auth replication factor in Cassandra 2.1

2017-08-30 Thread Jonathan Baynes

I recently came across an issue whereby my user keyspace was replicated 3 times 
(I have 3 nodes) but my system_auth was left at the default of 1; we also use 
authentication. I then lost 2 of my nodes, and because the authentication data 
wasn't replicated I couldn't log in.

Once I resolved the issue and got the nodes back up, I could log back in. I too 
asked the community what was going on, and I was pointed to this:

http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/sec/secConfSysAuthKeyspRepl.html

It clearly states the following:

Attention: To prevent a potential problem logging into a secure cluster, set 
the replication factor of the system_auth and dse_security keyspaces to a value 
that is greater than 1. In a multi-node cluster, using the default of 1 
prevents logging into any node when the node that stores the user data is down.



From: Chuck Reynolds [mailto:creyno...@ancestry.com]
Sent: 30 August 2017 16:51
To: user@cassandra.apache.org
Subject: system_auth replication factor in Cassandra 2.1

So I’ve read that if you’re using authentication in Cassandra 2.1, your 
replication factor should match the number of nodes in your datacenter.

Is that true?

I have a two-datacenter cluster: 135 nodes in datacenter 1 & 227 nodes in an AWS 
datacenter.

Why do I want to replicate the system_auth table that many times?

What are the benefits and disadvantages of matching the number of nodes as 
opposed to the standard replication factor of 3?


The reason I’m asking the question is because it seems like I’m getting a lot 
of authentication errors now and they seem to happen more under load.

Also, querying the system_auth table from cqlsh to get the users seems to now 
time out.


Any help would be greatly appreciated.

Thanks


