Re: Cassandra 3.11 is compacting forever

2017-09-06 Thread Igor Leão
Last week I moved all nodes back to Cassandra 3.9 and everything has worked
fine since then.
Yesterday I tried the upgrade again, running a rolling restart after the
upgrade. The nodes were fine at first, but today one node started consuming
94.6% of its CPU, and compaction has been running non-stop on that node.
I'm afraid the remaining nodes will see their CPU climb over the next
couple of days, as happened last week.

2017-09-03 22:28 GMT-03:00 kurt greaves :

> Can't say that message explains why the compaction would be stuck.
> Generally not a good sign and you might need to investigate more but
> hopefully shouldn't be related. Has that stuck compaction moved since last
> week?
>
>
>> On 1 September 2017 at 22:54, Fay Hou [Storage Service] <
> fay...@coupang.com> wrote:
>
>> try to do a rolling restart of the cluster before doing a compaction
>>
>> On Fri, Sep 1, 2017 at 3:09 PM, Igor Leão  wrote:
>>
>>> Some generic errors:
>>>
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail cassandra.log | grep -i error*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail cassandra.log | grep -i excep*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail cassandra.log | grep -i fail*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail debug.log | grep -i error*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail debug.log | grep -i exce*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail debug.log | grep -i fail*
>>> *DEBUG [GossipStage:1] 2017-09-01 15:33:27,046 FailureDetector.java:457
>>> - Ignoring interval time of 2108299431 for /172.16.1.112
>>> *
>>> *DEBUG [GossipStage:1] 2017-09-01 15:33:29,051 FailureDetector.java:457
>>> - Ignoring interval time of 2005507384 for /172.16.1.74
>>> *
>>> *DEBUG [GossipStage:1] 2017-09-01 15:33:45,968 FailureDetector.java:457
>>> - Ignoring interval time of 2003371497 for /172.16.1.74
>>> *
>>> *DEBUG [GossipStage:1] 2017-09-01 15:33:51,133 FailureDetector.java:457
>>> - Ignoring interval time of 2013260173 for /172.16.1.74
>>> *
>>> *DEBUG [GossipStage:1] 2017-09-01 15:33:58,981 FailureDetector.java:457
>>> - Ignoring interval time of 2009620081 for /172.16.1.112
>>> *
>>> *DEBUG [GossipStage:1] 2017-09-01 15:34:19,235 FailureDetector.java:457
>>> - Ignoring interval time of 2010956256 for /172.16.1.74
>>> *
>>> *DEBUG [GossipStage:1] 2017-09-01 15:34:19,235 FailureDetector.java:457
>>> - Ignoring interval time of 2011127930 for /10.0.1.122 *
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail system.log | grep -i error*
>>> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
>>> failed: Connection reset by peer*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail system.log | grep -i exce*
>>> *INFO  [Native-Transport-Requests-5] 2017-09-01 15:22:58,806
>>> Message.java:619 - Unexpected exception during request; channel = [id:
>>> 0xdd63db2f, L:/10.0.1.47:9042 !
>>> R:/10.0.44.196:41422]*
>>> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
>>> failed: Connection reset by peer*
>>> *[aladdin@ip-172-16-1-10 cassandra]$ tail system.log | grep -i fail*
>>> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
>>> failed: Connection reset by peer*
>>>
>>>
>>> Some interesting errors:
>>>
>>> 1.
>>> *DEBUG [ReadRepairStage:1] 2017-09-01 15:34:58,485 ReadCallback.java:242
>>> - Digest mismatch:*
>>> *org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>>> DecoratedKey(5988282114260523734,
>>> 32623331326162652d63352d343237632d626334322d306466643762653836343830)
>>> (023d99bbcf2263f0fa450c2312fdce88 vs a60ba37a46e0a61227a8b560fa4e0dfb)*
>>> * at
>>> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
>>> ~[apache-cassandra-3.11.0.jar:3.11.0]*
>>> * at
>>> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
>>> ~[apache-cassandra-3.11.0.jar:3.11.0]*
>>> * at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> [na:1.8.0_112]*
>>> * at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> [na:1.8.0_112]*
>>> * at
>>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>>> [apache-cassandra-3.11.0.jar:3.11.0]*
>>> * at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]*
>>>
>>> 2.
>>> *INFO  [Native-Transport-Requests-5] 2017-09-01 15:22:58,806
>>> Message.java:619 - Unexpected exception during request; channel = [id:
>>> 0xdd63db2f, L:/10.0.1.47:9042 !
>>> R:/10.0.44.196:41422]*
>>> *io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
>>> failed: Connection reset by peer*
>>> * at 

Re: C* 3 node issue -Urgent

2017-09-06 Thread Ben Bromhead
Just to clarify that behaviour: QUORUM only applies to the default
superuser; any superusers you create later are still only queried at
LOCAL_ONE. E.g.

protected static ConsistencyLevel consistencyForRole(String role)
{
    if (role.equals(DEFAULT_SUPERUSER_NAME))
        return ConsistencyLevel.QUORUM;
    else
        return ConsistencyLevel.LOCAL_ONE;
}


Despite the name suggesting it returns the consistency for a given role,
the function actually gets passed the username, not the role (role lookup
happens after authentication, IIRC).

Best practice is to change the default superuser password to some long
random password, throw that password away, and use other superuser accounts
instead. The cassandra user is only there to bootstrap auth and nothing
else.

If your RF for the system_auth keyspace is very high it will not make it
difficult to log in, just to change your password :)
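
For scale, QUORUM here is floor(RF/2) + 1, so with RF = 60 the default
superuser's row has to be read from 31 replicas just to authenticate, which
is why Jeff's warning below really only bites for the cassandra account.

A minimal CQL sketch of the practice above, assuming auth is already enabled
(the role name and passwords are placeholders, not anything from this thread):

-- Sketch only: create a separate superuser for day-to-day admin work.
CREATE ROLE admin_user WITH SUPERUSER = true AND LOGIN = true
    AND PASSWORD = 'some-long-random-password';

-- Rotate the default superuser's password to a long random value,
-- then stop using the cassandra account entirely.
ALTER ROLE cassandra WITH PASSWORD = 'another-long-random-password';

-- Optionally confirm which superusers now exist.
SELECT role, is_superuser, can_login FROM system_auth.roles;

Running these as the cassandra user is the one time you still need QUORUM to
succeed, so do it while the cluster is healthy.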




On Wed, 6 Sep 2017 at 11:43 Jeff Jirsa  wrote:

> More explicitly - if you have 60 nodes, setting rf=60 will likely make it
> very difficult for you to log in as a superuser.
>
> --
> Jeff Jirsa
>
>
> > On Sep 6, 2017, at 11:40 AM, Jon Haddad 
> wrote:
> >
> > I wouldn’t worry about being meticulous about keeping RF = N as the
> cluster grows.  If you had 60 nodes and your auth data was only on 9 you’d
> be completely fine.
> >
> >> On Sep 6, 2017, at 11:36 AM, Cogumelos Maravilha <
> cogumelosmaravi...@sapo.pt> wrote:
> >>
> >> After insert a new node we should:
> >>
> >> ALTER KEYSPACE system_auth WITH REPLICATION = { 'class' : ...
> >> 'replication_factor' : x };
> >>
> >> x = number of nodes in dc
> >>
> >> The default user and password should work:
> >> -u cassandra -p cassandra
> >>
> >> Cheers.
> >>
> >>> On 23-08-2017 11:14, kurt greaves wrote:
> >>> The cassandra user requires QUORUM consistency to be achieved for
> >>> authentication. Normal users only require ONE. I suspect your
> >>> system_auth keyspace has an RF of 1, and the node that owns the
> >>> cassandra users data is down.
> >>>
> >>> Steps to recover:
> >>> 1. Turn off authentication on all the nodes
> >>> 2. Restart the nodes and make sure they are UN
> >>> 3. Alter system_auth to have a higher RF than 1 (3 is probably
> >>> appropriate)
> >>> 4. Turn auth back on and restart
> >>> 5. Create a new user and use that from now on.
> >>>
> >>> ​
--
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer


Re: C* 3 node issue -Urgent

2017-09-06 Thread Jeff Jirsa
More explicitly - if you have 60 nodes, setting rf=60 will likely make it very 
difficult for you to log in as a superuser. 

-- 
Jeff Jirsa


> On Sep 6, 2017, at 11:40 AM, Jon Haddad  wrote:
> 
> I wouldn’t worry about being meticulous about keeping RF = N as the cluster 
> grows.  If you had 60 nodes and your auth data was only on 9 you’d be 
> completely fine.  
> 
>> On Sep 6, 2017, at 11:36 AM, Cogumelos Maravilha 
>>  wrote:
>> 
>> After insert a new node we should:
>> 
>> ALTER KEYSPACE system_auth WITH REPLICATION = { 'class' : ...
>> 'replication_factor' : x };
>> 
>> x = number of nodes in dc
>> 
>> The default user and password should work:
>> -u cassandra -p cassandra
>> 
>> Cheers.
>> 
>>> On 23-08-2017 11:14, kurt greaves wrote:
>>> The cassandra user requires QUORUM consistency to be achieved for
>>> authentication. Normal users only require ONE. I suspect your
>>> system_auth keyspace has an RF of 1, and the node that owns the
>>> cassandra users data is down.
>>> 
>>> Steps to recover:
>>> 1. Turn off authentication on all the nodes
>>> 2. Restart the nodes and make sure they are UN
>>> 3. Alter system_auth to have a higher RF than 1 (3 is probably
>>> appropriate)
>>> 4. Turn auth back on and restart
>>> 5. Create a new user and use that from now on.
>>> 
>>> ​



Re: C* 3 node issue -Urgent

2017-09-06 Thread Jon Haddad
I wouldn’t worry about being meticulous about keeping RF = N as the cluster 
grows.  If you had 60 nodes and your auth data was only on 9 you’d be 
completely fine.  

> On Sep 6, 2017, at 11:36 AM, Cogumelos Maravilha  
> wrote:
> 
> After insert a new node we should:
> 
> ALTER KEYSPACE system_auth WITH REPLICATION = { 'class' : ...
> 'replication_factor' : x };
> 
> x = number of nodes in dc
> 
> The default user and password should work:
> -u cassandra -p cassandra
> 
> Cheers.
> 
> On 23-08-2017 11:14, kurt greaves wrote:
>> The cassandra user requires QUORUM consistency to be achieved for
>> authentication. Normal users only require ONE. I suspect your
>> system_auth keyspace has an RF of 1, and the node that owns the
>> cassandra users data is down.
>> 
>> Steps to recover:
>> 1. Turn off authentication on all the nodes
>> 2. Restart the nodes and make sure they are UN
>> 3. Alter system_auth to have a higher RF than 1 (3 is probably
>> appropriate)
>> 4. Turn auth back on and restart
>> 5. Create a new user and use that from now on.
>> 
>> ​



Re: C* 3 node issue -Urgent

2017-09-06 Thread Cogumelos Maravilha
After inserting a new node we should run:

ALTER KEYSPACE system_auth WITH REPLICATION = { 'class' : ...
'replication_factor' : x };

x = number of nodes in dc
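
A hypothetical filled-in version of that statement, with 3 as an
illustrative factor (the strategy and the number are placeholders, not
values from this thread):

-- SimpleStrategy form, matching the 'replication_factor' option above:
ALTER KEYSPACE system_auth WITH REPLICATION = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 3
};

-- Multi-DC clusters would use NetworkTopologyStrategy with a per-DC
-- factor instead, e.g. { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3 },
-- where 'dc1' is a placeholder for the data centre name.

After changing the replication you will generally also want to run a repair
of the system_auth keyspace (nodetool repair system_auth) so the existing
auth data actually reaches the new replicas. Other replies in this thread
point out that x does not have to track the node count; a small factor such
as 3 per DC is generally enough.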

The default user and password should work:
-u cassandra -p cassandra

Cheers.

On 23-08-2017 11:14, kurt greaves wrote:
> The cassandra user requires QUORUM consistency to be achieved for
> authentication. Normal users only require ONE. I suspect your
> system_auth keyspace has an RF of 1, and the node that owns the
> cassandra users data is down.
>
> Steps to recover:
> 1. Turn off authentication on all the nodes
> 2. Restart the nodes and make sure they are UN
> 3. Alter system_auth to have a higher RF than 1 (3 is probably
> appropriate)
> 4. Turn auth back on and restart
> 5. Create a new user and use that from now on.
>
> ​

