Re: Cassandra taking very long to start and server under heavy load

2019-05-01 Thread Evgeny Inberg
Using a single data disk.
Also, it is performing mostly heavy read operations according to the
metrics collected.
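
For reference, one way to sanity-check that read-heavy pattern directly from a
node is sketched below; it assumes nodetool can reach the local JMX port, and
the keyspace name is a placeholder.

```
# Read vs. write activity as the node itself reports it (placeholder keyspace).
nodetool tpstats | grep -E 'ReadStage|MutationStage'                # completed/pending read vs write tasks
nodetool tablestats my_keyspace | grep -E 'Read Count|Write Count'  # per-keyspace counters
```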

On Wed, 1 May 2019, 20:14 Jeff Jirsa  wrote:

> Do you have multiple data disks?
> Cassandra 6696 changed behavior with multiple data disks to make it safer
> in the situation that one disk fails . It may be copying data to the right
> places on startup, can you see if sstables are being moved on disk?
>
> --
> Jeff Jirsa
>
>
> On May 1, 2019, at 6:04 AM, Evgeny Inberg  wrote:
>
> I have upgraded a Cassandra cluster from version 2.0.x to 3.11.4 going
> trough 2.1.14.
> After the upgrade, noticed that each node is taking about 10-15 minutes to
> start, and server is under a very heavy load.
> Did some digging around and got a few leads from the debug log.
> Messages like:
> *Keyspace.java:351 - New replication settings for keyspace system_auth -
> invalidating disk boundary caches *
> *CompactionStrategyManager.java:380 - Recreating compaction strategy -
> disk boundaries are out of date for system_auth.roles.*
>
> This is repeating for all keyspaces.
>
> Any suggestion to check and what might cause this to happen on every
> start?
>
> Thanks!
>
>


Re: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Alok Dwivedi
CASSANDRA-2434 ensures that when we add a new node, it streams data from the
source that will give up ownership of the range once the data has been
completely streamed. This is explained in detail in the blog post you shared.
It ensures that one continues to get the same consistency as before the new
node was added. So if new node D now owns data for a token range that was
originally owned by replicas A, B & C, then this fix ensures that if D streams
from A, then A no longer owns that token range once D has fully joined the
cluster. It avoids the previous issue where D could stream from A while B is
the node that later gives up its range ownership to D; if A never had the
data, then you have effectively lost what you had in B, because B no longer
owns that token range. Hence the fix in CASSANDRA-2434 helps with consistency
by ensuring that the node used for streaming data (A) is the one that no
longer owns the range, so the new node (D) along with the remaining replicas
(B & C) gives you the same consistency as before D joined the cluster.

Replacing a dead node is different in the sense that the node from which the
replacing node streams data will also continue to remain a data owner. So let's
say you had nodes A, B, C; C is dead and you replace C with D. Now D can stream
from either A or B, but whichever it chooses will also continue to own that
token range, i.e. after D replaces C we have A, B & D instead of A, B and C (as
C is dead).

My understanding is that the restriction of a single node at a time was applied
at cluster expansion time to avoid clashes in token selection, which only arise
when extending the cluster by adding a new node (not when replacing a dead
node). This is what CASSANDRA-7069 addresses.
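
For context, a minimal sketch of the two startup modes being contrasted here.
The IP address is a placeholder, and the exact file the JVM options go into
(cassandra-env.sh or jvm.options) depends on the install; this is a sketch,
not an authoritative recipe.

```
# Adding a brand-new node: consistent range movement is on by default
# (CASSANDRA-2434), which is one reason only one node may join at a time.
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=true"

# Replacing a dead node: no new token selection happens; the node simply
# takes over the dead node's ranges (10.0.0.12 is a placeholder IP).
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.12"
```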

I think in your case, when replacing more than one node, doing it serially
won't in theory overcome the issue I guess you are highlighting here: if I have
to stream from A or B, how do I cover the case where A holds some of the right
data while B holds some other part of it? Streaming will use one source, so
whether you replace nodes serially or several at a time you carry that risk
(IMO). If I were you, I would do it one node at a time to avoid overloading my
cluster, and then I would run a repair to sync any data I might have missed
(because the source chosen during streaming didn't have it). Then I would move
on to the same steps for the next dead node to be replaced.
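
As a rough sketch of that repair step (the keyspace name is a placeholder),
something like the following on the replacement node once it has joined:

```
# Full (non-incremental) repair so data the chosen streaming source lacked
# can still be pulled from the other replicas.
nodetool repair --full my_keyspace
```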


Thanks
Alok Dwivedi
Senior Consultant
https://www.instaclustr.com/platform/





From: Fd Habash 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, 2 May 2019 at 08:26
To: "user@cassandra.apache.org" 
Subject: RE: Bootstrapping to Replace a Dead Node vs. Adding a 
NewNode:Consistency Guarantees

Appreciate your response.

As for extending the cluster while keeping the default range movement = true,
C* won't allow me to bootstrap multiple nodes anyway.

But the question I'm still posing, and have not gotten an answer for, is this:
if the fix in CASSANDRA-2434 disallows bootstrapping multiple nodes to extend
the cluster (which I was able to verify in my lab cluster), why did it allow
bootstrapping multiple nodes in the process of replacing dead nodes (no range
calculation)?

This fix forces a node to bootstrap from the former owner. Is this still the
case when bootstrapping to replace a dead node?



Thank you

From: ZAIDI, ASAD A
Sent: Wednesday, May 1, 2019 5:13 PM
To: user@cassandra.apache.org
Subject: RE: Bootstrapping to Replace a Dead Node vs. Adding a 
NewNode:Consistency Guarantees


The article you mentioned here clearly says  “For new users to Cassandra, the 
safest way to add multiple nodes into a cluster is to add them one at a time. 
Stay tuned as I will be following up with another post on bootstrapping.”

When extending a cluster it is indeed recommended to go slowly and serially.
Optionally you can set cassandra.consistent.rangemovement=false, but you then
run the risk of over-streamed data. Since you're using a release much newer
than the one in which the fix was introduced, I assumed you won't see the same
behavior described for the version the fix addresses. After adding a node, if
the data isn't yet consistent, your query consistency level should still be
able to return consistent data, provided you can tolerate a bit of latency
until your repair is complete. If you go by the recommendation, i.e. add one
node at a time, you'll avoid all these nuances.



From: Fd Habash [mailto:fmhab...@gmail.com]
Sent: Wednesday, May 01, 2019 3:12 PM
To: user@cassandra.apache.org
Subject: RE: Bootstrapping to Replace a Dead Node vs. Adding a New 
Node:Consistency Guarantees

Probably, I needed to be clearer in my inquiry ….

I’m investigating a situation where our diagnostic data is telling us that C* 
has lost some of the application data. I mean, getsstables for the data returns 
zero on all nodes in all 

RE: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fd Habash
Appreciate your response. 

As for extending the cluster while keeping the default range movement = true,
C* won't allow me to bootstrap multiple nodes anyway.

But the question I'm still posing, and have not gotten an answer for, is this:
if the fix in CASSANDRA-2434 disallows bootstrapping multiple nodes to extend
the cluster (which I was able to verify in my lab cluster), why did it allow
bootstrapping multiple nodes in the process of replacing dead nodes (no range
calculation)?

This fix forces a node to bootstrap from the former owner. Is this still the
case when bootstrapping to replace a dead node?



Thank you

From: ZAIDI, ASAD A
Sent: Wednesday, May 1, 2019 5:13 PM
To: user@cassandra.apache.org
Subject: RE: Bootstrapping to Replace a Dead Node vs. Adding a 
NewNode:Consistency Guarantees


The article you mentioned here clearly says  “For new users to Cassandra, the 
safest way to add multiple nodes into a cluster is to add them one at a time. 
Stay tuned as I will be following up with another post on bootstrapping.” 

When extending a cluster it is indeed recommended to go slowly and serially.
Optionally you can set cassandra.consistent.rangemovement=false, but you then
run the risk of over-streamed data. Since you're using a release much newer
than the one in which the fix was introduced, I assumed you won't see the same
behavior described for the version the fix addresses. After adding a node, if
the data isn't yet consistent, your query consistency level should still be
able to return consistent data, provided you can tolerate a bit of latency
until your repair is complete. If you go by the recommendation, i.e. add one
node at a time, you'll avoid all these nuances.



From: Fd Habash [mailto:fmhab...@gmail.com] 
Sent: Wednesday, May 01, 2019 3:12 PM
To: user@cassandra.apache.org
Subject: RE: Bootstrapping to Replace a Dead Node vs. Adding a New 
Node:Consistency Guarantees

Probably, I needed to be clearer in my inquiry ….

I’m investigating a situation where our diagnostic data is telling us that C* 
has lost some of the application data. I mean, getsstables for the data returns 
zero on all nodes in all racks. 
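
For anyone following along, a sketch of that check (the keyspace, table and key
below are placeholders): an empty result on a node means no sstable on that
node contains the partition.

```
nodetool getsstables my_keyspace my_table some_partition_key
```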

The Last Pickle article below & Jeff Jirsa had described a situation where
bootstrapping a node to extend the cluster can lose data if the new node
bootstraps from a stale SECONDARY replica (a node that was offline longer than
the hinted hand-off window). This was fixed in CASSANDRA-2434.
http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html

The article & the Jira above describe bootstrapping when extending a cluster.

I understand replacing a dead node does not involve range movement, but will
the above Jira fix prevent the bootstrap process, when replacing a dead node,
from using a secondary replica?

Thanks 


Thank you

From: Fred Habash
Sent: Wednesday, May 1, 2019 6:50 AM
To: user@cassandra.apache.org
Subject: Re: Bootstrapping to Replace a Dead Node vs. Adding a New 
Node:Consistency Guarantees

Thank you. 

Range movement is one reason this is enforced when adding a new node. But what
about forcing a consistent bootstrap, i.e. bootstrapping from the primary owner
of the range and not a secondary replica?

How is consistent bootstrap enforced when replacing a dead node?

-
Thank you. 

On Apr 30, 2019, at 7:40 PM, Alok Dwivedi  wrote:
When a new node joins the ring, it needs to own new token ranges. These should
be unique to the new node (and ideally evenly distributed), and we don't want
to end up in a situation where two nodes joining simultaneously can own the
same range. Cassandra has the 2 minute wait rule for gossip state to propagate
before a node is added, but this on its own does not guarantee that token
ranges can't overlap. See this ticket for more details
https://issues.apache.org/jira/browse/CASSANDRA-7069 To overcome this issue,
the approach was to only allow one node to join at a time.

When you replace a dead node, new token range selection does not apply, as the
replacing node just takes over the token ranges of the dead node. I think
that's why the restriction of only replacing one node at a time does not apply
in this case.
 
 
Thanks 
Alok Dwivedi
Senior Consultant
https://www.instaclustr.com/platform/
 
 
 
 
 
From: Fd Habash 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, 1 May 2019 at 06:18
To: "user@cassandra.apache.org" 
Subject: Bootstrapping to Replace a Dead Node vs. Adding a New Node: 
Consistency Guarantees 
 
Reviewing the documentation &  based on my testing, using C* 2.2.8, I was not 
able to extend the cluster by adding multiple nodes simultaneously. I got an 
error message …
 
Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while 
cassandra.consistent.rangemovement is true
 
I understand this is to force a node to bootstrap from the former owner of the 
range when adding a node as part of extending the cluster.
 
However, I was able to bootstrap multiple nodes to replace dead nodes. C* did 
not complain about it.
 

RE: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread ZAIDI, ASAD A

The article you mentioned here clearly says  “For new users to Cassandra, the 
safest way to add multiple nodes into a cluster is to add them one at a time. 
Stay tuned as I will be following up with another post on bootstrapping.”

When extending a cluster it is indeed recommended to go slowly and serially.
Optionally you can set cassandra.consistent.rangemovement=false, but you then
run the risk of over-streamed data. Since you're using a release much newer
than the one in which the fix was introduced, I assumed you won't see the same
behavior described for the version the fix addresses. After adding a node, if
the data isn't yet consistent, your query consistency level should still be
able to return consistent data, provided you can tolerate a bit of latency
until your repair is complete. If you go by the recommendation, i.e. add one
node at a time, you'll avoid all these nuances.


From: Fd Habash [mailto:fmhab...@gmail.com]
Sent: Wednesday, May 01, 2019 3:12 PM
To: user@cassandra.apache.org
Subject: RE: Bootstrapping to Replace a Dead Node vs. Adding a New 
Node:Consistency Guarantees

Probably, I needed to be clearer in my inquiry ….

I’m investigating a situation where our diagnostic data is telling us that C* 
has lost some of the application data. I mean, getsstables for the data returns 
zero on all nodes in all racks.

The Last Pickle article below & Jeff Jirsa had described a situation where
bootstrapping a node to extend the cluster can lose data if the new node
bootstraps from a stale SECONDARY replica (a node that was offline longer than
the hinted hand-off window). This was fixed in CASSANDRA-2434.
http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html

The article & the Jira above describe bootstrapping when extending a cluster.

I understand replacing a dead node does not involve range movement, but will
the above Jira fix prevent the bootstrap process, when replacing a dead node,
from using a secondary replica?

Thanks


Thank you

From: Fred Habash
Sent: Wednesday, May 1, 2019 6:50 AM
To: user@cassandra.apache.org
Subject: Re: Bootstrapping to Replace a Dead Node vs. Adding a New 
Node:Consistency Guarantees

Thank you.

Range movement is one reason this is enforced when adding a new node. But what
about forcing a consistent bootstrap, i.e. bootstrapping from the primary owner
of the range and not a secondary replica?

How is consistent bootstrap enforced when replacing a dead node?

-
Thank you.

On Apr 30, 2019, at 7:40 PM, Alok Dwivedi  wrote:
When a new node joins the ring, it needs to own new token ranges. These should
be unique to the new node (and ideally evenly distributed), and we don't want
to end up in a situation where two nodes joining simultaneously can own the
same range. Cassandra has the 2 minute wait rule for gossip state to propagate
before a node is added, but this on its own does not guarantee that token
ranges can't overlap. See this ticket for more details
https://issues.apache.org/jira/browse/CASSANDRA-7069 To overcome this issue,
the approach was to only allow one node to join at a time.

When you replace a dead node, new token range selection does not apply, as the
replacing node just takes over the token ranges of the dead node. I think
that's why the restriction of only replacing one node at a time does not apply
in this case.


Thanks
Alok Dwivedi
Senior Consultant
https://www.instaclustr.com/platform/





From: Fd Habash 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, 1 May 2019 at 06:18
To: "user@cassandra.apache.org" 
Subject: Bootstrapping to Replace a Dead Node vs. Adding a New Node: 
Consistency Guarantees

Reviewing the documentation &  based on my testing, using C* 2.2.8, I was not 
able to extend the cluster by adding multiple nodes simultaneously. I got an 
error message …

Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while 
cassandra.consistent.rangemovement is true

I understand this is to force a node to bootstrap from the former owner of the 

RE: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fd Habash
Probably, I needed to be clearer in my inquiry ….

I’m investigating a situation where our diagnostic data is telling us that C* 
has lost some of the application data. I mean, getsstables for the data returns 
zero on all nodes in all racks. 

The Last Pickle article below & Jeff Jirsa had described a situation where
bootstrapping a node to extend the cluster can lose data if the new node
bootstraps from a stale SECONDARY replica (a node that was offline longer than
the hinted hand-off window). This was fixed in CASSANDRA-2434.
http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html

The article & the Jira above describe bootstrapping when extending a cluster.

I understand replacing a dead node does not involve range movement, but will
the above Jira fix prevent the bootstrap process, when replacing a dead node,
from using a secondary replica?

Thanks 


Thank you

From: Fred Habash
Sent: Wednesday, May 1, 2019 6:50 AM
To: user@cassandra.apache.org
Subject: Re: Bootstrapping to Replace a Dead Node vs. Adding a New 
Node:Consistency Guarantees

Thank you. 

Range movement is one reason this is enforced when adding a new node. But what
about forcing a consistent bootstrap, i.e. bootstrapping from the primary owner
of the range and not a secondary replica?

How is consistent bootstrap enforced when replacing a dead node?

-
Thank you. 

On Apr 30, 2019, at 7:40 PM, Alok Dwivedi  wrote:
When a new node joins the ring, it needs to own new token ranges. These should
be unique to the new node (and ideally evenly distributed), and we don't want
to end up in a situation where two nodes joining simultaneously can own the
same range. Cassandra has the 2 minute wait rule for gossip state to propagate
before a node is added, but this on its own does not guarantee that token
ranges can't overlap. See this ticket for more details
https://issues.apache.org/jira/browse/CASSANDRA-7069 To overcome this issue,
the approach was to only allow one node to join at a time.

When you replace a dead node, new token range selection does not apply, as the
replacing node just takes over the token ranges of the dead node. I think
that's why the restriction of only replacing one node at a time does not apply
in this case.
 
 
Thanks 
Alok Dwivedi
Senior Consultant
https://www.instaclustr.com/platform/
 
 
 
 
 
From: Fd Habash 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, 1 May 2019 at 06:18
To: "user@cassandra.apache.org" 
Subject: Bootstrapping to Replace a Dead Node vs. Adding a New Node: 
Consistency Guarantees 
 
Reviewing the documentation &  based on my testing, using C* 2.2.8, I was not 
able to extend the cluster by adding multiple nodes simultaneously. I got an 
error message …
 
Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while 
cassandra.consistent.rangemovement is true
 
I understand this is to force a node to bootstrap from the former owner of the 
range when adding a node as part of extending the cluster.
 
However, I was able to bootstrap multiple nodes to replace dead nodes. C* did 
not complain about it.
 
Is consistent range movement & the guarantee it offers to bootstrap from 
primary range owner not applicable when bootstrapping to replace dead nodes? 
 

Thank you
 



Re: Joining a node to the cluster when streaming hosts die

2019-05-01 Thread Jeff Jirsa
There is "resumable bootstrap" in 3.0 and newer (maybe 2.2 and newer?), but
I've never used it and have no opinion about whether or not I'd trust it
myself. I'd personally stop the joining instance, clear the data, and start
again.
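
A rough sketch of that "stop, clear, start again" sequence on the joining node;
the service name and data paths assume a default package install and may
differ in your environment.

```
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*
sudo service cassandra start    # the node re-bootstraps from scratch
```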


On Wed, May 1, 2019 at 10:44 AM Nick Hatfield 
wrote:

> Hello,
>
>
>
> Scenario:
>
> I am joining a new host to my cluster and, while it was joining, one of the
> hosts streaming data to it fell over from disk space filling up. Once the
> node was marked DN by the rest of the cluster, I gracefully restarted the
> node. Everything appears to be fine since it came back online, but the
> data it was streaming never restarted.
>
>
>
>
>
> Question:
>
> Is there an easy way, other than stopping and restarting the entire join
> process, to get a node to re-initiate the streams it is responsible for
> after it dies and comes back online?
>
>
>
> Thanks,
>
>
>


RE: cassandra node was put down with oom error

2019-05-01 Thread ZAIDI, ASAD A
Is there any chance partition sizes have grown over time and are taking up much
of the allocated memory? If yes, that could also affect compaction threads, as
they too will take more heap and keep it longer, leaving less for other
processes. You can check whether partition sizes are manageable using nodetool
tablestats - ideally sizes should be even across nodes. You can check whether
the number of concurrent compactors is optimal and whether compaction
throughput is capped/throttled (both via the nodetool utility). See if repair
is running unusually long and taking a lot of resources, i.e. CPU/heap etc.
Check that storage is not acting up (using iostat -x, look at the await
column). See if a bursty workload or batches are hitting the nodes and tipping
over the instance, using nodetool tpstats (look at the Native-Transport-Requests
"All time blocked" column). The above should give some clue about what is going
on.
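
Roughly, those checks map to the following commands (the keyspace name is a
placeholder):

```
nodetool tablestats my_keyspace     # partition sizes (mean/max) per table
nodetool compactionstats            # pending compactions
nodetool getcompactionthroughput    # is compaction throughput throttled?
nodetool tpstats                    # Native-Transport-Requests "All time blocked"
iostat -x 5 3                       # storage latency; watch the await column
```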



-Original Message-
From: Mia [mailto:yeomii...@gmail.com] 
Sent: Wednesday, May 01, 2019 5:47 AM
To: user@cassandra.apache.org
Subject: Re: cassandra node was put down with oom error

Hello, Ayub.

I'm using Apache Cassandra, not the DSE edition, so I have never used the DSE
search feature.
In my case, all the nodes of the cluster have the same problem.

Thanks.

On 2019/05/01 06:13:06, Ayub M  wrote: 
> Do you have search on the same nodes or is it only cassandra. In my 
> case it was due to a memory leak bug in dse search that consumed more 
> memory resulting in oom.
> 
> On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com 
> 
> wrote:
> 
> > Hello,
> >
> > I'm suffering from a similar problem with OSS Cassandra version 3.11.3.
> > My Cassandra cluster has been running for longer than 1 year and
> > there was no problem until this year.
> > The cluster is write-intensive, consists of 70 nodes, and all rows 
> > have 2 hr TTL.
> > The only change is the read consistency from QUORUM to ONE. (I 
> > cannot revert this change because of the read latency) Below is my 
> > compaction strategy.
> > ```
> > compaction = {'class':
> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> > 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> > 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> > 'unchecked_tombstone_compaction': 'false'} ``` I've tried rolling 
> > restarting the cluster several times, but the memory usage of 
> > cassandra process always keeps going high.
> > I also tried Native Memory Tracking, but it measured less
> > memory usage than the system measures (RSS in
> > /proc/{cassandra-pid}/status).
> >
> > Is there any way that I could figure out the cause of this problem?
> >
> >
> > On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > > You’re running DSE so the OSS list may not be much help. Datastax 
> > > May
> > have more insight
> > >
> > > In open source, the only things offheap that vary significantly 
> > > are
> > bloom filters and compression offsets - both scale with disk space, 
> > and both increase during compaction. Large STCS compaction can cause 
> > pretty meaningful allocations for these. Also, if you have an 
> > unusually low compression chunk size or a very low bloom filter FP 
> > ratio, those will be larger.
> > >
> > >
> > > --
> > > Jeff Jirsa
> > >
> > >
> > > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > > >
> > > > Cassandra node went down due to OOM, and checking the 
> > > > /var/log/message
> > I see below.
> > > >
> > > > ```
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer:
> > gfp_mask=0x280da, order=0, oom_score_adj=0
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ 
> > > > mems_allowed=0 
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U) 
> > > > 0*8kB
> > 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 
> > 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 
> > > > 1294*4kB (UM)
> > 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB 
> > (UEM) 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB Jan 23 20:07:17 
> > > > ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB
> > (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 
> > 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 
> > > > hugepages_total=0
> > hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 
> > > > hugepages_total=0
> > hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache 
> > > > pages Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap 
> > > > cache Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache 
> > > > stats: add 0,
> > delete 0, find 0/0
> > > > Jan 23 20:07:17 

Re: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fred Habash
I probably should've been clearer in my inquiry ...

I'm investigating a scenario where our diagnostic data is telling us that a
small portion of application data has been lost. I mean, getsstables for
the keys returns zero on all cluster nodes.

The Last Pickle article below (which includes a case scenario described by
Jeff Jirsa) suggests a possible data loss case when bootstrapping a new node
to extend the cluster: the new node may bootstrap from a stale SECONDARY
replica. A fix was made in CASSANDRA-2434.

However, the article, the Jira, and Jeff's example all describe the
scenario when extending a cluster.

I understand replacing a dead node does not involve range movement. But
will the above fix prevent the bootstrap that happens while replacing a dead
node from streaming the data from a (potentially) stale secondary node? In
other words, does the fact that I was able to bootstrap replacements for
multiple dead nodes at once mean it is safe to do so?

http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html

Thanks

On Tue, Apr 30, 2019 at 7:41 PM Alok Dwivedi 
wrote:

> When a new node joins the ring, it needs to own new token ranges. These
> should be unique to the new node (and ideally evenly distributed), and we
> don't want to end up in a situation where two nodes joining simultaneously
> can own the same range. Cassandra has the 2 minute wait rule for gossip state
> to propagate before a node is added, but this on its own does not guarantee
> that token ranges can't overlap. See this ticket for more details
> https://issues.apache.org/jira/browse/CASSANDRA-7069 To overcome this
> issue, the approach was to only allow one node to join at a time.
>
>
>
> When you replace a dead node, new token range selection does not apply, as
> the replacing node just takes over the token ranges of the dead node. I
> think that's why the restriction of only replacing one node at a time does
> not apply in this case.
>
>
>
>
>
> Thanks
>
> Alok Dwivedi
>
> Senior Consultant
>
> https://www.instaclustr.com/platform/
>
>
>
>
>
>
>
>
>
>
>
> *From: *Fd Habash 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Wednesday, 1 May 2019 at 06:18
> *To: *"user@cassandra.apache.org" 
> *Subject: *Bootstrapping to Replace a Dead Node vs. Adding a New Node:
> Consistency Guarantees
>
>
>
> Reviewing the documentation &  based on my testing, using C* 2.2.8, I was
> not able to extend the cluster by adding multiple nodes simultaneously. I
> got an error message …
>
>
>
> Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while
> cassandra.consistent.rangemovement is true
>
>
>
> I understand this is to force a node to bootstrap from the former owner of
> the range when adding a node as part of extending the cluster.
>
>
>
> However, I was able to bootstrap multiple nodes to replace dead nodes. C*
> did not complain about it.
>
>
>
> Is consistent range movement & the guarantee it offers to bootstrap from
> primary range owner not applicable when bootstrapping to replace dead
> nodes?
>
>
>
> 
> Thank you
>
>
>


-- 


Thank you


Re: Cassandra taking very long to start and server under heavy load

2019-05-01 Thread Jeff Jirsa
Do you have multiple data disks? 
Cassandra 6696 changed behavior with multiple data disks to make it safer in 
the situation that one disk fails . It may be copying data to the right places 
on startup, can you see if sstables are being moved on disk? 
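
For example, a rough way to check (the log and data paths assume a default
install and are only a sketch):

```
nodetool compactionstats                                          # relocation/compaction activity
grep -c 'disk boundaries are out of date' /var/log/cassandra/debug.log
watch -n 10 'du -sh /var/lib/cassandra/data/*/*'                  # are table directories changing size?
```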

-- 
Jeff Jirsa


> On May 1, 2019, at 6:04 AM, Evgeny Inberg  wrote:
> 
> I have upgraded a Cassandra cluster from version 2.0.x to 3.11.4 going trough 
> 2.1.14. 
> After the upgrade, noticed that each node is taking about 10-15 minutes to 
> start, and server is under a very heavy load.
> Did some digging around and got a few leads from the debug log. 
> Messages like:
> Keyspace.java:351 - New replication settings for keyspace system_auth - 
> invalidating disk boundary caches 
> CompactionStrategyManager.java:380 - Recreating compaction strategy - disk 
> boundaries are out of date for system_auth.roles.
> 
> This is repeating for all keyspaces. 
> 
> Any suggestion to check and what might cause this to happen on every start? 
> 
> Thanks!


Re: cassandra node was put down with oom error

2019-05-01 Thread Steve Lacerda
First, you have to find out where the memory is going. You can use the MBeans
in jconsole or something like that; you'll have to look at the different caches
and the off-heap figures under the Cache and Metrics types. Once you've figured
that out, you can start working on tuning things. Yes, your heap is 32G, but
add about 2G for JVM classes and other JVM overhead, then chunk cache, counter
cache, key cache, bloom filters, and whatever else, and voila, the memory is
gone.
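
As a starting point before digging into individual MBeans, something like the
following shows the split Cassandra itself reports, plus the JVM's own view
when Native Memory Tracking is enabled; the PID lookup is an assumption about
how the process was started.

```
nodetool info | grep -iE 'heap'          # heap and off-heap memory as Cassandra reports it
pid=$(pgrep -f CassandraDaemon)
jcmd "$pid" VM.native_memory summary     # needs -XX:NativeMemoryTracking=summary at startup
```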

On Mon, Apr 29, 2019 at 11:58 PM yeomii...@gmail.com 
wrote:

> Hello,
>
> I'm suffering from a similar problem with OSS Cassandra version 3.11.3.
> My Cassandra cluster has been running for longer than 1 year and there
> was no problem until this year.
> The cluster is write-intensive, consists of 70 nodes, and all rows have 2
> hr TTL.
> The only change is the read consistency from QUORUM to ONE. (I cannot
> revert this change because of the read latency)
> Below is my compaction strategy.
> ```
> compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> 'unchecked_tombstone_compaction': 'false'}
> ```
> I've tried rolling restarting the cluster several times,
> but the memory usage of cassandra process always keeps going high.
> I also tried Native Memory Tracking, but it measured less memory
> usage than the system measures (RSS in /proc/{cassandra-pid}/status).
>
> Is there any way that I could figure out the cause of this problem?
>
>
> On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > You’re running DSE so the OSS list may not be much help. Datastax May
> have more insight
> >
> > In open source, the only things offheap that vary significantly are
> bloom filters and compression offsets - both scale with disk space, and
> both increase during compaction. Large STCS compaction can cause pretty
> meaningful allocations for these. Also, if you have an unusually low
> compression chunk size or a very low bloom filter FP ratio, those will be
> larger.
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > >
> > > Cassandra node went down due to OOM, and checking the /var/log/message
> I see below.
> > >
> > > ```
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ mems_allowed=0
> > > 
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U) 0*8kB
> 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
> 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB (UM)
> 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB (UEM)
> 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB
> (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache pages
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0,
> delete 0, find 0/0
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages HighMem/MovableOnly
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ pid ]   uid  tgid
> total_vm  rss nr_ptes swapents oom_score_adj name
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2634] 0  2634
> 41614  326  820 0 systemd-journal
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2690] 0  2690
> 29793  541  270 0 lvmetad
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2710] 0  2710
> 11892  762  250 -1000 systemd-udevd
> > > .
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [13774] 0 13774
>  45977897729 4290 0 Scan Factory
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14506] 0 14506
> 21628 5340  240 0 macompatsvc
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14586] 0 14586
> 21628 5340  240 0 macompatsvc
> > > Jan 23 

Cassandra taking very long to start and server under heavy load

2019-05-01 Thread Evgeny Inberg
I have upgraded a Cassandra cluster from version 2.0.x to 3.11.4 going
trough 2.1.14.
After the upgrade, noticed that each node is taking about 10-15 minutes to
start, and server is under a very heavy load.
Did some digging around and got a few leads from the debug log.
Messages like:
*Keyspace.java:351 - New replication settings for keyspace system_auth -
invalidating disk boundary caches *
*CompactionStrategyManager.java:380 - Recreating compaction strategy - disk
boundaries are out of date for system_auth.roles.*

This is repeating for all keyspaces.

Any suggestion to check and what might cause this to happen on every start?

Thanks!


Re: cassandra node was put down with oom error

2019-05-01 Thread Sandeep Nethi
I think 3.11.3 has a bug which can cause OOMs on nodes during full
repairs. Just check if there is any correlation between the OOMs and the
repair process.
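
For example (the log locations are assumptions based on a typical install):

```
grep -iE 'repair (session|command)' /var/log/cassandra/system.log | tail -20   # recent repair activity
grep -iE 'oom|out of memory' /var/log/messages | tail -5                       # kernel OOM events
```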

Thanks,
Sandeep



On Wed, 1 May 2019 at 11:02 PM, Mia  wrote:

> Hi Sandeep.
>
> I'm not running any manual repair, and I think there is no full repair
> running.
> I cannot see any log lines about repair in system.log these days.
> Does a full repair have anything to do with using a large amount of memory?
>
> Thanks.
>
> On 2019/05/01 10:47:50, Sandeep Nethi  wrote:
> > Are you by any chance running the full repair on these nodes?
> >
> > Thanks,
> > Sandeep
> >
> > On Wed, 1 May 2019 at 10:46 PM, Mia  wrote:
> >
> > > Hello, Ayub.
> > >
> > > I'm using apache cassandra, not dse edition. So I have never used the
> dse
> > > search feature.
> > > In my case, all the nodes of the cluster have the same problem.
> > >
> > > Thanks.
> > >
> > > On 2019/05/01 06:13:06, Ayub M  wrote:
> > > > Do you have search on the same nodes or is it only cassandra. In my
> case
> > > it
> > > > was due to a memory leak bug in dse search that consumed more memory
> > > > resulting in oom.
> > > >
> > > > On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com <
> yeomii...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'm suffering from similar problem with OSS cassandra
> version3.11.3.
> > > > > My cassandra cluster have been running for longer than 1 years and
> > > there
> > > > > was no problem until this year.
> > > > > The cluster is write-intensive, consists of 70 nodes, and all rows
> > > have 2
> > > > > hr TTL.
> > > > > The only change is the read consistency from QUORUM to ONE. (I
> cannot
> > > > > revert this change because of the read latency)
> > > > > Below is my compaction strategy.
> > > > > ```
> > > > > compaction = {'class':
> > > > > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > > > > 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> > > > > 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> > > > > 'tombstone_compaction_interval': '60', 'tombstone_threshold':
> '0.2',
> > > > > 'unchecked_tombstone_compaction': 'false'}
> > > > > ```
> > > > > I've tried rolling restarting the cluster several times,
> > > > > but the memory usage of cassandra process always keeps going high.
> > > > > I also tried Native Memory Tracking, but it only measured less
> memory
> > > > > usage than the system mesaures (RSS in
> /proc/{cassandra-pid}/status)
> > > > >
> > > > > Is there any way that I could figure out the cause of this problem?
> > > > >
> > > > >
> > > > > On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > > > > > You’re running DSE so the OSS list may not be much help.
> Datastax May
> > > > > have more insight
> > > > > >
> > > > > > In open source, the only things offheap that vary significantly
> are
> > > > > bloom filters and compression offsets - both scale with disk
> space, and
> > > > > both increase during compaction. Large STCS compaction can cause
> pretty
> > > > > meaningful allocations for these. Also, if you have an unusually
> low
> > > > > compression chunk size or a very low bloom filter FP ratio, those
> will
> > > be
> > > > > larger.
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jeff Jirsa
> > > > > >
> > > > > >
> > > > > > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > > > > > >
> > > > > > > Cassandra node went down due to OOM, and checking the
> > > /var/log/message
> > > > > I see below.
> > > > > > >
> > > > > > > ```
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked
> oom-killer:
> > > > > gfp_mask=0x280da, order=0, oom_score_adj=0
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/
> > > mems_allowed=0
> > > > > > > 
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB
> (U)
> > > 0*8kB
> > > > > 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB
> 1*1024kB
> > > (U)
> > > > > 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32:
> 1294*4kB
> > > (UM)
> > > > > 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM)
> 114*128kB
> > > (UEM)
> > > > > 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > > > > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal:
> 5319*4kB
> > > > > (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB
> > > > > 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0
> hugepages_total=0
> > > > > hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0
> hugepages_total=0
> > > > > hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total
> pagecache
> > > pages
> > > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap
> cache

Re: cassandra node was put down with oom error

2019-05-01 Thread Mia
Hi Sandeep.

I'm not running any manual repair, and I think there is no full repair running.
I cannot see any log lines about repair in system.log these days.
Does a full repair have anything to do with using a large amount of memory?

Thanks.

On 2019/05/01 10:47:50, Sandeep Nethi  wrote: 
> Are you by any chance running the full repair on these nodes?
> 
> Thanks,
> Sandeep
> 
> On Wed, 1 May 2019 at 10:46 PM, Mia  wrote:
> 
> > Hello, Ayub.
> >
> > I'm using apache cassandra, not dse edition. So I have never used the dse
> > search feature.
> > In my case, all the nodes of the cluster have the same problem.
> >
> > Thanks.
> >
> > On 2019/05/01 06:13:06, Ayub M  wrote:
> > > Do you have search on the same nodes or is it only cassandra. In my case
> > it
> > > was due to a memory leak bug in dse search that consumed more memory
> > > resulting in oom.
> > >
> > > On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm suffering from similar problem with OSS cassandra version3.11.3.
> > > > My cassandra cluster have been running for longer than 1 years and
> > there
> > > > was no problem until this year.
> > > > The cluster is write-intensive, consists of 70 nodes, and all rows
> > have 2
> > > > hr TTL.
> > > > The only change is the read consistency from QUORUM to ONE. (I cannot
> > > > revert this change because of the read latency)
> > > > Below is my compaction strategy.
> > > > ```
> > > > compaction = {'class':
> > > > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > > > 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> > > > 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> > > > 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> > > > 'unchecked_tombstone_compaction': 'false'}
> > > > ```
> > > > I've tried rolling restarting the cluster several times,
> > > > but the memory usage of cassandra process always keeps going high.
> > > > I also tried Native Memory Tracking, but it only measured less memory
> > > > usage than the system mesaures (RSS in /proc/{cassandra-pid}/status)
> > > >
> > > > Is there any way that I could figure out the cause of this problem?
> > > >
> > > >
> > > > On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > > > > You’re running DSE so the OSS list may not be much help. Datastax May
> > > > have more insight
> > > > >
> > > > > In open source, the only things offheap that vary significantly are
> > > > bloom filters and compression offsets - both scale with disk space, and
> > > > both increase during compaction. Large STCS compaction can cause pretty
> > > > meaningful allocations for these. Also, if you have an unusually low
> > > > compression chunk size or a very low bloom filter FP ratio, those will
> > be
> > > > larger.
> > > > >
> > > > >
> > > > > --
> > > > > Jeff Jirsa
> > > > >
> > > > >
> > > > > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > > > > >
> > > > > > Cassandra node went down due to OOM, and checking the
> > /var/log/message
> > > > I see below.
> > > > > >
> > > > > > ```
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer:
> > > > gfp_mask=0x280da, order=0, oom_score_adj=0
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/
> > mems_allowed=0
> > > > > > 
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U)
> > 0*8kB
> > > > 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB
> > (U)
> > > > 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB
> > (UM)
> > > > 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB
> > (UEM)
> > > > 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > > > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB
> > > > (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> > > > 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> > > > hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> > > > hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache
> > pages
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0,
> > > > delete 0, find 0/0
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages
> > HighMem/MovableOnly
> > > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > > > > > 

Re: cassandra node was put down with oom error

2019-05-01 Thread Sandeep Nethi
Are you by any chance running the full repair on these nodes?

Thanks,
Sandeep

On Wed, 1 May 2019 at 10:46 PM, Mia  wrote:

> Hello, Ayub.
>
> I'm using apache cassandra, not dse edition. So I have never used the dse
> search feature.
> In my case, all the nodes of the cluster have the same problem.
>
> Thanks.
>
> On 2019/05/01 06:13:06, Ayub M  wrote:
> > Do you have search on the same nodes or is it only cassandra. In my case
> it
> > was due to a memory leak bug in dse search that consumed more memory
> > resulting in oom.
> >
> > On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com 
> > wrote:
> >
> > > Hello,
> > >
> > > I'm suffering from similar problem with OSS cassandra version3.11.3.
> > > My cassandra cluster have been running for longer than 1 years and
> there
> > > was no problem until this year.
> > > The cluster is write-intensive, consists of 70 nodes, and all rows
> have 2
> > > hr TTL.
> > > The only change is the read consistency from QUORUM to ONE. (I cannot
> > > revert this change because of the read latency)
> > > Below is my compaction strategy.
> > > ```
> > > compaction = {'class':
> > > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > > 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> > > 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> > > 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> > > 'unchecked_tombstone_compaction': 'false'}
> > > ```
> > > I've tried rolling restarting the cluster several times,
> > > but the memory usage of cassandra process always keeps going high.
> > > I also tried Native Memory Tracking, but it only measured less memory
> > > usage than the system mesaures (RSS in /proc/{cassandra-pid}/status)
> > >
> > > Is there any way that I could figure out the cause of this problem?
> > >
> > >
> > > On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > > > You’re running DSE so the OSS list may not be much help. Datastax May
> > > have more insight
> > > >
> > > > In open source, the only things offheap that vary significantly are
> > > bloom filters and compression offsets - both scale with disk space, and
> > > both increase during compaction. Large STCS compaction can cause pretty
> > > meaningful allocations for these. Also, if you have an unusually low
> > > compression chunk size or a very low bloom filter FP ratio, those will
> be
> > > larger.
> > > >
> > > >
> > > > --
> > > > Jeff Jirsa
> > > >
> > > >
> > > > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > > > >
> > > > > Cassandra node went down due to OOM, and checking the
> /var/log/message
> > > I see below.
> > > > >
> > > > > ```
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer:
> > > gfp_mask=0x280da, order=0, oom_score_adj=0
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/
> mems_allowed=0
> > > > > 
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U)
> 0*8kB
> > > 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB
> (U)
> > > 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB
> (UM)
> > > 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB
> (UEM)
> > > 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB
> > > (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> > > 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> > > hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> > > hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache
> pages
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0,
> > > delete 0, find 0/0
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages
> HighMem/MovableOnly
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ pid ]   uid  tgid
> > > total_vm  rss nr_ptes swapents oom_score_adj name
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2634] 0  2634
> > > 41614  326  820 0 systemd-journal
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2690] 0  2690
> > > 29793  541  270 0 lvmetad
> > > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2710] 0  2710
> > > 11892  762  25

Re: Bootstrapping to Replace a Dead Node vs. Adding a New Node: Consistency Guarantees

2019-05-01 Thread Fred Habash
Thank you. 

Range movement is one reason this is enforced when adding a new node. But what
about forcing a consistent bootstrap, i.e. bootstrapping from the primary owner
of the range and not a secondary replica?

How is consistent bootstrap enforced when replacing a dead node?


-
Thank you. 

> On Apr 30, 2019, at 7:40 PM, Alok Dwivedi  
> wrote:
> 
> When a new node joins the ring, it needs to own new token ranges. These
> should be unique to the new node (and ideally evenly distributed), and we
> don't want to end up in a situation where two nodes joining simultaneously
> can own the same range. Cassandra has the 2 minute wait rule for gossip state
> to propagate before a node is added, but this on its own does not guarantee
> that token ranges can't overlap. See this ticket for more details
> https://issues.apache.org/jira/browse/CASSANDRA-7069 To overcome this issue,
> the approach was to only allow one node to join at a time.
>
> When you replace a dead node, new token range selection does not apply, as
> the replacing node just takes over the token ranges of the dead node. I think
> that's why the restriction of only replacing one node at a time does not
> apply in this case.
>  
>  
> Thanks
> Alok Dwivedi
> Senior Consultant
> https://www.instaclustr.com/platform/
>  
>  
>  
>  
>  
> From: Fd Habash 
> Reply-To: "user@cassandra.apache.org" 
> Date: Wednesday, 1 May 2019 at 06:18
> To: "user@cassandra.apache.org" 
> Subject: Bootstrapping to Replace a Dead Node vs. Adding a New Node: 
> Consistency Guarantees
>  
> Reviewing the documentation &  based on my testing, using C* 2.2.8, I was not 
> able to extend the cluster by adding multiple nodes simultaneously. I got an 
> error message …
>  
> Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while 
> cassandra.consistent.rangemovement is true
>  
> I understand this is to force a node to bootstrap from the former owner of 
> the range when adding a node as part of extending the cluster.
>  
> However, I was able to bootstrap multiple nodes to replace dead nodes. C* did 
> not complain about it.
>  
> Is consistent range movement & the guarantee it offers to bootstrap from 
> primary range owner not applicable when bootstrapping to replace dead nodes?
>  
> 
> Thank you
>  


Re: cassandra node was put down with oom error

2019-05-01 Thread Mia
Hello, Ayub.

I'm using Apache Cassandra, not the DSE edition, so I have never used the DSE
search feature.
In my case, all the nodes of the cluster have the same problem.

Thanks.

On 2019/05/01 06:13:06, Ayub M  wrote: 
> Do you have search on the same nodes or is it only cassandra. In my case it
> was due to a memory leak bug in dse search that consumed more memory
> resulting in oom.
> 
> On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com 
> wrote:
> 
> > Hello,
> >
> > I'm suffering from a similar problem with OSS Cassandra version 3.11.3.
> > My Cassandra cluster has been running for longer than 1 year and there
> > was no problem until this year.
> > The cluster is write-intensive, consists of 70 nodes, and all rows have 2
> > hr TTL.
> > The only change is the read consistency from QUORUM to ONE. (I cannot
> > revert this change because of the read latency)
> > Below is my compaction strategy.
> > ```
> > compaction = {'class':
> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> > 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> > 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> > 'unchecked_tombstone_compaction': 'false'}
> > ```
> > I've tried rolling restarting the cluster several times,
> > but the memory usage of cassandra process always keeps going high.
> > I also tried Native Memory Tracking, but it measured less memory
> > usage than the system measures (RSS in /proc/{cassandra-pid}/status).
> >
> > Is there any way that I could figure out the cause of this problem?
> >
> >
> > On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > > You’re running DSE so the OSS list may not be much help. Datastax May
> > have more insight
> > >
> > > In open source, the only things offheap that vary significantly are
> > bloom filters and compression offsets - both scale with disk space, and
> > both increase during compaction. Large STCS compaction can cause pretty
> > meaningful allocations for these. Also, if you have an unusually low
> > compression chunk size or a very low bloom filter FP ratio, those will be
> > larger.
> > >
> > >
> > > --
> > > Jeff Jirsa
> > >
> > >
> > > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > > >
> > > > Cassandra node went down due to OOM, and checking the /var/log/message
> > I see below.
> > > >
> > > > ```
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer:
> > gfp_mask=0x280da, order=0, oom_score_adj=0
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ mems_allowed=0
> > > > 
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U) 0*8kB
> > 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
> > 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB (UM)
> > 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB (UEM)
> > 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB
> > (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> > 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> > hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> > hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache pages
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0,
> > delete 0, find 0/0
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages HighMem/MovableOnly
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ pid ]   uid  tgid
> > total_vm  rss nr_ptes swapents oom_score_adj name
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2634] 0  2634
> > 41614  326  820 0 systemd-journal
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2690] 0  2690
> > 29793  541  270 0 lvmetad
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2710] 0  2710
> > 11892  762  250 -1000 systemd-udevd
> > > > .
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [13774] 0 13774
> >  45977897729 4290 0 Scan Factory
> > > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14506] 0 14506
> > 21628 5340  240 0 macompatsvc
> > > > Jan 23 20:07:17 

Re: Exception while running two CQL queries in Parallel

2019-05-01 Thread Stefan Miklosovic
What are your replication factors for that keyspace? And why are you using
EACH_QUORUM?

This might be handy:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigSerialConsistency.html
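
For context: with the 3.x Java driver that spring-cql 1.5.x wraps, the Paxos
(serial) and commit consistency levels of a conditional update are set
separately on the statement. A minimal sketch, reusing the table from the logs
above; the contact point and keyspace name are placeholders:

```
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class LwtConsistencyExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            SimpleStatement update = new SimpleStatement(
                    "UPDATE dir SET bid = 'value' WHERE repoid = '06A7490B5CBFA1DE0A494027' IF EXISTS");

            // Commit phase, as seen in the logs above.
            update.setConsistencyLevel(ConsistencyLevel.EACH_QUORUM);
            // Paxos phase; SERIAL is the default, LOCAL_SERIAL keeps Paxos inside
            // the local data center (see the doc link above).
            update.setSerialConsistencyLevel(ConsistencyLevel.SERIAL);

            ResultSet rs = session.execute(update);
            System.out.println("applied = " + rs.wasApplied());
        }
    }
}
```

Which pair of levels makes sense depends on the replication factors and data
center layout, hence the questions above.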

On Wed, 1 May 2019 at 17:57, Bhavesh Prajapati
 wrote:
>
> I had two queries running on the same row in parallel (this is an intended
> use case). While batch query 2 completed successfully, query 1 failed with
> an exception.
>
> Following are driver logs and sequence of log events.
>
>
>
> QUERY 1: STARTED
>
> 2019-04-30T13:14:50.858+ CQL update "EACH_QUORUM" "UPDATE dir SET 
> bid='value' WHERE repoid='06A7490B5CBFA1DE0A494027' IF EXISTS;"
>
>
>
> QUERY 2: STARTED
>
> 2019-04-30T13:14:51.161+ CQL BEGIN BATCH
>
> 2019-04-30T13:14:51.161+ CQL batch-update "06A7490B5CBFA1DE0A494027"
>
> 2019-04-30T13:14:51.161+ CQL batch-delete "06A7490B5CBFA1DE0A494027"
>
> 2019-04-30T13:14:51.161+ CQL APPLY BATCH
>
> 2019-04-30T13:14:51.165+ Cassandra delete directory call completed 
> successfully for "06A7490B5CBFA1DE0A494027"
>
> QUERY 2: COMPLETED - WITH SUCCESS
>
>
>
> QUERY 1: FAILED
>
> 2019-04-30T13:14:52.311+ CQL 
> "org.springframework.cassandra.support.exception.CassandraWriteTimeoutException"
>  "Cassandra timeout during write query at consistency SERIAL (5 replica were 
> required but only 0 acknowledged the write); nested exception is 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency SERIAL (5 replica were required but only 0 
> acknowledged the write)"
>
> org.springframework.cassandra.support.exception.CassandraWriteTimeoutException:
>  Cassandra timeout during write query at consistency SERIAL (5 replica were 
> required but only 0 acknowledged the write); nested exception is 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency SERIAL (5 replica were required but only 0 
> acknowledged the write)
>
> at 
> org.springframework.cassandra.support.CassandraExceptionTranslator.translateExceptionIfPossible(CassandraExceptionTranslator.java:95)
>  ~[spring-cql-1.5.18.RELEASE.jar!/:?]
>
> at 
> org.springframework.cassandra.core.CqlTemplate.potentiallyConvertRuntimeException(CqlTemplate.java:946)
>  ~[spring-cql-1.5.18.RELEASE.jar!/:?]
>
> at 
> org.springframework.cassandra.core.CqlTemplate.translateExceptionIfPossible(CqlTemplate.java:930)
>  ~[spring-cql-1.5.18.RELEASE.jar!/:?]
>
>
>
> What could have caused this exception?
>
> How can I resolve or handle such a situation?
>
>
>
> Thanks,
>
> Bhavesh

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Exception while running two CQL queries in Parallel

2019-05-01 Thread Bhavesh Prajapati
I had two queries running on the same row in parallel (this is an intended use
case). While batch query 2 completed successfully, query 1 failed with an
exception.
Following are driver logs and sequence of log events.

QUERY 1: STARTED
2019-04-30T13:14:50.858+ CQL update "EACH_QUORUM" "UPDATE dir SET 
bid='value' WHERE repoid='06A7490B5CBFA1DE0A494027' IF EXISTS;"

QUERY 2: STARTED
2019-04-30T13:14:51.161+ CQL BEGIN BATCH
2019-04-30T13:14:51.161+ CQL batch-update "06A7490B5CBFA1DE0A494027"
2019-04-30T13:14:51.161+ CQL batch-delete "06A7490B5CBFA1DE0A494027"
2019-04-30T13:14:51.161+ CQL APPLY BATCH
2019-04-30T13:14:51.165+ Cassandra delete directory call completed 
successfully for "06A7490B5CBFA1DE0A494027"
QUERY 2: COMPLETED - WITH SUCCESS

QUERY 1: FAILED
2019-04-30T13:14:52.311+ CQL 
"org.springframework.cassandra.support.exception.CassandraWriteTimeoutException"
 "Cassandra timeout during write query at consistency SERIAL (5 replica were 
required but only 0 acknowledged the write); nested exception is 
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
during write query at consistency SERIAL (5 replica were required but only 0 
acknowledged the write)"
org.springframework.cassandra.support.exception.CassandraWriteTimeoutException: 
Cassandra timeout during write query at consistency SERIAL (5 replica were 
required but only 0 acknowledged the write); nested exception is 
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
during write query at consistency SERIAL (5 replica were required but only 0 
acknowledged the write)
at 
org.springframework.cassandra.support.CassandraExceptionTranslator.translateExceptionIfPossible(CassandraExceptionTranslator.java:95)
 ~[spring-cql-1.5.18.RELEASE.jar!/:?]
at 
org.springframework.cassandra.core.CqlTemplate.potentiallyConvertRuntimeException(CqlTemplate.java:946)
 ~[spring-cql-1.5.18.RELEASE.jar!/:?]
at 
org.springframework.cassandra.core.CqlTemplate.translateExceptionIfPossible(CqlTemplate.java:930)
 ~[spring-cql-1.5.18.RELEASE.jar!/:?]

What could have caused this exception?
How can I resolve or handle such a situation?

Thanks,
Bhavesh
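
One common approach with the underlying 3.x Java driver is to treat a CAS
write timeout as an ambiguous outcome and read the row back at SERIAL
consistency before deciding whether to retry. A minimal sketch, reusing the
table and column names from the logs above and assuming an existing Session:

```
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class LwtTimeoutHandling {

    static void conditionalUpdate(Session session, String repoId) {
        SimpleStatement update = new SimpleStatement(
                "UPDATE dir SET bid = 'value' WHERE repoid = ? IF EXISTS", repoId);
        try {
            session.execute(update);
        } catch (WriteTimeoutException e) {
            if (e.getWriteType() == WriteType.CAS) {
                // The Paxos round timed out, so the update may or may not have been applied.
                // A read at SERIAL consistency goes through Paxos and shows the committed state.
                SimpleStatement check = new SimpleStatement(
                        "SELECT bid FROM dir WHERE repoid = ?", repoId);
                check.setConsistencyLevel(ConsistencyLevel.SERIAL);
                Row row = session.execute(check).one();
                // Inspect row here and decide whether to retry the update or accept the result.
            } else {
                throw e;
            }
        }
    }
}
```

Note that the plain batch in query 2 does not go through Paxos, so it does not
coordinate with the IF EXISTS update; that is expected behaviour when mixing
LWT and non-LWT writes on the same row.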


Re: cassandra node was put down with oom error

2019-05-01 Thread Ayub M
Do you have Search on the same nodes, or is it only Cassandra? In my case it
was a memory leak bug in DSE Search that consumed extra memory and resulted
in an OOM.

On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com 
wrote:

> Hello,
>
> I'm suffering from a similar problem with OSS Cassandra version 3.11.3.
> My Cassandra cluster has been running for more than a year and there
> was no problem until this year.
> The cluster is write-intensive, consists of 70 nodes, and all rows have a 2
> hr TTL.
> The only change was switching the read consistency from QUORUM to ONE (I
> cannot revert this change because of the read latency).
> Below is my compaction strategy.
> ```
> compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
> 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
> 'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
> 'unchecked_tombstone_compaction': 'false'}
> ```
> I've tried rolling restarts of the cluster several times,
> but the memory usage of the Cassandra process always climbs back up.
> I also tried Native Memory Tracking, but it reports less memory
> usage than the system measures (RSS in /proc/{cassandra-pid}/status).
>
> Is there any way that I could figure out the cause of this problem?
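
One way to narrow down where the resident memory goes, beyond NMT, is to pull
the JVM's own counters over JMX and compare them with RSS; Cassandra's own
off-heap structures (bloom filters, compression offsets, memtables) are
reported separately by nodetool info and nodetool tablestats. A minimal
sketch, assuming the default JMX port 7199 is reachable from where this runs
and JMX authentication is disabled (the host is a placeholder):

```
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JvmMemoryCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();

            MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("heap used     = " + mem.getHeapMemoryUsage().getUsed());
            System.out.println("non-heap used = " + mem.getNonHeapMemoryUsage().getUsed());

            // NIO direct and mapped buffer pools (network buffers, memory-mapped sstables).
            for (String pool : new String[] {"direct", "mapped"}) {
                BufferPoolMXBean b = ManagementFactory.newPlatformMXBeanProxy(
                        mbsc, "java.nio:type=BufferPool,name=" + pool, BufferPoolMXBean.class);
                System.out.println(pool + " buffers used = " + b.getMemoryUsed());
            }
        }
    }
}
```

The gap between RSS and the sum of these numbers is roughly what NMT and the
per-table off-heap figures from nodetool need to account for.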
>
>
> On 2019/01/26 20:53:26, Jeff Jirsa  wrote:
> > You’re running DSE so the OSS list may not be much help. Datastax May
> have more insight
> >
> > In open source, the only things offheap that vary significantly are
> bloom filters and compression offsets - both scale with disk space, and
> both increase during compaction. Large STCS compaction can cause pretty
> meaningful allocations for these. Also, if you have an unusually low
> compression chunk size or a very low bloom filter FP ratio, those will be
> larger.
> >
> >
> > --
> > Jeff Jirsa
> >
> >
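
To check whether a table actually has an unusually low bloom filter FP chance
or compression chunk size, those settings can be read from the schema tables
(system_schema in Cassandra 3.x / DSE 5+). A minimal sketch with the 3.x Java
driver; the contact point, keyspace and table names are placeholders:

```
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TableOffHeapSettings {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            Row r = session.execute(
                    "SELECT bloom_filter_fp_chance, compression FROM system_schema.tables "
                            + "WHERE keyspace_name = ? AND table_name = ?",
                    "my_keyspace", "my_table").one();
            System.out.println("bloom_filter_fp_chance = "
                    + r.getDouble("bloom_filter_fp_chance"));
            System.out.println("compression            = "
                    + r.getMap("compression", String.class, String.class));
        }
    }
}
```

The corresponding runtime usage shows up per table in nodetool tablestats
under the bloom filter and compression metadata off-heap lines.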
> > > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > >
> > > Cassandra node went down due to OOM, and checking the /var/log/message
> I see below.
> > >
> > > ```
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ mems_allowed=0
> > > 
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U) 0*8kB
> 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
> 1*2048kB (M) 3*4096kB (M) = 15908kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB (UM)
> 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB (UEM)
> 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB
> (UE) 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 62500kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0
> hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache pages
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0,
> delete 0, find 0/0
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages HighMem/MovableOnly
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ pid ]   uid  tgid
> total_vm  rss nr_ptes swapents oom_score_adj name
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2634] 0  2634
> 41614  326  820 0 systemd-journal
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2690] 0  2690
> 29793  541  270 0 lvmetad
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2710] 0  2710
> 11892  762  250 -1000 systemd-udevd
> > > .
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [13774] 0 13774
>  45977897729 4290 0 Scan Factory
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14506] 0 14506
> 21628 5340  240 0 macompatsvc
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14586] 0 14586
> 21628 5340  240 0 macompatsvc
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14588] 0 14588
> 21628 5340  240 0 macompatsvc
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14589] 0 14589
> 21628 5340  240 0 macompatsvc
> > > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: