Re: Cassandra Config as per server hardware for heavy write

2016-11-22 Thread Benjamin Roth
This is ridiculously slow for that hardware setup. It sounds like you are
benchmarking with a single thread and/or synchronous queries, or with very
large writes. A setup like this should easily be able to handle tens of
thousands of writes/s.
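
For reference, one way to drive properly parallel load is the cassandra-stress
tool that ships with Cassandra; a minimal sketch (the contact point and thread
count are placeholders, not values from this thread):

  # drive 1M inserts from 200 client threads and report throughput
  cassandra-stress write n=1000000 -rate threads=200 -node 10.0.0.1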

2016-11-23 8:02 GMT+01:00 Jonathan Haddad :

> How are you benchmarking that?
> On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari <
> abhishek.maheshw...@timesinternet.in> wrote:
>
>> Hi,
>>
>>
>>
>> I have 8 servers in my Cassandra cluster. Each server has 64 GB RAM, 40
>> cores, and 8 SSDs. Currently I have the below config in cassandra.yaml:
>>
>>
>>
>> concurrent_reads: 32
>>
>> concurrent_writes: 64
>>
>> concurrent_counter_writes: 32
>>
>> compaction_throughput_mb_per_sec: 32
>>
>> concurrent_compactors: 8
>>
>>
>>
>> With this configuration, I can write 1700 requests/sec per server.
>>
>>
>>
>> But our desired write performance is 3000-4000 requests/sec per server. As
>> per my understanding, the max values for these parameters can be as below:
>>
>> concurrent_reads: 32
>>
>> concurrent_writes: 128 (8 * 16 cores)
>>
>> concurrent_counter_writes: 32
>>
>> compaction_throughput_mb_per_sec: 128
>>
>> concurrent_compactors: 8 or 16 (as I have 8 SSDs and 16 cores reserved for
>> this)
>>
>>
>>
>> Please let me know if this is fine or if I need to tune some other
>> parameters to speed up writes.
>>
>>
>>
>>
>>
>> *Thanks & Regards,*
>> *Abhishek Kumar Maheshwari*
>> *+91- 805591 (Mobile)*
>>
>> Times Internet Ltd. | A Times of India Group Company
>>
>> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra Config as per server hardware for heavy write

2016-11-22 Thread Jonathan Haddad
How are you benchmarking that?
On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> Hi,
>
>
>
> I have 8 servers in my Cassandra cluster. Each server has 64 GB RAM, 40
> cores, and 8 SSDs. Currently I have the below config in cassandra.yaml:
>
>
>
> concurrent_reads: 32
>
> concurrent_writes: 64
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 32
>
> concurrent_compactors: 8
>
>
>
> With this configuration, I can write 1700 requests/sec per server.
>
>
>
> But our desired write performance is 3000-4000 requests/sec per server. As
> per my understanding, the max values for these parameters can be as below:
>
> concurrent_reads: 32
>
> concurrent_writes: 128 (8 * 16 cores)
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 128
>
> concurrent_compactors: 8 or 16 (as I have 8 SSDs and 16 cores reserved for
> this)
>
>
>
> Please let me know if this is fine or if I need to tune some other
> parameters to speed up writes.
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
>


Cassandra Config as per server hardware for heavy write

2016-11-22 Thread Abhishek Kumar Maheshwari
Hi,

I have 8 servers in my Cassandra cluster. Each server has 64 GB RAM, 40 
cores, and 8 SSDs. Currently I have the below config in cassandra.yaml:

concurrent_reads: 32
concurrent_writes: 64
concurrent_counter_writes: 32
compaction_throughput_mb_per_sec: 32
concurrent_compactors: 8

With this configuration, I can write 1700 requests/sec per server.

But our desired write performance is 3000-4000 requests/sec per server. As per 
my understanding, the max values for these parameters can be as below:
concurrent_reads: 32
concurrent_writes: 128 (8 * 16 cores)
concurrent_counter_writes: 32
compaction_throughput_mb_per_sec: 128
concurrent_compactors: 8 or 16 (as I have 8 SSDs and 16 cores reserved for this)

Please let me know if this is fine or if I need to tune some other parameters 
to speed up writes.


Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA


Re: lots of DigestMismatchException in cassandra3

2016-11-22 Thread kurt Greaves
dclocal_read_repair_chance and read_repair_chance are only really relevant
when using a consistency level below ALL.

On 23 November 2016, Adeline.Pan wrote:

> Hi Kurt,
>
> Thank you for the suggestion. I ran repair on all the 4 nodes, and after
> the repair, the error “Corrupt empty row found in unfiltered partition”
> disappeared, but the “Mismatch” stopped for a little while and came up
> again.
>
> When we changed both the “dclocal_read_repair_chance” and the
> “read_repair_chance” to 0.0, the “Mismatch” stopped. Is it OK to do that?
> Does it mean that when an inconsistency is found while reading data, Cassandra
> won’t do the repair and we will just get the inconsistent data? And you
> said the cause is not all replicas receiving all the writes; I think that is
> reasonable, but the strange thing is I didn’t notice any failed writes.
> Another cause I can think of is inserts, updates, and deletes on the
> same record at the same time; is that a possibility?
>
>
>
> --
>
> Regards, Adeline
>
>
>
>
>
>
>
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Wednesday, November 23, 2016 6:51 AM
> *To:* Pan, Adeline (TR Technology & Ops)
> *Cc:* user@cassandra.apache.org
> *Subject:* Re: lots of DigestMismatchException in cassandra3
>
>
>
> Yes it could potentially impact performance if there are lots of them. The
> mismatch would occur on a read, the error occurs on a write which is why
> the times wouldn't line up. The cause for the messages as I mentioned is
> when there is a digest mismatch between replicas. The cause is inconsistent
> data / not all replicas receiving all writes. You should run a repair and see
> if the number of mismatches is reduced.
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
> 
>
>
>
> On 22 November 2016 at 06:30,  wrote:
>
> Hi Kurt,
>
> Thank you for the information, but the error “Corrupt empty row found in
> unfiltered partition” seems unrelated to the “Mismatch”; the times they
> occurred didn’t match. We use the “QUORUM” consistency level for both reads and
> writes, and I didn’t notice any failed writes in the log. Any other cause
> you can think of? Would it cause performance issues when lots of these
> “Mismatch” errors happen?
>
>
>
> --
>
> Regards, Adeline
>
>
>
>
>
>
>
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Monday, November 21, 2016 5:02 PM
> *To:* user@cassandra.apache.org
> *Cc:* tommy.stend...@ericsson.com
> *Subject:* Re: lots of DigestMismatchException in cassandra3
>
>
>
> Actually, just saw the error message in those logs and what you're looking
> at is probably https://issues.apache.org/jira/browse/CASSANDRA-12694
> 
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
> 
>
>
>
> On 21 November 2016 at 08:59, kurt Greaves  wrote:
>
> That's a debug message. From the sound of it, it's triggered on read where
> there is a digest mismatch between replicas. As to whether it's normal,
> well that depends on your cluster. Are the nodes reporting lots of dropped
> mutations and are you writing at 
>
>
>
>


RE: lots of DigestMismatchException in cassandra3

2016-11-22 Thread Adeline.Pan
Hi Kurt,
Thank you for the suggestion. I ran repair on all the 4 nodes, and after the 
repair, the error “Corrupt empty row found in unfiltered partition” 
disappeared, but the “Mismatch” stopped for a little while and came up again.
When we changed both the “dclocal_read_repair_chance” and the 
“read_repair_chance” to 0.0, the “Mismatch” stopped. Is it OK to do that? Does 
it mean that when an inconsistency is found while reading data, Cassandra won’t 
do the repair and we will just get the inconsistent data? And you said the cause 
is not all replicas receiving all the writes; I think that is reasonable, but the 
strange thing is I didn’t notice any failed writes. Another cause I can think 
of is inserts, updates, and deletes on the same record at the same time; is 
that a possibility?
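
For reference, both settings are per-table schema options; a minimal cqlsh
sketch of the change described above (ks.tbl is a placeholder table name):

  cqlsh -e "ALTER TABLE ks.tbl WITH dclocal_read_repair_chance = 0.0 AND read_repair_chance = 0.0;"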

--
Regards, Adeline



From: kurt Greaves [mailto:k...@instaclustr.com]
Sent: Wednesday, November 23, 2016 6:51 AM
To: Pan, Adeline (TR Technology & Ops)
Cc: user@cassandra.apache.org
Subject: Re: lots of DigestMismatchException in cassandra3

Yes it could potentially impact performance if there are lots of them. The 
mismatch would occur on a read, the error occurs on a write which is why the 
times wouldn't line up. The cause for the messages as I mentioned is when there 
is a digest mismatch between replicas. The cause is inconsistent data / not all 
replicas receiving all writes. You should run a repair and see if the number of 
mismatches is reduced.
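
For example, a full (non-incremental) repair of the affected keyspace, run on
each node in turn, would look something like this (a sketch; the keyspace name
is a placeholder):

  nodetool repair -full mykeyspace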

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 22 November 2016 at 06:30,  wrote:
Hi Kurt,
Thank you for the information, but the error “Corrupt empty row found in 
unfiltered partition” seems unrelated to the “Mismatch”; the times they 
occurred didn’t match. We use the “QUORUM” consistency level for both reads and 
writes, and I didn’t notice any failed writes in the log. Any other cause you 
can think of? Would it cause performance issues when lots of these “Mismatch” 
errors happen?

--
Regards, Adeline



From: kurt Greaves [mailto:k...@instaclustr.com]
Sent: Monday, November 21, 2016 5:02 PM
To: user@cassandra.apache.org
Cc: tommy.stend...@ericsson.com
Subject: Re: lots of DigestMismatchException in cassandra3

Actually, just saw the error message in those logs and what you're looking at 
is probably 
https://issues.apache.org/jira/browse/CASSANDRA-12694

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 21 November 2016 at 08:59, kurt Greaves  wrote:

That's a debug message. From the sound of it, it's triggered on read where 
there is a digest mismatch between replicas. As to whether it's normal, well 
that depends on your cluster. Are the nodes reporting lots of dropped mutations 
and are you writing at 

Re: lots of DigestMismatchException in cassandra3

2016-11-22 Thread kurt Greaves
Yes it could potentially impact performance if there are lots of them. The
mismatch would occur on a read, the error occurs on a write which is why
the times wouldn't line up. The cause for the messages as I mentioned is
when there is a digest mismatch between replicas. The cause is inconsistent
data / not all replicas receiving all writes. You should run a repair and see
if the number of mismatches is reduced.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 22 November 2016 at 06:30,  wrote:

> Hi Kurt,
>
> Thank you for the information, but the error “Corrupt empty row found in
> unfiltered partition” seems unrelated to the “Mismatch”; the times they
> occurred didn’t match. We use the “QUORUM” consistency level for both reads and
> writes, and I didn’t notice any failed writes in the log. Any other cause
> you can think of? Would it cause performance issues when lots of these
> “Mismatch” errors happen?
>
>
>
> --
>
> Regards, Adeline
>
>
>
>
>
>
>
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Monday, November 21, 2016 5:02 PM
> *To:* user@cassandra.apache.org
> *Cc:* tommy.stend...@ericsson.com
> *Subject:* Re: lots of DigestMismatchException in cassandra3
>
>
>
> Actually, just saw the error message in those logs and what you're looking
> at is probably https://issues.apache.org/jira/browse/CASSANDRA-12694
> 
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
> 
>
>
>
> On 21 November 2016 at 08:59, kurt Greaves  wrote:
>
> That's a debug message. From the sound of it, it's triggered on read where
> there is a digest mismatch between replicas. As to whether it's normal,
> well that depends on your cluster. Are the nodes reporting lots of dropped
> mutations and are you writing at 
>
>


Cluster nodes not catching up on total hints

2016-11-22 Thread Andrew Kenney
We’re seeing a strange issue on our Cassandra cluster wherein 3 nodes out
of 21 appear to have a significant amount of hints piling up. We’re not
seeing much in the system log to show that the nodes are having issues with
hints, and nodetool status is not showing any issues with the other nodes in
the cluster.

In attempting to help the nodes catch up with hints we’ve tried to increase
the hinted handoff KB throttle (we ran "nodetool sethintedhandoffthrottlekb
20480" on the 3 nodes getting backed up) but that does not appear to have
made a difference in the hints processing.

We’re looking for guidance on how we can debug the cluster to determine why
the node may be falling behind on hints and how to resolve the situation.

We’re currently looking at the JMX Storage.TotalHints.count metric as
well as the hints directory itself.


sudo du -hs /mnt/cassandra/data/hints
39G    /mnt/cassandra/data/hints

Nodetool tpstats is showing 1 active HintsDispatcher:

nodetool tpstats | grep Hints
Pool Name          Active   Pending   Completed   Blocked   All time blocked
HintsDispatcher         1         6          66         0                  0
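
To confirm whether hints are draining at all, something as simple as the
following can be left running on one of the backed-up nodes (a sketch; the
hints path is taken from the du output above):

  # sample the on-disk hints size and the HintsDispatcher pool once a minute
  while true; do
    date
    du -sh /mnt/cassandra/data/hints
    nodetool tpstats | grep -E 'Pool Name|HintsDispatcher'
    sleep 60
  done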


Re: single instance failover

2016-11-22 Thread Vladimir Yudovin
Sorry, probably I didn't catch your setup fully.



Would you like to use a shared data folder for both nodes, assuming you never run 
the two Cassandra processes simultaneously?

Well, I guess it's possible. Running two Cassandra instances on the same data 
folder together won't work, so you should prevent that situation, maybe with some 
sort of file locking.
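
A minimal sketch of such locking, assuming the shared filesystem honors flock
across machines (NFS, for example, may not; the lock path is a placeholder):

  # start Cassandra in the foreground only if no other host holds the lock
  flock --nonblock /cassandra/state/database/.lock cassandra -f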



multinode Cassandra for Node B is not free

Sure, but besides higher reliability you also get an increase in read query 
speed (with consistency ONE).



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting, Zero production time






 On Tue, 22 Nov 2016 14:28:33 -0500, Lou DeGenaro 
lou.degen...@gmail.com wrote 




Yes, change rpc_address to node B.


Immutability aside, if Node A Cassandra and Node B Cassandra are using the same 
directory on the same shared filesystem, let's call it 
/cassandra/state/database, would that not be a problem?  Or said differently, 
does not Node A need its own writable place /cassandra/state/database/nodeA and 
likewise /cassandra/state/database/nodeB for Node B's writable place?



Multinode Cassandra may not always be available due to resource constraints.  
Presumably multinode Cassandra for Node B is not free: it takes up network, 
cpu, and replicated disk space, no?


Lou.



On 2016-11-22 11:10 (-0500), Vladimir Yudovin v...@winguzone.com wrote:

> Hi Lou,
>
> do you mean you set rpc_address (or broadcast_rpc_address) to Node_B_IP on
> second machine?
>
> > there would be potential database corruption, no?
>
> Well, since SSTables are immutable, it can lead to unpredictable behavior, I
> guess. I don't believe anybody tested such a setup before.
>
> > Is there any guidance on single instance failover?
>
> I never saw one; the main Cassandra idea is that you build a multinode
> cluster.











Re: single instance failover

2016-11-22 Thread Lou DeGenaro
Yes, change rpc_address to node B.

Immutability aside, if Node A Cassandra and Node B Cassandra are using the
same directory on the same shared filesystem, let's call it
/cassandra/state/database,
would that not be a problem?  Or said differently, does not Node A need its
own writable place /cassandra/state/database/nodeA and likewise
/cassandra/state/database/nodeB for Node B's writable place?

Multinode Cassandra may not always be available due to resource
constraints.  Presumably multinode Cassandra for Node B is not free: it
takes up network, cpu, and replicated disk space, no?

Lou.

On 2016-11-22 11:10 (-0500), Vladimir Yudovin  wrote:
> Hi Lou,
>
> do you mean you set rpc_address (or broadcast_rpc_address) to Node_B_IP
> on second machine?
>
> > there would be potential database corruption, no?
>
> Well, since SSTables are immutable, it can lead to unpredictable behavior, I
> guess. I don't believe anybody tested such a setup before.
>
> > Is there any guidance on single instance failover?
>
> I never saw one; the main Cassandra idea is that you build a multinode
> cluster.


Re: Cassandra Encryption

2016-11-22 Thread Jai Bheemsen Rao Dhanwada
Thanks Nate and Vladimir,

I will give it a try.

On Tue, Nov 22, 2016 at 12:48 AM, Vladimir Yudovin 
wrote:

> >if I use the same certificate how does it help?
> This certificate will be recognized by all existing nodes, and no restart
> will be needed.
>
> Or, as Nate suggested, you can use trusted root certificate to issue
> nodes' certificates.
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Tue, 22 Nov 2016 03:07:28 -0500, *Jai Bheemsen Rao Dhanwada* wrote 
>
> yes, I am generating a separate certificate for each node.
> even if I use the same certificate how does it help?
>
> On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin 
> wrote:
>
>
> Hi Jai,
>
> so do you generate a separate certificate for each node? Why not use one
> certificate for all nodes?
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Mon, 21 Nov 2016 17:25:11 -0500, *Jai Bheemsen Rao Dhanwada* wrote 
>
> Hello,
>
> I am setting up encryption on one of my cassandra cluster using the below
> procedure.
>
> server_encryption_options:
> internode_encryption: all
> keystore: /etc/keystore
> keystore_password: x
> truststore: /etc/truststore
> truststore_password: x
>
> http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
>
> However, one difficulty with this approach is that whenever I add a new
> node I have to rolling-restart all the C* nodes in the cluster, so that the
> truststore is updated with the new server information.
>
> Is there a way to automatically trigger a reload so that the truststore is
> updated on the existing machines without a restart?
>
> Can someone please help ?
>
>
>
>


Re: data not replicated on new node

2016-11-22 Thread Bertrand Brelier
Hello Shalom.

No, I really went from 3.1.1 to 3.0.9.

Cheers.

Bertrand

On Nov 22, 2016 1:57 AM, "Shalom Sagges"  wrote:

>
> *I took that opportunity to upgrade from 3.1.1 to 3.0.9*
>
> If my guess is right and you meant that you upgraded from 2.1.1 to 3.0.9
> directly, then this might cause some issues (not necessarily the issue at
> hand though). The proper upgrade process should be to 2.1.9 and from there
> upgrade to 3.0.x.
>
> https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/
> upgrdCassandra.html
>
> Hope this helps.
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
> We Create Meaningful Connections
>
> 
>
>
> On Tue, Nov 22, 2016 at 2:44 AM, Bertrand Brelier <
> bertrand.brel...@gmail.com> wrote:
>
>> Hello Shalom, Vladimir,
>>
>> Thanks for your help.
>>
>> I had initially 3 nodes, had a hardware failure and reinstalled Cassandra
>> on the node (I took that opportunity to upgrade from 3.1.1 to 3.0.9). I ran
>> nodetool upgradesstables and nodetool repair on each node once I updated
>> Cassandra.
>>
>> The 3 nodes are in the same private network, I am using the private IPs
>> for the seeds and the listen_address and the public IPs for rpc_address
>>
>> I am using ssl to encrypt the communication between the nodes, so I am
>> using the port 7001 :
>>
>> telnet PRIVATEIP 7001
>> Trying PRIVATEIP...
>> Connected to PRIVATEIP.
>>
>> Each node can connect with any other node.
>>
>> I selected some old data from the new node :
>>
>> CONSISTENCY;
>> Current consistency level is ONE.
>> select count(*) from ;
>>
>>  count
>> ---
>>  0
>>
>> CONSISTENCY ALL;
>> Consistency level set to ALL.
>> select count(*) from ;
>>
>>  count
>> ---
>> 64
>>
>> When I switched to ALL I could get the data while the initial level ONE
>> did not have any data. I did not expect to get any data with ALL; am I
>> missing something?
>>
>> I do not know if this is related, but while I was inquiring the database,
>> I had the following messages in the debug.log :
>>
>> DEBUG [ReadRepairStage:15292] 2016-11-21 18:15:59,719
>> ReadCallback.java:234 - Digest mismatch:
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>> DecoratedKey(2288259866140251828, 0004002a04421500)
>> (d41d8cd98f00b204e9800998ecf8427e vs ce211ac5533e1a146d9fee734fd8de26)
>> at 
>> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85)
>> ~[apache-cassandra-3.0.10.jar:3.0.10]
>> at 
>> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>> ~[apache-cassandra-3.0.10.jar:3.0.10]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_111]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_111]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
>>
>>
>> Thanks for your help,
>>
>> Cheers,
>>
>> Bertrand
>>
>>
>> On 16-11-21 01:28 AM, Shalom Sagges wrote:
>>
>> I believe the logs should show you what the issue is.
>> Also, can the node "talk" with the others? (i.e. telnet to the other
>> nodes on port 7000).
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>> We Create Meaningful Connections
>>
>> 
>>
>>
>> On Sun, Nov 20, 2016 at 8:50 PM, Bertrand Brelier <
>> bertrand.brel...@gmail.com> wrote:
>>
>>> Hello Jonathan,
>>>
>>> No, the new node is not a seed in my cluster.
>>>
>>> When I ran nodetool bootstrap resume
>>> Node is already bootstrapped.
>>>
>>> Cheers,
>>>
>>> Bertrand
>>>
>>> On Sun, Nov 20, 2016 at 1:43 PM, Jonathan Haddad 
>>> wrote:
>>>
 Did you add the new node as a seed? If you did, it wouldn't bootstrap,
 and you should run repair.
 On Sun, Nov 20, 2016 at 10:36 AM Bertrand Brelier <
 bertrand.brel...@gmail.com> wrote:

> Hello everybody,
>
> I am using a 3-node Cassandra cluster with Cassandra 3.0.10.
>
> I recently added a new node (to make it a 3-node cluster).
>
> I am using a replication factor of 3 , so I expected to have a copy of
> the same data on each node :
>
> CREATE KEYSPACE mydata WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '3'}  AND durable_writes = true;
>
> But the new node has less data than the other 2:
>
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns 

RE: cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread CHAUMIER , RAPHAËL
Ok,

I submitted my question to DataStax.

Regards,
Raphaël CHAUMIER

From: Vladimir Yudovin [mailto:vla...@winguzone.com]
Sent: Tuesday, 22 November 2016 16:59
To: user 
Subject: RE: cassandra documentation (Multiple datacenter write requests) 
question

Can the Apache Cassandra community update this documentation?
I don't think so; it's hosted on the DataStax website and it's not a public wiki.

Anyway, you know what the right quorum calculation formula is ))).

Best regards, Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.


 On Tue, 22 Nov 2016 09:01:32 -0500CHAUMIER, RAPHAËL 
> 
> wrote 

Thank you Hannu,

Can the Apache Cassandra community update this documentation?

From: Hannu Kröger [mailto:hkro...@gmail.com]
Sent: Tuesday, 22 November 2016 14:48
To: user@cassandra.apache.org
Subject: Re: cassandra documentation (Multiple datacenter write requests) 
question

Looks like the graph is wrong.

Hannu

On 22 Nov 2016, at 15.43, CHAUMIER, RAPHAËL wrote:

Hello everyone,

I don’t know if you have access to DataStax documentation. I don’t understand 
the example about Multiple datacenter write requests 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html).
The graph shows there are 3 nodes making up the QUORUM, but based on the quorum 
computation rule 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level)

quorum = (sum_of_replication_factors / 2) + 1

sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
datacentern_RF

If I have 2 DCs of 3 replica nodes, the quorum should be ((3+3)/2) + 1 = 
(6/2) + 1 = 3 + 1 = 4.

Am I missing something?

Thanks for your response.

Regards,









Re: single instance failover

2016-11-22 Thread Vladimir Yudovin
Hi Lou,



do you mean you set  rpc_address (or broadcast_rpc_address) to Node_B_IP on 
second machine?



there would be potential database corruption, no?

Well, since SSTables are immutable, it can lead to unpredictable behavior, I 
guess. I don't believe anybody tested such a setup before.



Is there any guidance on single instance failover?

I never saw one; the main Cassandra idea is that you build a multinode cluster.



Any specific reason why you can't use two nodes as a single cluster? 



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting, zero production time.






 On Tue, 22 Nov 2016 09:25:52 -0500, Lou DeGenaro 
lou.degen...@gmail.com wrote 




We use a single instance of Cassandra on Node A that employs a shared file 
system to keep its data and logs.


Let's say we want to fail over to Node B, by editing the yaml file to change 
Node A to Node B.  If we now (mistakenly) bring up Cassandra on Node B whilst 
the Cassandra on Node A is still running, there would be potential database 
corruption, no?


Is there any guidance on single instance failover?


Thanks.


Lou.









RE: cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread Vladimir Yudovin
Can the Apache Cassandra community update this documentation?

I don't think so; it's hosted on the DataStax website and it's not a public wiki.



Anyway, you know what the right quorum calculation formula is ))).



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 22 Nov 2016 09:01:32 -0500, CHAUMIER, RAPHAËL 
racha...@bouyguestelecom.fr wrote 





Thank you Hannu,

Can the Apache Cassandra community update this documentation?

From: Hannu Kröger [mailto:hkro...@gmail.com]
Sent: Tuesday, 22 November 2016 14:48
To: user@cassandra.apache.org
Subject: Re: cassandra documentation (Multiple datacenter write requests) question

Looks like the graph is wrong.

Hannu

On 22 Nov 2016, at 15.43, CHAUMIER, RAPHAËL racha...@bouyguestelecom.fr wrote:

Hello everyone,

I don’t know if you have access to DataStax documentation. I don’t understand 
the example about Multiple datacenter write requests 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html).
The graph shows there are 3 nodes making up the QUORUM, but based on the quorum 
computation rule 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level)

quorum = (sum_of_replication_factors / 2) + 1

sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
datacentern_RF

If I have 2 DCs of 3 replica nodes, the quorum should be ((3+3)/2) + 1 = 
(6/2) + 1 = 3 + 1 = 4.

Am I missing something?

Thanks for your response.

Regards,
 


 






 










single instance failover

2016-11-22 Thread Lou DeGenaro
We use a single instance of Cassandra on Node A that employs a shared file
system to keep its data and logs.

Let's say we want to fail over to Node B, by editing the yaml file to
change Node A to Node B.  If we now (mistakenly) bring up Cassandra on
Node B whilst the Cassandra on Node A is still running, there would be
potential database corruption, no?

Is there any guidance on single instance failover?

Thanks.

Lou.


RE: cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread CHAUMIER , RAPHAËL
Thank you Hannu,

Can the Apache Cassandra community update this documentation?

From: Hannu Kröger [mailto:hkro...@gmail.com]
Sent: Tuesday, 22 November 2016 14:48
To: user@cassandra.apache.org
Subject: Re: cassandra documentation (Multiple datacenter write requests) 
question

Looks like the graph is wrong.

Hannu

On 22 Nov 2016, at 15.43, CHAUMIER, RAPHAËL wrote:

Hello everyone,

I don’t know if you have access to DataStax documentation. I don’t understand 
the example about Multiple datacenter write requests 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html).
 The graph shows there’s 3 nodes making up of QUORUM, but based on the quorum 
computation rule 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level)

quorum = (sum_of_replication_factors / 2) + 1

sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
datacentern_RF

If I have 2 DCs of 3 replica nodes, the quorum should be ((3+3)/2) + 1 = 
(6/2) + 1 = 3 + 1 = 4.

Am I missing something?

Thanks for your response.

Regards,







Re: cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread Hannu Kröger
Looks like the graph is wrong.

Hannu

> On 22 Nov 2016, at 15.43, CHAUMIER, RAPHAËL wrote:
> 
> Hello everyone,
>  
> I don’t know if you have access to DataStax documentation. I don’t understand 
> the example about Multiple datacenter write requests 
> (http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html).
> The graph shows there are 3 nodes making up the QUORUM, but based on the quorum 
> computation rule 
> (http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level)
>  
> quorum = (sum_of_replication_factors / 2) + 1
> 
> sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
> datacentern_RF
>  
> If I have 2 DCs of 3 replica nodes, the quorum should be ((3+3)/2) + 1 = 
> (6/2) + 1 = 3 + 1 = 4.
>  
> Am I missing something?
>  
> Thanks for your response.
>  
> Regards,
>  
> 
> 





cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread CHAUMIER , RAPHAËL
Hello everyone,

I don't know if you have access to DataStax documentation. I don't understand 
the example about Multiple datacenter write requests 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html).
The graph shows there are 3 nodes making up the QUORUM, but based on the quorum 
computation rule 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level)

quorum = (sum_of_replication_factors / 2) + 1

sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
datacentern_RF

If I have 2 DCs of 3 replica nodes, the quorum should be ((3+3)/2) + 1 = 
(6/2) + 1 = 3 + 1 = 4.
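
As a sanity check, the arithmetic can be scripted; a small shell sketch (the
RF values mirror the example above):

  # quorum for a multi-DC QUORUM write: (sum of RFs) / 2 + 1, integer division
  dc1_rf=3; dc2_rf=3
  echo "replicas needed for QUORUM: $(( (dc1_rf + dc2_rf) / 2 + 1 ))"  # prints 4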

Am I missing something?

Thanks for your response.

Regards,






Re: Is it *safe* to issue multiple replace-node at the same time?

2016-11-22 Thread Paulo Motta
It's safe but since the replacement node will stream data from a single
replica per local range, it will potentially propagate any inconsistencies
from the replica it streams from, so it's recommended to run repair after a
replace to reduce entropy, especially when replacing a node with the same IP
due to CASSANDRA-12344.

2016-11-21 20:34 GMT-02:00 kurt Greaves :

>
> On 21 November 2016 at 18:58, Ben Bromhead  wrote:
>
>> Same rack and no range movements, my first instinct is to say yes it is
>> safe (I like to treat racks as one giant meta node). However I would want
>> to have a read through the replace code.
>
>
> This is assuming RF<=# of racks as well (and NTS).
>
> Kurt Greaves
> www.instaclustr.com
>


Re: Out of memory and/or OOM kill on a cluster

2016-11-22 Thread Vincent Rischmann
Thanks for the detailed answer Alexander.



We'll look into your suggestions, it's definitely helpful. We have plans
to reduce tombstones and remove the table with the big partitions,
hopefully after we've done that the cluster will be stable again.


Thanks again.





On Tue, Nov 22, 2016, at 09:03 AM, Alexander Dejanovski wrote:

> Hi Vincent, 

> 

> Here are a few pointers for disabling swap : 

> - 
> https://docs.datastax.com/en/cassandra/2.0/cassandra/install/installRecommendSettings.html
> - 
> http://stackoverflow.com/questions/22988824/why-swap-needs-to-be-turned-off-in-datastax-cassandra
> 

> Tombstones are definitely the kind of object that can clutter your
> heap, lead to frequent GC pauses and could be part of why you run into
> OOM from time to time. I cannot answer for sure though as it is a bit
> more complex than that actually.
> You do not have crazy high GC pauses, although a 5s pause should not
> happen on a healthy cluster.
> 

> Getting back to big partitions, I've had the case in production where
> a multi GB partition was filling a 26GB G1 heap when being compacted.
> Eventually, the old gen took all the available space in the heap,
> leaving no room for the young gen, but it actually never OOMed. To be
> honest, I would have preferred an OOM to the inefficient 50s GC pauses
> we've had because such a slow node can (and did) affect the whole
> cluster.
> 

> I think you may have a combination of things happening here and you
> should work on improving them all :
> - spot precisely which are your big partitions to understand why you
>   have some (data modeling issue or data source bad behavior) : look
>   for "large partition" warnings in the cassandra logs, it will give
>   you the partition key
> - try to reduce the number of tombstones you're reading by changing
>   your queries or data model, or maybe by setting up an aggressive
>   tombstone pruning strategy :
>   
> http://cassandra.apache.org/doc/latest/operating/compaction.html?highlight=unchecked_tombstone_compaction#common-options
> You could benefit from setting unchecked_tombstone_compaction to true
> and tuning both tombstone_threshold and tombstone_compaction_interval
> - Follow recommended production settings and fully disable swap from
>   your Cassandra nodes
> 

> You might want to scale down from the 20GB heap as the OOM Killer will
> stop your process either way, and it might allow you to have an
> analyzable heap dump. Such a heap dump could tell us if there are lots
> of tombstones there when the JVM dies.
> 

> I hope that's helpful as there is no easy answer here, and the problem
> should be narrowed down by fixing all potential causes.
> 

> Cheers,

> 

> 

> 

> 

> On Mon, Nov 21, 2016 at 5:10 PM Vincent Rischmann
>  wrote:
>> __

>> Thanks for your answer Alexander.

>> 

>> We're writing constantly to the table, we estimate it's something
>> like 1.5k to 2k writes per second. Some of these requests update a
>> bunch of fields, some update fields + append something to a set.
>> We don't read constantly from it but when we do it's a lot of read,
>> up to 20k reads per second sometimes.
>> For this particular keyspace everything is using the size tiered
>> compaction strategy.
>> 

>>  - Every node is a physical server, has an 8-core CPU, 32GB of RAM and
>>3TB of SSD.
>>  - Java version is 1.8.0_101 for all nodes except one which is using
>>1.8.0_111 (only for about a week I think, before that it used
>>1.8.0_101 too).
>>  - We're using the G1 GC. I looked at the 19th and on that day we
>>had:
>>   - 1505 GCs

>>   - 2 Old Gen GCs which took around 5s each

>>   - the rest are New Gen GCs, with only 1 other over 1s. There's 15 to 20
>> GCs which took between 0.4 and 0.7s. The rest is between 250ms
>> and 400ms approximately.
>> Sometimes, there are 3/4/5 GCs in a row in like 2 seconds, each
>> taking between 250ms to 400ms, but it's kinda rare from what I
>> can see.
>>  - So regarding GC logs, I have them enabled, I've got a bunch of
>>gc.log.X files in /var/log/cassandra, but somehow I can't find any
>>log files for certain periods. On one node which crashed this
>>morning I lost like a week of GC logs, no idea what is happening
>>there...
>>  - I'll just put a couple of warnings here, there are around 9k just
>>for today.
>> 

>> WARN  [SharedPool-Worker-8] 2016-11-21 17:02:00,497
>> SliceQueryFilter.java:320 - Read 2001 live and 11129 tombstone cells
>> in foo.install_info for key: foo@IOS:7 (see
>> tombstone_warn_threshold). 2000 columns were requested, slices=[-]
>> WARN  [SharedPool-Worker-1] 2016-11-21 17:02:02,559
>> SliceQueryFilter.java:320 - Read 2001 live and 11064 tombstone cells
>> in foo.install_info for key: foo@IOS:7 (see
>> tombstone_warn_threshold). 2000 columns were requested, 
>> slices=[di[42FB29E1-8C99-45BE-8A44-9480C50C6BC4]:!-
>> ]
>> WARN  [SharedPool-Worker-2] 2016-11-21 17:02:05,286
>> SliceQueryFilter.java:320 - 

Re: Cassandra Encryption

2016-11-22 Thread Vladimir Yudovin
if I use the same certificate how does it help?

This certificate will be recognized by all existing nodes, and no restart will 
be needed.



Or, as Nate suggested, you can use trusted root certificate to issue nodes' 
certificates.





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 22 Nov 2016 03:07:28 -0500, Jai Bheemsen Rao Dhanwada 
jaibheem...@gmail.com wrote 




yes, I am generating a separate certificate for each node.

even if I use the same certificate how does it help?




On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin vla...@winguzone.com 
wrote:








Hi Jai,



so do you generate a separate certificate for each node? Why not use one 
certificate for all nodes?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 17:25:11 -0500, Jai Bheemsen Rao Dhanwada 
jaibheem...@gmail.com wrote 




Hello,



I am setting up encryption on one of my cassandra cluster using the below 
procedure.



server_encryption_options:

internode_encryption: all

keystore: /etc/keystore

keystore_password: x

truststore: /etc/truststore

truststore_password: x




http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore



However, one difficulty with this approach is that whenever I add a new node I 
have to rolling-restart all the C* nodes in the cluster, so that the truststore 
is updated with the new server information.



Is there a way to automatically trigger a reload so that the truststore is 
updated on the existing machines without a restart?



Can someone please help ?
















Re: Cassandra Encryption

2016-11-22 Thread Nate McCall
You should be using a root certificate for signing all the node
certificates to create a trust chain. That way nodes won't have to
explicitly know about each other, only the root certificate.

This post has some details:
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
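
A rough outline of that flow with keytool (a sketch only: aliases, passwords,
file names, and the CN values are placeholders; a real deployment would use
per-host CNs and protected passwords):

  # 1. create a root CA key pair and export its self-signed certificate
  keytool -genkeypair -alias rootca -keyalg RSA -keystore ca.jks -storepass capass -dname "CN=ClusterRootCA"
  keytool -exportcert -alias rootca -keystore ca.jks -storepass capass -rfc -file rootca.pem
  # 2. create a node key pair and a CSR, and sign the CSR with the root CA
  keytool -genkeypair -alias node1 -keyalg RSA -keystore node1-keystore.jks -storepass nodepass -dname "CN=node1"
  keytool -certreq -alias node1 -keystore node1-keystore.jks -storepass nodepass -file node1.csr
  keytool -gencert -alias rootca -keystore ca.jks -storepass capass -rfc -infile node1.csr -outfile node1.pem
  # 3. import the chain into the node keystore; the shared truststore only needs the root
  keytool -importcert -alias rootca -keystore node1-keystore.jks -storepass nodepass -file rootca.pem -noprompt
  keytool -importcert -alias node1 -keystore node1-keystore.jks -storepass nodepass -file node1.pem
  keytool -importcert -alias rootca -keystore truststore.jks -storepass trustpass -file rootca.pem -noprompt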

On Tue, Nov 22, 2016 at 9:07 PM, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> yes, I am generating a separate certificate for each node.
> even if I use the same certificate how does it help?
>
> On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin 
> wrote:
>
>> Hi Jai,
>>
>> so do you generate a separate certificate for each node? Why not use one
>> certificate for all nodes?
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>
>>
>>  On Mon, 21 Nov 2016 17:25:11 -0500, *Jai Bheemsen Rao Dhanwada* wrote 
>>
>> Hello,
>>
>> I am setting up encryption on one of my cassandra cluster using the below
>> procedure.
>>
>> server_encryption_options:
>> internode_encryption: all
>> keystore: /etc/keystore
>> keystore_password: x
>> truststore: /etc/truststore
>> truststore_password: x
>>
>> http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
>>
>> However, one difficulty with this approach is that whenever I add a new
>> node I have to rolling-restart all the C* nodes in the cluster, so that the
>> truststore is updated with the new server information.
>>
>> Is there a way to automatically trigger a reload so that the truststore
>> is updated on the existing machines without a restart?
>>
>> Can someone please help ?
>>
>>
>>
>


-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra Encryption

2016-11-22 Thread Jai Bheemsen Rao Dhanwada
yes, I am generating a separate certificate for each node.
even if I use the same certificate how does it help?

On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin 
wrote:

> Hi Jai,
>
> so do you generate a separate certificate for each node? Why not use one
> certificate for all nodes?
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Mon, 21 Nov 2016 17:25:11 -0500, *Jai Bheemsen Rao Dhanwada* wrote 
>
> Hello,
>
> I am setting up encryption on one of my cassandra cluster using the below
> procedure.
>
> server_encryption_options:
> internode_encryption: all
> keystore: /etc/keystore
> keystore_password: x
> truststore: /etc/truststore
> truststore_password: x
>
> http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
>
> However, one difficulty with this approach is that whenever I add a new
> node I have to rolling-restart all the C* nodes in the cluster, so that the
> truststore is updated with the new server information.
>
> Is there a way to automatically trigger a reload so that the truststore is
> updated on the existing machines without a restart?
>
> Can someone please help ?
>
>
>


Re: Out of memory and/or OOM kill on a cluster

2016-11-22 Thread Alexander Dejanovski
Hi Vincent,

Here are a few pointers for disabling swap:
-
https://docs.datastax.com/en/cassandra/2.0/cassandra/install/installRecommendSettings.html
-
http://stackoverflow.com/questions/22988824/why-swap-needs-to-be-turned-off-in-datastax-cassandra
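
In practice that boils down to something like the following (a sketch; the
fstab edit is left as a comment because the exact entry is system-specific):

  sudo swapoff --all               # stop swapping immediately
  # remove or comment out any swap lines in /etc/fstab so it stays off after reboot
  sudo sysctl -w vm.swappiness=1   # if swap must remain enabled, minimize its use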

Tombstones are definitely the kind of object that can clutter your heap,
lead to frequent GC pauses and could be part of why you run into OOM from
time to time. I cannot answer for sure though as it is a bit more complex
than that actually.
You do not have crazy high GC pauses, although a 5s pause should not happen
on a healthy cluster.

Getting back to big partitions, I've had the case in production where a
multi GB partition was filling a 26GB G1 heap when being compacted.
Eventually, the old gen took all the available space in the heap, leaving
no room for the young gen, but it actually never OOMed. To be honest, I
would have preferred an OOM to the inefficient 50s GC pauses we've had
because such a slow node can (and did) affect the whole cluster.

I think you may have a combination of things happening here and you should
work on improving them all :
- spot precisely which are your big partitions to understand why you have
some (data modeling issue or data source bad behavior) : look for "large
partition" warnings in the cassandra logs, it will give you the partition
key
- try to reduce the number of tombstones you're reading by changing your
queries or data model, or maybe by setting up an aggressive tombstone
pruning strategy :
http://cassandra.apache.org/doc/latest/operating/compaction.html?highlight=unchecked_tombstone_compaction#common-options
You could benefit from setting unchecked_tombstone_compaction to true and
tuning both tombstone_threshold and tombstone_compaction_interval (see the
sketch after this list)
- Follow recommended production settings and fully disable swap from your
Cassandra nodes
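
A sketch of what that per-table tombstone tuning could look like via cqlsh
(the keyspace/table name and the threshold values are placeholders, shown for
SizeTieredCompactionStrategy):

  cqlsh -e "ALTER TABLE foo.install_info WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'unchecked_tombstone_compaction': 'true',
      'tombstone_threshold': '0.2',
      'tombstone_compaction_interval': '86400'};"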

You might want to scale down from the 20GB heap as the OOM Killer will stop
your process either way, and it might allow you to have an analyzable heap
dump. Such a heap dump could tell us if there are lots of tombstones there
when the JVM dies.

I hope that's helpful as there is no easy answer here, and the problem
should be narrowed down by fixing all potential causes.

Cheers,




On Mon, Nov 21, 2016 at 5:10 PM Vincent Rischmann  wrote:

> Thanks for your answer Alexander.
>
> We're writing constantly to the table, we estimate it's something like
> 1.5k to 2k writes per second. Some of these requests update a bunch of
> fields, some update fields + append something to a set.
> We don't read constantly from it but when we do it's a lot of read, up to
> 20k reads per second sometimes.
> For this particular keyspace everything is using the size tiered
> compaction strategy.
>
>  - Every node is a physical server, has an 8-core CPU, 32GB of RAM and 3TB
> of SSD.
>  - Java version is 1.8.0_101 for all nodes except one which is using
> 1.8.0_111 (only for about a week I think, before that it used 1.8.0_101
> too).
>  - We're using the G1 GC. I looked at the 19th and on that day we had:
>   - 1505 GCs
>   - 2 Old Gen GCs which took around 5s each
>   - the rest are New Gen GCs, with only 1 other over 1s. There's 15 to 20 GCs
> which took between 0.4 and 0.7s. The rest is between 250ms and 400ms
> approximately.
> Sometimes, there are 3/4/5 GCs in a row in like 2 seconds, each taking
> between 250ms to 400ms, but it's kinda rare from what I can see.
>  - So regarding GC logs, I have them enabled, I've got a bunch of gc.log.X
> files in /var/log/cassandra, but somehow I can't find any log files for
> certain periods. On one node which crashed this morning I lost like a week
> of GC logs, no idea what is happening there...
>  - I'll just put a couple of warnings here, there are around 9k just for
> today.
>
> WARN  [SharedPool-Worker-8] 2016-11-21 17:02:00,497
> SliceQueryFilter.java:320 - Read 2001 live and 11129 tombstone cells in
> foo.install_info for key: foo@IOS:7 (see tombstone_warn_threshold). 2000
> columns were requested, slices=[-]
> WARN  [SharedPool-Worker-1] 2016-11-21 17:02:02,559
> SliceQueryFilter.java:320 - Read 2001 live and 11064 tombstone cells in
> foo.install_info for key: foo@IOS:7 (see tombstone_warn_threshold). 2000
> columns were requested, slices=[di[42FB29E1-8C99-45BE-8A44-9480C50C6BC4]:!-]
> WARN  [SharedPool-Worker-2] 2016-11-21 17:02:05,286
> SliceQueryFilter.java:320 - Read 2001 live and 11064 tombstone cells in
> foo.install_info for key: foo@IOS:7 (see tombstone_warn_threshold). 2000
> columns were requested, slices=[di[42FB29E1-8C99-45BE-8A44-9480C50C6BC4]:!-]
> WARN  [SharedPool-Worker-11] 2016-11-21 17:02:08,860
> SliceQueryFilter.java:320 - Read 2001 live and 19966 tombstone cells in
> foo.install_info for key: foo@IOS:10 (see tombstone_warn_threshold). 2000
> columns were requested, slices=[-]
>
> So, we're guessing this is bad since it's warning us; however, does this
> have a significant impact on the heap / GC? I don't really know.
>
> -