Questions on time series use case, tombstones, TWCS

2017-08-09 Thread Steinmaurer, Thomas
Hello, our top contributor from a data volume perspective is time series data. We are running with STCS since our initial production deployment in 2014 with several clusters with a varying number of nodes, but currently with max. 9 nodes per single cluster per different region in AWS with
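A minimal sketch (not from the thread) of what switching such a time-series table to TWCS could look like on 3.0.8+/3.8+ — keyspace/table name and window settings are placeholders:
  cqlsh -e "ALTER TABLE ks.metrics WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1};"  # fully expired windows can then be dropped as whole SSTables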

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
ndra/tools/toolsSSTableRepairedSet.html>. Cheers, On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hi Alex, thanks a lot. Somehow missed that incremental repairs are the default now. We h

GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Steinmaurer, Thomas
Hello, we have a test (regression) environment hosted in AWS, which is used for auto deploying our software on a daily basis and attach constant load across all deployments. Basically to allow us to detect any regressions in our software on a daily basis. On the Cassandra-side, this is

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Steinmaurer, Thomas
, 2017, at 2:37 AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, we have a test (regression) environment hosted in AWS, which is used for auto deploying our software on a daily basis and attach constant load across all

RE: Compaction in cassandra

2017-09-15 Thread Steinmaurer, Thomas
Hi, usually automatic minor compactions are fine, but you may need much more free disk space to reclaim disk space via automatic minor compactions, especially in a time series use case with size-tiered compaction strategy (possibly with leveled as well, I’m not familiar with this strategy

Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Hello, we are currently in the process of upgrading from 2.1.18 to 3.0.14. After upgrading a few test environments, we start to see some suspicious log entries regarding repair issues. We have a cron job on all nodes basically executing the following repair call on a daily basis: nodetool
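A sketch of the kind of cron entry described, with placeholder keyspace/table names; the replies in this thread point out that incremental repair is the default since 2.2, so on 3.0.14 a daily primary-range repair needs -full:
  # run once per day per node, staggered so nodes do not repair concurrently
  15 2 * * * nodetool repair -full -pr ks cf1 cf2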

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
when nodetool or the logs show that repair is over (which will include the anticompaction phase). Cheers, On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, we are currently in the process of upgradi

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-18 Thread Steinmaurer, Thomas
in 3.0 in context of CPU/GC and not disk savings? Thanks, Thomas From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] Sent: Friday, 15. September 2017 13:51 To: user@cassandra.apache.org Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18) Hi Jeff, we are using

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Steinmaurer, Thomas
a/tools/toolsSSTableRepairedSet.html>. Cheers, On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hi Alex, thanks a lot. Somehow missed that incremental repairs are the default now. We have been

RE: Row Cache hit issue

2017-09-19 Thread Steinmaurer, Thomas
Hi, additionally, with saved (key) caches, we had some sort of corruption (I think, for whatever reason) once. So, if you see something like that upon Cassandra startup: INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading saved cache
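If a saved key cache really is corrupt, it is safe to simply delete it before restarting the node — saved caches are only a warm-up aid and get rebuilt; the path below assumes the default saved_caches_directory:
  rm /var/lib/cassandra/saved_caches/*KeyCache*.db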

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Steinmaurer, Thomas
users may want to run keep running full repairs without the additional cost of anti-compaction. Would you mind opening a ticket for this? 2017-09-19 1:33 GMT-05:00 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>: > Hi Kurt, > > > > thanks for the link! > > > &

RE: Massive deletes -> major compaction?

2017-09-22 Thread Steinmaurer, Thomas
In addition to Kurt’s reply: double disk usage is really the worst case. Most of the time you are fine having more free disk than the largest column family. Also take local snapshots into account. Even after a finished major compaction, disk space may not have been reclaimed if snapshot hard links
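A quick way to check whether snapshots are still pinning disk space after a major compaction (the tag name is a placeholder):
  nodetool listsnapshots            # the per-snapshot "true size" is what is only kept alive by the snapshot hard links
  nodetool clearsnapshot -t mytag   # dropping the snapshot lets the superseded SSTables actually be deleted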

RE: Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

2017-09-19 Thread Steinmaurer, Thomas
Nandan, you may find the following useful. Slideshare: https://www.slideshare.net/DataStax/apache-cassandra-multidatacenter-essentials-julien-anguenot-iland-internet-solutions-c-summit-2016 Youtube: https://www.youtube.com/watch?v=G6od16YKSsA From a client perspective, if you are targeting

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Steinmaurer, Thomas
wish to do this, you'll have to mark back all your sstables to unrepaired, using nodetool sstablerepairedset<https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableRepairedSet.html>. Cheers, On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.c
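A hedged sketch of marking SSTables back to unrepaired with the offline tool linked above — run it with the node stopped, and verify the exact flags against the docs for your version, as they are quoted here from memory:
  sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/ks/cf-*/*-Data.db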

RE: network down between DCs

2017-09-21 Thread Steinmaurer, Thomas
Hi, within the default hint window of 3 hours, the hinted handoff mechanism should take care of that, but we have seen that failing from time to time (depending on the load) in 2.1 with some sort of tombstone related issues causing failing requests on the system hints table. So, watch out any

RE: Node failure

2017-10-06 Thread Steinmaurer, Thomas
QUORUM should succeed with a RF=3 and 2 of 3 nodes available. Modern client drivers also have ways to “downgrade” the CL of requests, in case they fail. E.g. for the Java driver:

RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
G1 suggested settings http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3 @Steinmaurer, Thomas If this happens very frequently within a very short time then, depending on your allocation rate in MB/s, a combination of the G1 bug and a small heap might result

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-25 Thread Steinmaurer, Thomas
://issues.apache.org/jira/browse/CASSANDRA-13900. Feel free to request any further additional information on the ticket. Unfortunately this is a real show-stopper for us upgrading to 3.0. Thanks for your attention. Thomas From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] Sent

RE:

2017-09-28 Thread Steinmaurer, Thomas
Dan, do you see any major GC? We have been hit by the following memory leak in our loadtest environment with 3.11.0. https://issues.apache.org/jira/browse/CASSANDRA-13754 So, depending on the heap size and uptime, you might get into heap troubles. Thomas From: Dan Kinder

Cassandra 3.11.1 (snapshot build) - io.netty.util.Recycler$Stack memory leak

2017-10-01 Thread Steinmaurer, Thomas
Hello, we were facing a memory leak with 3.11.0 (https://issues.apache.org/jira/browse/CASSANDRA-13754) thus upgraded our loadtest environment to a snapshot build of 3.11.1. Having it running for > 48 hrs now, we still see a steady increase on heap utilization. Eclipse memory analyzer shows

RE: space left for compaction

2017-10-01 Thread Steinmaurer, Thomas
Hi, half of free space does not make sense. Imagine your SSTables need 100G space and you have 20G free disk. Compaction won't be able to do its job with 10G. Half of the total disk kept free makes more sense and is what you need for a major compaction worst case. Thomas From: Peng Xiao

RE: Alter table gc_grace_seconds

2017-10-02 Thread Steinmaurer, Thomas
Hello Justin, yes, but in the real world this is hard to accomplish for high-volume column families >= 3-digit GB. Even with the default grace period of 10 days, completing a full repair in time can be a real challenge. ☺ Possibly back again to the discussion that incremental repair has some flaws, full
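For reference, the default mentioned above written out as a statement (keyspace/table are placeholders):
  cqlsh -e "ALTER TABLE ks.cf WITH gc_grace_seconds = 864000;"   # 864000 s = 10 days; keep it longer than your full-repair interval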

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Steinmaurer, Thomas
situation after upgrading from 2.1.14 to 3.11 in our production. Have you already tried G1GC instead of CMS? Our timeouts were mitigated after replacing CMS with G1GC. Thanks. 2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Steinmaurer, Thomas
mitigated after replacing CMS with G1GC. Thanks. 2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>: Hello, I have now some concrete numbers from our 9 node loadtest cluster with constant load, same infrastructure aft

RE: 回复: nodetool cleanup in parallel

2017-09-26 Thread Steinmaurer, Thomas
Side-note: At least with 2.1 (or even later), be aware that you might run into the following issue: https://issues.apache.org/jira/browse/CASSANDRA-11155 We are doing cron-job based hourly snapshots in production and have tried to also run cleanup after extending a cluster from 6 to 9 nodes.

RE: Got error, removing parent repair session - When doing multiple repair -pr — Cassandra 3.x

2017-10-07 Thread Steinmaurer, Thomas
Marshall, -pr should not be used with incremental repairs, which is the default since 2.2. But even when used with full repairs (-full option), this will cause troubles when running nodetool repair -pr from several nodes concurrently. So, unfortunately, this does not seem to work anymore and

RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi, although not happening here with Cassandra (due to using CMS), we had some weird problem with our server application e.g. hit by the following JVM/G1 bugs: https://bugs.openjdk.java.net/browse/JDK-8140597 https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less a duplicate of above)

RE: Not serving read requests while running nodetool repair

2017-10-18 Thread Steinmaurer, Thomas
? On 18 October 2017 at 08:04, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, due to performance/latency reasons, we are currently reading and writing time series data at consistency level ONE/ANY. In case of a node bei

RE: Not serving read requests while running nodetool repair

2017-10-18 Thread Steinmaurer, Thomas
read requests while running nodetool repair You can accomplish this by manually tweaking the values in the dynamic snitch mbean so other nodes won’t select it for reads -- Jeff Jirsa On Oct 18, 2017, at 3:24 AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>

Not serving read requests while running nodetool repair

2017-10-18 Thread Steinmaurer, Thomas
Hello, due to performance/latency reasons, we are currently reading and writing time series data at consistency level ONE/ANY. In case of a node being down and recovering after the default hinted handoff window of 3 hrs, we may potentially read stale data from the recovering node. Of course,

cassandra.yaml configuration for large machines (scale up vs. scale out)

2017-11-03 Thread Steinmaurer, Thomas
Hello, I know that Cassandra is built for scale out on commodity hardware, but I wonder if anyone can share some experience when running Cassandra on rather capable machines. Let's say we have a 3 node cluster with 128G RAM, 32 physical cores (16 per CPU socket), Large Raid with Spinning

RE: Stable Cassandra 3.x version for production

2017-11-07 Thread Steinmaurer, Thomas
Latest DSE is based on 3.11 (possibly due to CASSANDRA-12269, but just a guess). For us (only), none of 3.0+/3.11+ qualifies for production to be honest, when you are familiar with having 2.1 in production. · 3.0 needs more hardware resources to handle the same load =>

RE: Anyone try out C* with latest Oracle JDK update?

2018-05-24 Thread Steinmaurer, Thomas
Hi Sam, in our pre-production stages, we are running Cassandra 3.0 and 3.11 with 8u172 (previously u102 then u162) without any visible troubles/regressions. In case of Cassandra 3.11, you need 3.11.2 due to: https://issues.apache.org/jira/browse/CASSANDRA-14173. Cassandra 3.0 is not affected

compaction_throughput: Difference between 0 (unthrottled) and large value

2018-06-11 Thread Steinmaurer, Thomas
Hello, on a 3 node loadtest cluster with very capable machines (32 physical cores, 512G RAM, 20T storage (26 disk RAID)), I'm trying to max out compaction, thus currently testing with: concurrent_compactors: 16 compaction_throughput_mb_per_sec: 0 With our simulated incoming load + compaction

RE: compaction_throughput: Difference between 0 (unthrottled) and large value

2018-06-11 Thread Steinmaurer, Thomas
Sorry, should have first looked at the source code. In case of 0, it is set to Double.MAX_VALUE. Thomas From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] Sent: Monday, 11. June 2018 08:53 To: user@cassandra.apache.org Subject: compaction_throughput: Difference between 0

RE: G1GC CPU Spike

2018-06-13 Thread Steinmaurer, Thomas
Explicitly setting Xmn with G1 basically results in overriding the target pause-time goal, thus should be avoided. http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html Thomas From: rajpal reddy [mailto:rajpalreddy...@gmail.com] Sent: Wednesday, 13. June 2018 17:27 To:
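A sketch of the resulting recommendation in cassandra-env.sh style (the pause goal value is illustrative, not from the thread):
  JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
  JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"   # let G1 size the young generation from the pause goal; do not add -Xmn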

RE: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-29 Thread Steinmaurer, Thomas
Hi Kurt, thanks for pointing me to the Xmx issue. JIRA + patch (for Linux only based on C* 3.11) for the parallel GC thread issue is available here: https://issues.apache.org/jira/browse/CASSANDRA-14475 Thanks, Thomas From: kurt greaves [mailto:k...@instaclustr.com] Sent: Tuesday, 29. May

RE: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-29 Thread Steinmaurer, Thomas
heapsize by default will be 256mb, which isn't hugely problematic, and it's unlikely more than that would get allocated. On 29 May 2018 at 09:29, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hi Kurt, thanks for pointing me to the Xmx issue. JIRA + patch (for Linux only

Compaction throughput vs. number of compaction threads?

2018-06-05 Thread Steinmaurer, Thomas
Hello, most likely obvious and perhaps already answered in the past, but just want to be sure ... E.g. I have set: concurrent_compactors: 4 compaction_throughput_mb_per_sec: 16 I guess this will lead to ~ 4MB/s per Thread if I have 4 compactions running in parallel? So, in case of upscaling
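The arithmetic spelled out, on the assumption that compaction_throughput_mb_per_sec is a per-node cap shared by all active compaction threads:
  nodetool getcompactionthroughput    # e.g. 16 MB/s total
  # 16 MB/s shared by 4 concurrent compactors ~ 4 MB/s per thread
  nodetool setcompactionthroughput 64 # raise the total cap if ~16 MB/s per thread is wanted with 4 compactors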

RE: 3.11.2 memory leak

2018-06-04 Thread Steinmaurer, Thomas
Jeff, FWIW, when talking about https://issues.apache.org/jira/browse/CASSANDRA-13929, there is a patch available since March without getting further attention. Regards, Thomas From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Tuesday, 05. June 2018 00:51 To: cassandra Subject: Re: 3.11.2

nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-28 Thread Steinmaurer, Thomas
Hello, on a quite capable machine with 32 physical cores (64 vCPUs) we see sporadic CPU usage up to 50% caused by nodetool on this box, thus dug a bit further. A few observations: 1) nodetool is reusing the $MAX_HEAP_SIZE environment variable, thus if we are running Cassandra with e.g.
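An easy way to see what the nodetool JVM actually inherited (main class name as of 2.1; adjust the grep if it differs in your version):
  ps -ef | grep org.apache.cassandra.tools.NodeTool | grep -v grep   # check the effective -Xmx and ParallelGCThreads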

Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-05 Thread Steinmaurer, Thomas
Hello, has anybody already some experience/results if a patched Linux kernel regarding Meltdown/Spectre is affecting performance of Cassandra negatively? In production, all nodes running in AWS with m4.xlarge, we see up to a 50% relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4,

RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-10 Thread Steinmaurer, Thomas
AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Quick follow up. Others in AWS reporting/seeing something similar, e.g.: https://twitter.com/BenBromhead/status/950245250504601600 So, while we have seen a relative CPU incr

RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Steinmaurer, Thomas
and not production though), thus more or less double patched now. Additional CPU impact by OS/VM level kernel patching is more or less negligible, so looks highly Hypervisor related. Regards, Thomas From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] Sent: Friday, 05. January 2018

Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
Hello, we are running 2.1.18 with vnodes in production and due to (https://issues.apache.org/jira/browse/CASSANDRA-11155) we can't run cleanup e.g. after extending the cluster without blocking our hourly snapshots. What options do we have to get rid of partitions a node does not own anymore?

RE: Cassandra 3.11 fails to start with JDK8u162

2018-01-18 Thread Steinmaurer, Thomas
? On Thu, Jan 18, 2018 at 2:32 AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Sam, thanks for the confirmation. Going back to u152 then. Thomas From: li...@beobal.com [mailto:li...@be

Cassandra 3.11 fails to start with JDK8u162

2018-01-17 Thread Steinmaurer, Thomas
Hello, after switching from JDK8u152 to JDK8u162, Cassandra fails with the following stack trace upon startup. ERROR [main] 2018-01-18 07:33:18,804 CassandraDaemon.java:706 - Exception encountered during startup java.lang.AbstractMethodError:

RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-14 Thread Steinmaurer, Thomas
t;> wrote: For what it’s worth, we (TLP) just posted some results comparing pre and post meltdown statistics: http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>

RE: Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
e logs on the JIRA or better yet a way to reproduce? On 14 January 2018 at 16:12, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, we are running 2.1.18 with vnodes in production and due to (https://issues.apache.org/jira/browse/C

RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-13 Thread Steinmaurer, Thomas
tdown-impact-on-latency.html On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: m4.xlarge do have PCID to my knowledge, but possibly we need a rather new kernel 4.14. But I fail to see how this

RE: Cleanup blocking snapshots - Options?

2018-01-30 Thread Steinmaurer, Thomas
on in 2.1 that triggered this and it wasn't worth fixing. If you are triggering it easily maybe it is worth fixing in 2.1 as well. Does this happen consistently? Can you provide some more logs on the JIRA or better yet a way to reproduce? On 14 January 2018 at 16:12, Steinmaurer, Thomas <tho

RE: Old tombstones not being cleaned up

2018-02-01 Thread Steinmaurer, Thomas
Did you start with a 9 node cluster from the beginning or did you extend / scale out your cluster (with vnodes) beyond the replication factor? If the latter applies and if you are deleting by explicit deletes and not via TTL, then nodes might not see the deletes anymore, as a node might not own
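The usual remedy once a cluster has grown beyond the point where every node still owns its old data (keyspace/table are placeholders):
  nodetool cleanup ks cf   # run on every node, one at a time, after extending the cluster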

RE: Old tombstones not being cleaned up

2018-02-01 Thread Steinmaurer, Thomas
of 3, then added another 3 nodes and again another 3 nodes. So it is a good guess :) But I have run both repair and cleanup against the table on all nodes, would that not have removed any stray partitions? On Thu, 1 Feb 2018 at 22:31, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.

RE: if the heap size exceeds 32GB..

2018-02-12 Thread Steinmaurer, Thomas
Stick with 31G in your case. Another article on compressed Oops: https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/ Thomas From: Eunsu Kim [mailto:eunsu.bil...@gmail.com] Sent: Tuesday, 13. February 2018 08:09 To: user@cassandra.apache.org Subject: if the heap
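A quick way to confirm the compressed-oops cut-off on a given JVM (the grep only surfaces the effective flag value):
  java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # true  -> compressed oops still active
  java -Xmx32g -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # false -> a 32 GB heap loses compressed oops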

RE: Reaper 1.2 released

2018-07-25 Thread Steinmaurer, Thomas
Jon, eager to try it out. Just FYI. Followed the installation instructions on http://cassandra-reaper.io/docs/download/install/ Debian-based. 1) Importing the key results in: XXX:~$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 2895100917357435 Executing:

Configuration parameter to reject incremental repair?

2018-08-07 Thread Steinmaurer, Thomas
Hello, we are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 in production with 3.11 in loadtest. In a migration path from 2.1 to 3.11.x, I'm afraid that at some point we end up with incremental repairs being enabled / run for the first time unintentionally, because:

Data Corruption due to multiple Cassandra 2.1 processes?

2018-08-06 Thread Steinmaurer, Thomas
Hello, with 2.1, in case a second Cassandra process/instance is started on a host (by accident), may this result in some sort of corruption, although Cassandra will exit at some point in time due to not being able to bind TCP ports already in use? What we have seen in this scenario is

RE: Data Corruption due to multiple Cassandra 2.1 processes?

2018-08-13 Thread Steinmaurer, Thomas
have gone to 2.1 in the first place, but it just got missed. Very simple patch so I think a backport should be accepted. On 7 August 2018 at 15:57, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, with 2.1, in case a second Cassandra process/instance is started on

nodetool cleanup - compaction remaining time

2018-09-05 Thread Steinmaurer, Thomas
Hello, is it a known issue / limitation that cleanup compactions aren't counted in the compaction remaining time? nodetool compactionstats -H pending tasks: 1 compaction type keyspace table completed total unit progress Cleanup XXX YYY

RE: Configuration parameter to reject incremental repair?

2018-09-09 Thread Steinmaurer, Thomas
incremental repair? No flag currently exists. Probably a good idea considering the serious issues with incremental repairs since forever, and the change of defaults since 3.0. On 7 August 2018 at 16:44, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, we are r

RE: nodetool cleanup - compaction remaining time

2018-09-06 Thread Steinmaurer, Thomas
t : Probably worth a JIRA (especially if you can repro in 3.0 or higher, since 2.1 is critical fixes only) On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, is it a known issue / limitation that cleanup compactions aren’t counted in t

RE: nodetool cleanup - compaction remaining time

2018-09-07 Thread Steinmaurer, Thomas
in 3.0 or higher, since 2.1 is critical fixes only) On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, is it a known issue / limitation that cleanup compactions aren’t counted in the compaction remaining time? nodetool compactionst

RE: Data Corruption due to multiple Cassandra 2.1 processes?

2018-09-05 Thread Steinmaurer, Thomas
2.1 processes? New ticket for backporting, referencing the existing. On Mon., 13 Aug. 2018, 22:50 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Thanks Kurt. What is the proper workflow here to get this accepted? Create a new ticket dedicated for the backport refer

RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
From: Jeff Jirsa Sent: Monday, 10. September 2018 19:40 To: cassandra Subject: Re: Drop TTLd rows: upgradesstables -a or scrub? I think it's important to describe exactly what's going on for people who just read the list but who don't have context. This blog does a really good job:

Scrub a single SSTable only?

2018-09-11 Thread Steinmaurer, Thomas
Hello, is there a way to Online scrub a particular SSTable file only and not the entire column family? According to the Cassandra logs we have a corrupted SSTable smallish compared to the entire data volume of the column family in question. To my understanding, both, nodetool scrub and

RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact offers a ‘-s’ command-line option to split the output into files with 50%, 25% … in size, thus in this case, not a single largish SSTable anymore. By default, without -s, it is a single SSTable though. Thomas From:
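The call in question as a concrete sketch (keyspace/table are placeholders; -s requires 2.2 or newer):
  nodetool compact -s ks cf   # splits the major compaction output into 50%/25%/... sized SSTables instead of one large file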

RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
. September 2018 09:47 To: User Subject: Re: Drop TTLd rows: upgradesstables -a or scrub? On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: As far as I remember, in newer Cassandra versions, with STCS, nodetool compact offers a ‘-s’ comman

RE: Cassandra 3.11 fails to start with JDK8u162

2018-01-18 Thread Steinmaurer, Thomas
o downgrade back to 152 then ! On 18 January 2018 at 08:34, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, after switching from JDK8u152 to JDK8u162, Cassandra fails with the following stack trace upon startup. E

Cassandra 3.11 - nodetool cleanup - Compaction interrupted ...

2018-01-22 Thread Steinmaurer, Thomas
Hello, when triggering a "nodetool cleanup" with Cassandra 3.11, the nodetool call almost returns instantly and I see the following INFO log. INFO [CompactionExecutor:54] 2018-01-22 12:59:53,903 CompactionManager.java:1777 - Compaction interrupted:

Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-01 Thread Steinmaurer, Thomas
Hello, Production, 9 node cluster with Cassandra 2.1.18, vnodes, default 256 tokens, RF=3, compaction throttling = 16, concurrent compactors = 4, running in AWS using m4.xlarge at ~ 35% CPU AVG. We have a nightly cronjob starting a "nodetool repair -pr ks cf1 cf2" concurrently on all nodes,

RE: Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-06 Thread Steinmaurer, Thomas
Hi Kurt, our provisioning layer allows extending a cluster only one-by-one, thus we didn’t add multiple nodes at the same time. What we did have was some sort of overlapping between our daily repair cronjob and the newly added node still in progress joining. Don’t know if this sort of

Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Steinmaurer, Thomas
Hello, yet another question/issue with repair. Cassandra 2.1.18, 3 nodes, RF=3, vnode=256, data volume ~ 5G per node only. A repair (nodetool repair -par) issued on a single node at this data volume takes around 36min with an AVG of ~ 15MByte/s disk throughput (read+write) for the entire

Cassandra 2.1 bootstrap - No streaming progress from one node

2018-11-07 Thread Steinmaurer, Thomas
Hello, while bootstrapping a new node into an existing cluster, a node which is acting as source for streaming got restarted unfortunately. Since then, from nodetool netstats I don't see any progress for this particular node anymore. E.g.: /X.X.X.X Receiving 94 files, 260.09 GB total.
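A simple way to keep an eye on whether the stream is still moving (the grep merely hides files that are already complete):
  watch -n 60 'nodetool netstats | grep -v "100%"'   # byte counters that stop changing for a long time indicate a dead stream session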

RE: Cassandra 2.1.21 ETA?

2018-10-01 Thread Steinmaurer, Thomas
age- > From: Michael Shuler On Behalf Of Michael > Shuler > Sent: Friday, 21. September 2018 15:49 > To: user@cassandra.apache.org > Subject: Re: Cassandra 2.1.21 ETA? > > On 9/21/18 3:28 AM, Steinmaurer, Thomas wrote: > > > > is there an ETA for 2.1.21 con

Cassandra 2.1.21 ETA?

2018-09-21 Thread Steinmaurer, Thomas
Hello, is there an ETA for 2.1.21 containing the logback update (security vulnerability fix)? Thanks, Thomas The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee,

RE: Cassandra 2.1.21 ETA?

2018-09-21 Thread Steinmaurer, Thomas
> On 9/21/18 3:28 AM, Steinmaurer, Thomas wrote: > > > > is there an ETA for 2.1.21 containing the logback update (security > > vulnerability fix)? > > Are you using SocketServer? Is your cluster firewalled? > > Feb 2018 2.1->3.11 commits noting this in NEWS.txt: > ht

JMX metric for dropped hints?

2019-01-21 Thread Steinmaurer, Thomas
Hello, is there a JMX metric for monitoring dropped hints as a counter/rate, equivalent to what we see in Cassandra log, e.g.: WARN [HintedHandoffManager:1] 2018-11-13 13:28:46,991 HintedHandoffMetrics.java:79 - /XXX has 18180 dropped hints, because node is down past configured hint window.

RE: JMX metric for dropped hints?

2019-01-22 Thread Steinmaurer, Thomas
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/FiveMinuteRate org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/FifteenMinuteRate Hayato On Tue, 22 Jan 2019 at 07:45, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: Hello, is there a JMX metric for monitoring d

RE: Read timeouts when performing rolling restart

2018-09-12 Thread Steinmaurer, Thomas
Hi, I remember something that a client using the native protocol gets notified too early by Cassandra being ready due to the following issue: https://issues.apache.org/jira/browse/CASSANDRA-8236 which looks similar, but above was marked as fixed in 2.2. Thomas From: Riccardo Ferrari Sent:

Apache Thrift library 0.9.2 update due to security vulnerability?

2018-09-14 Thread Steinmaurer, Thomas
Hello, a Blackduck security scan of our product detected a security vulnerability in the Apache Thrift library 0.9.2, which is shipped in Cassandra up to 3.11 (haven't checked 4.0), also pointed out here:

RE: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?))

2018-09-18 Thread Steinmaurer, Thomas
Alex, any indications in the Cassandra log about insufficient disk space during compactions? Thomas From: Oleksandr Shulgin Sent: Tuesday, 18. September 2018 10:01 To: User Subject: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files (due to repair?) in a static table (was Re:

RE: Cassandra 2.1.18 - NPE during startup

2019-03-27 Thread Steinmaurer, Thomas
Hello, any ideas regarding the below, as it happened again on a different node. Thanks Thomas From: Steinmaurer, Thomas Sent: Tuesday, 05. February 2019 23:03 To: user@cassandra.apache.org Subject: Cassandra 2.1.18 - NPE during startup Hello, at a particular customer location, we are seeing

Cassandra 2.1.18 - NPE during startup

2019-02-05 Thread Steinmaurer, Thomas
Hello, at a particular customer location, we are seeing the following NPE during startup with Cassandra 2.1.18. INFO [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475 - Opening

RE: Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Steinmaurer, Thomas
arity. Per-second metrics might show CPU cores getting pegged. I’m not sure that GC tuning eliminates this problem, but if it isn’t being caused by that, GC tuning may at least improve the visibility of the underlying problem. From: "Steinmaurer, Thomas" mailto:thomas.steinmau..

Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Steinmaurer, Thomas
Hello, after moving from 2.1.18 to 3.0.18, we are facing OOM situations after several hours a node has successfully joined a cluster (via auto-bootstrap). I have created the following ticket trying to describe the situation, including hprof / MAT screens:

Cassandra 3.0.20 release ETA?

2019-11-13 Thread Steinmaurer, Thomas
Hello, sorry, I know, 3.0.19 has been released just recently. Any ETA for 3.0.20? Reason is that we are having quite some pain with on-heap pressure after moving from 2.1.18 to 3.0.18. https://issues.apache.org/jira/browse/CASSANDRA-15400 Thanks a lot, Thomas The contents of this e-mail are

Cassandra 3.0.18 showing ~ 10x higher on-heap allocations for processing batch messages compared to 2.1.18

2019-11-15 Thread Steinmaurer, Thomas
Hello, looks like 3.0.18 can't handle the same write ingest as 2.1.18 on the same hardware. Basically, the write path shows ~10x higher on-heap allocations when processing batch messages. I've tried to summarize the finding on the following ticket:

Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Steinmaurer, Thomas
Hello, using 2.1.18, 3 nodes (m4.10xlarge, EBS SSD-based), vnodes=256, RF=3, we are trying to add a 4th node. The two options to my knowledge mainly affecting throughput, namely stream output and compaction throttling, have been set to very high values (e.g. stream output = 800 Mbit/s resp.
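For reference, the runtime equivalents of the two knobs mentioned (800 Mbit/s is the value from the thread; the compaction value is illustrative):
  nodetool setstreamthroughput 800     # outbound streaming cap in Mbit/s
  nodetool setcompactionthroughput 0   # 0 = unthrottled compaction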

RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Steinmaurer, Thomas
e fighting against then. It is easy to have a box that looks unused but in reality its struggling. Given that you’ve opened up the floodgates on compaction, that would seem quite plausible to be what you are experiencing. From: "Steinmaurer, Thomas" mailto:thomas.steinmau...@dynatra

RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Steinmaurer, Thomas
: Oleksandr Shulgin Sent: Tuesday, 22. October 2019 16:35 To: User Subject: Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput On Tue, Oct 22, 2019 at 12:47 PM Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote: using 2.1.18, 3 nodes (m4.10xlarge, EBS SSD-based),

RE: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-30 Thread Steinmaurer, Thomas
If possible, prefer m5 over m4, because they run on a newer hypervisor (KVM-based), single-core performance is ~ 10% better compared to m4, and m5 is even slightly cheaper than m4. Thomas From: Erick Ramirez Sent: Thursday, 30. January 2020 03:00 To: user@cassandra.apache.org

Cassandra 3.0.19 and 3.11.5 cannot start on Windows

2020-01-10 Thread Steinmaurer, Thomas
Hello, https://issues.apache.org/jira/browse/CASSANDRA-15426. According to the ticket, changes in https://issues.apache.org/jira/browse/CASSANDRA-15053 likely being the root cause. Will this be fixed in 3.0.20 and 3.11.6? Thanks, Thomas The contents of this e-mail are intended for the named

Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-28 Thread Steinmaurer, Thomas
Leon, we had an awful performance/throughput experience with 3.x coming from 2.1. 3.11 is simply a memory hog, if you are using batch statements on the client side. If so, you are likely affected by https://issues.apache.org/jira/browse/CASSANDRA-16201 Regards, Thomas