Re: good monitoring tool for cassandra

2019-03-14 Thread Jonathan Haddad
I've worked with several teams using DataDog, folks are pretty happy with it. We (The Last Pickle) did the dashboards for them: http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html Prometheus + Grafana is great if you want to host it yourself. On Fri, Mar 15, 2019 at 12:45 PM

Re: To Repair or Not to Repair

2019-03-14 Thread Jonathan Haddad
My coworker Alex (from The Last Pickle) wrote an in depth blog post on TWCS. We recommend not running repair on tables that use TWCS. http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html It's enough of a problem that we added a feature into Reaper to auto-blacklist TWCS / DTCS tables from

Re: cassandra upgrades multi-DC in parallel

2019-03-12 Thread Jonathan Haddad
Nothing prevents it technically, but operationally you might not want to. Personally I’d prefer to have the safety net of a DC to fall back on in case there’s an issue with the upgrade. On Wed, Mar 13, 2019 at 7:48 AM Carl Mueller wrote: > If there are multiple DCs in a cluster, is it safe to

Re: Maximum memory usage reached

2019-03-06 Thread Jonathan Haddad
That’s not an error. To the left of the log message is the severity, level INFO. Generally, I don’t recommend running Cassandra on only 2GB ram or for small datasets that can easily fit in memory. Is there a reason why you’re picking Cassandra for this dataset? On Thu, Mar 7, 2019 at 8:04 AM

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Jonathan Haddad
If the goal is arbitrary queries, I'd avoid Cassandra altogether. Don't use DSE Search or Elassandra, they're two solutions designed to solve problems that are Cassandra first, search second. I'd go straight to Elasticsearch for workloads that are primarily search driven, like you listed above.

Reaper 1.4 released

2019-02-15 Thread Jonathan Haddad
Hey folks, I'm happy to share we (The Last Pickle) have just released version 1.4 of Reaper. For those of you who aren't aware of the project, it's an open source tool for managing sub-range repairs, originally created by Spotify, which we picked up and adopted about two years ago. There's a

Re: Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread Jonathan Haddad
Create the first node, setting the tokens manually. Create the keyspace. Add the rest of the nodes with the allocate tokens uncommented. On Thu, Feb 14, 2019 at 11:43 AM DuyHai Doan wrote: > Hello users > > By looking at the mailing list archive, there was already some questions > about the
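A minimal sketch of how evenly spaced tokens for that first node could be computed, assuming the Murmur3Partitioner token range of -2^63 to 2^63-1. The function name and the choice of 4 tokens are illustrative assumptions, not something stated in this message; the resulting values would go into `initial_token` in cassandra.yaml as a comma-separated list.

```python
from typing import List


def evenly_spaced_tokens(num_tokens: int) -> List[int]:
    """Split the Murmur3 token range into num_tokens equal slices.

    Hypothetical helper for illustration; assumes Murmur3Partitioner.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    ring_size = 2 ** 64          # total number of Murmur3 tokens
    lowest_token = -(2 ** 63)    # smallest token on the ring
    return [lowest_token + i * ring_size // num_tokens for i in range(num_tokens)]


tokens = evenly_spaced_tokens(4)
# Line that could be pasted into cassandra.yaml on the first node only.
initial_token_line = "initial_token: " + ",".join(str(t) for t in tokens)
print(initial_token_line)
```

The remaining nodes would not use these values; per the message above, they join with the token allocation option enabled instead.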

Re: Max number of windows when using TWCS

2019-02-11 Thread Jonathan Haddad
Deleting SSTables manually can be useful if you don't know your TTL up front. For example, you have an ETL process that moves your raw Cassandra data into S3 as parquet files, and you want to be sure that process is completed before you delete the data. You could also start out without setting a

Re: datamodelling

2019-02-05 Thread Jonathan Haddad
We (The Last Pickle) wrote a blog post on scaling time series: http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html Rather than an agent_type, you can use an application-determined bucket, so that agents with more data use more buckets. That'll keep your partition
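One way the bucket idea could look on the application side, as a rough sketch. The names `bucket_for` and `buckets_per_agent` are hypothetical, and the per-agent bucket counts are assumed to be kept by the application; the partition key would then be something like (agent_id, bucket).

```python
import zlib
from typing import Dict


def bucket_for(agent_id: str, event_id: str,
               buckets_per_agent: Dict[str, int],
               default_buckets: int = 1) -> int:
    """Pick a bucket for an event; busy agents are given more buckets.

    crc32 is used instead of hash() so the result is stable across processes.
    """
    num_buckets = buckets_per_agent.get(agent_id, default_buckets)
    return zlib.crc32(event_id.encode("utf-8")) % num_buckets


# Assumed configuration: one noisy agent spread over 16 partitions.
buckets_per_agent = {"noisy-agent": 16}

print(bucket_for("noisy-agent", "event-123", buckets_per_agent))
print(bucket_for("quiet-agent", "event-456", buckets_per_agent))
```

The trade-off is on the read path: fetching everything for an agent means querying each of its buckets.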

Re: High CPU usage on reading single row with Set column with short TTL

2019-01-28 Thread Jonathan Haddad
Your fastest route might be to run a profiler on Cassandra and get some flame graphs. I'm a fan of the async-profiler: https://github.com/jvm-profiling-tools/async-profiler Joey Lynch did a nice write up in the documentation on a different process, which I haven't used yet:

Re: Datastax Java Driver compatibility

2019-01-22 Thread Jonathan Haddad
The drivers are not maintained by the Cassandra project, it's up to each driver maintainer to list their compatibility. On Tue, Jan 22, 2019 at 10:48 AM Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Thanks for the response Amanda, > > Yes we can go with the latest version but we

Re: Released an ACID-compliant transaction library on top of Cassandra

2019-01-16 Thread Jonathan Haddad
Sounds a bit like RAMP: http://rustyrazorblade.com/post/2015/ramp-made-easy/ On Wed, Jan 16, 2019 at 12:51 PM Carl Mueller wrote: > "2) Overview: In essence, the protocol calls for each data item to > maintain the last committed and perhaps also the currently active version, > for the data and

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Jonathan Haddad
new Firstname, old Lastname > > having updates on columns atomically guarantees you to have new Firstname, > new Lastname > > On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad wrote: > >> Those are two different cases though. It *sounds like* (again, I may be >> miss

Re: Cassandra and Apache Arrow

2019-01-09 Thread Jonathan Haddad
ram on the homepage displaying Cassandra (with other > storages) as source of data. > https://arrow.apache.org/img/shared.png > > Which made me think there should be some integration... > > On Thu, 10 Jan 2019, 12:38 am Jonathan Haddad >> Where are you seeing that it wor

Re: Cassandra and Apache Arrow

2019-01-09 Thread Jonathan Haddad
Where are you seeing that it works with Cassandra? There's no mention of it under https://arrow.apache.org/powered_by/, and on the homepage it only says that a Cassandra developer worked on it. We (unfortunately) don't do anything with it at the moment. On Wed, Jan 9, 2019 at 3:24 PM Tomas

Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jonathan Haddad
I've done some gossip simulations in the past and found virtually no difference in the time it takes for messages to propagate in almost any sized cluster. IIRC it always converges by 17 iterations. Thus, I completely agree with Jeff's comment here. If you aren't pushing 800-1000 nodes, it's
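For a feel of why cluster size matters so little, here is a toy push-gossip simulation. It is not the simulation referred to above and not Cassandra's actual gossiper; it assumes every informed node tells one uniformly random node per round, and the function name is made up for illustration.

```python
import random


def rounds_to_converge(num_nodes: int, seed: int = 0) -> int:
    """Count rounds until every node has heard a message in toy push gossip."""
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the message
    rounds = 0
    while len(informed) < num_nodes:
        # Each informed node pushes the message to one random node.
        targets = [rng.randrange(num_nodes) for _ in range(len(informed))]
        informed.update(targets)
        rounds += 1
    return rounds


for size in (10, 100, 1000):
    print(size, rounds_to_converge(size))
```

Because the informed set can at most double each round, the round count grows roughly with the logarithm of the cluster size, which is why a hundredfold increase in nodes adds only a handful of rounds in this model.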

Re: SSTableMetadata Util

2019-01-07 Thread Jonathan Haddad
Try installing the cassandra-tools package. On Mon, Jan 7, 2019 at 1:20 AM Igor Zubchenok wrote: > Same issue with 3.11.3: > > # find / -name sstable* > /usr/bin/sstableverify > /usr/bin/sstableupgrade > /usr/bin/sstableloader > /usr/bin/sstableutil > /usr/bin/sstablescrub > > only these

Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-04 Thread Jonathan Haddad
If you absolutely have to use Cassandra as the source of your data, I agree with Dor. That being said, if you're going to be doing a lot of analytics, I recommend using something other than Cassandra with Spark. The performance isn't particularly wonderful and you'll likely get anywhere from

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
rows. With workloads > that generate a lot of tombstones, this can cause performance problems and > even exhaust the server heap. "* > > Regards, > Tomas > > On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad >> If you're overwriting values, it really doesn't matter much

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
If you're overwriting values, it really doesn't matter much if it's a tombstone or any other value, they still need to be compacted and have the same overhead at read time. Tombstones are problematic when you try to use Cassandra as a queue (or something like a queue) and you need to scan over

Re: Sub range repair

2019-01-01 Thread Jonathan Haddad
We (The Last Pickle) maintain an open source tool for dealing with this: http://cassandra-reaper.io On Tue, Jan 1, 2019 at 12:31 PM Rahul Reddy wrote: > Hello, > > Is it possible to find subrange needed for repair in Apache Cassandra like > dse which uses dsetool list_subranges like below doc >

Re: Cassandra Integrated Auth for JMX

2018-12-16 Thread Jonathan Haddad
Jolokia is running as an agent, which means it runs in process and has access to everything within the JVM. JMX credentials are supplied to the JMX server, which Jolokia is bypassing. You'll need to read up on Jolokia's security if you want to keep using it:

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Jonathan Haddad
Seeing high kswapd usage means there's a lot of churn in the page cache. It doesn't mean you're using swap, it means the box is spending time clearing pages out of the page cache to make room for the stuff you're reading now. The machines don't have enough memory - they are way undersized for a

Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-01 Thread Jonathan Haddad
Dmitry is right. Generally speaking always go with the latest bug fix release. On Sat, Dec 1, 2018 at 10:14 AM Dmitry Saprykin wrote: > See more here > https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13004 > > On Sat, Dec 1, 2018 at 1:02 PM Dmitry Saprykin > wrote: > >>

Re: multiple node bootstrapping

2018-11-28 Thread Jonathan Haddad
Agree with Jeff here, using auto_bootstrap:false is probably not what you want. Have you increased your streaming throughput? Upgrading to 3.11 might reduce the time by quite a bit: https://issues.apache.org/jira/browse/CASSANDRA-9766 You'd be doing committers a huge favor if you grabbed some

Re: system_auth keyspace replication factor

2018-11-23 Thread Jonathan Haddad
Any chance you’re logging in with the Cassandra user? It uses quorum reads. On Fri, Nov 23, 2018 at 11:38 AM Vitali Dyachuk wrote: > Hi, > We have recently met a problem when we added 60 nodes in 1 region to the > cluster > and set an RF=60 for the system_auth ks, following this documentation

Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Jonathan Haddad
Just because Cassandra doesn't do it doesn't mean you aren't able to encrypt your data at rest, and you definitely don't need DSE to do it. I recommend checking out the LUKS project. https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md This, IMO, is a better option than having the

Re: Multiple cluster for a single application

2018-11-07 Thread Jonathan Haddad
Interesting approach Eric, thanks for sharing that. Regarding this: > I've read documents recommended to use clusters with less than 50 or 100 nodes (Netflix got hundreds of clusters with less 100 nodes on each). Not sure where you read that, but it's nonsense. We work with quite a few

Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
heduled ? > > thanks. > > IPVP > > On November 4, 2018 at 7:25:05 PM, Jonathan Haddad (j...@jonhaddad.com) > wrote: > > Maybe I’m missing something, but it seems to me that the bucket might be a > little overkill for a scheduling system. Do you expect people to have > mill

Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Maybe I’m missing something, but it seems to me that the bucket might be a little overkill for a scheduling system. Do you expect people to have millions of appointments? On Sun, Nov 4, 2018 at 12:46 PM I PVP wrote: > Could you please provide advice on the modeling approach for the following >

Re: [ANNOUNCE] StratIO's Lucene plugin fork

2018-10-30 Thread Jonathan Haddad
Very cool Ben, thanks for sharing! On Tue, Oct 30, 2018 at 6:14 PM Ben Slater wrote: > For anyone who is interested, we’ve published a blog with some more > background on this and some more detail of our ongoing plans: > https://www.instaclustr.com/instaclustr-support-cassandra-lucene-index/ >

Re: Cassandra | Cross Data Centre Replication Status

2018-10-30 Thread Jonathan Haddad
You need to run "nodetool rebuild -- " on each node in the new DC to get the old data to replicate. It doesn't do it automatically because Cassandra has no way of knowing if you're done adding nodes and if it were to migrate automatically, it could cause a lot of problems. Imagine streaming 100

Re: Best compaction strategy

2018-10-25 Thread Jonathan Haddad
To add to what Alex suggested, if you know what keys use what TTL you could store them in different tables, with different window settings. Jon On Fri, Oct 26, 2018 at 1:28 AM Alexander Dejanovski wrote: > Hi Raman, > > TWCS is the best compaction strategy for TTL data, even if you have >

Re: Cassandra running Multiple JVM's

2018-10-24 Thread Jonathan Haddad
Another issue you'll need to consider is how the JVM allocates resources towards GC, especially if you're using G1 with a pause time goal. Specifically, if you let it pick its own numbers for ParallelGCThreads & ConcGCThreads they'll be based on the total number of CPUs, not the number you've
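To make the effect concrete, here is a sketch of the default sizing as I understand the Java 8 era HotSpot ergonomics. Treat both formulas as assumptions and confirm the real values on your JVM with `java -XX:+PrintFlagsFinal -version`; the function names are illustrative.

```python
def default_parallel_gc_threads(ncpus: int) -> int:
    """Assumed HotSpot default: all CPUs up to 8, then 5/8 of the remainder."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


def default_g1_conc_gc_threads(parallel_gc_threads: int) -> int:
    """Assumed G1 default: roughly a quarter of the parallel threads, minimum 1."""
    return max((parallel_gc_threads + 2) // 4, 1)


# A 32-CPU host where each Cassandra JVM is meant to use only 4 CPUs.
host_parallel = default_parallel_gc_threads(32)
intended_parallel = default_parallel_gc_threads(4)
print(host_parallel, default_g1_conc_gc_threads(host_parallel))
print(intended_parallel, default_g1_conc_gc_threads(intended_parallel))
```

Under these assumptions, several JVMs on one box would each size their GC thread pools for the whole machine, so setting both flags explicitly per JVM is the safer route.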

Re: TWCS: Repair create new buckets with old data

2018-10-24 Thread Jonathan Haddad
Hey Meg, a couple thoughts. > Set a table level TTL with TWCS, and stop setting it with inserts/updates (insert TTL overrides table level TTL). So, that your entire sstable expires at the same time, as opposed to each insert expiring at its own pace. So that for tombstone clean up, the system

Re: openjdk for cassandra production cluster

2018-10-10 Thread Jonathan Haddad
The warning should be removed (if it hasn’t already), it’s unnecessary at this point On Wed, Oct 10, 2018 at 7:41 AM Prachi Rath wrote: > HI users, > I have created a cassandra cluster with openjdk 1.8.0_181 > version.(cassandra 2.1.17) > started each node, cluster looks healthy,but in the log

Re: SNAPSHOT builds?

2018-09-29 Thread Jonathan Haddad
Hey James, you’ll have to build it. Java 11 is out but the build instructions still apply: http://thelastpickle.com/blog/2018/08/16/java11.html On Sat, Sep 29, 2018 at 7:01 AM James Carman wrote: > I am trying to find 4.x SNAPSHOT builds. Are they available anywhere > handy? I'm trying to

Re: Large partitions

2018-09-13 Thread Jonathan Haddad
It depends on a number of factors, such as compaction strategy and read patterns. I recommend sticking to the 100MB per partition limit (and I aim for significantly less than that). If you're doing time series with TWCS & TTL'ed data and small enough windows, and you're only querying for a small

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-09 Thread Jonathan Haddad
ation factor to use for their > token allocation logic, maybe they guess or take the highest or something. > Cassandra doesn’t - we require you to be explicit, but we could probably do > better here. > > > > On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin < > oleksandr.shul...@za

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-09 Thread Jonathan Haddad
I'll be honest, I'm having a hard time wrapping my head around an architecture where you use CDC to push data into Kafka. I've worked on plenty of systems that use Kafka as a means of communication, and one of the consumers is a process that stores data in Cassandra. That's pretty normal.

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
.@zalando.de> wrote: > On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, wrote: > >> 256 tokens is a pretty terrible default setting especially post 3.0. I >> recommend folks use 4 tokens for new clusters, >> > > I wonder why not setting it to all the way down to 1 then? Wha

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
more why should i run that python command and > config allocate_tokens_for_keyspace? i only have one keyspace per cluster. > Im using Network replication strategy, and a rack-aware topology config. > > Sent using Zoho Mail <https://www.zoho.com/mail/> > > > On

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
256 tokens is a pretty terrible default setting especially post 3.0. I recommend folks use 4 tokens for new clusters, with some caveats. When you fire up a cluster, there's no way to make the initial tokens be distributed evenly, you'll get random ones. You'll want to set them explicitly using:

Re: JBOD disk failure - just say no

2018-08-22 Thread Jonathan Haddad
We recently helped a team deal with some JBOD issues, they can be quite painful, and the experience depends a bit on the C* version in use. We wrote a blog post about it (published today): http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html Hope

Java 11 support in Cassandra 4.0 + Early Testing and Feedback

2018-08-16 Thread Jonathan Haddad
Hey folks, As we start to get ready to feature freeze trunk for 4.0, it's going to be important to get a lot of community feedback. This is going to be a big release for a number of reasons. * Virtual tables. Finally a nice way of querying for system metrics & status * Streaming optimizations

Re: Compression Tuning Tutorial

2018-08-09 Thread Jonathan Haddad
except lack of resources to do this? > > > Regards, > > Kyrill > -- > *From:* Eric Plowe > *Sent:* Wednesday, August 8, 2018 9:39:44 PM > *To:* user@cassandra.apache.org > *Subject:* Re: Compression Tuning Tutorial > > Great post, Jonathan! T

Compression Tuning Tutorial

2018-08-08 Thread Jonathan Haddad
Hey folks, We've noticed a lot over the years that people create tables usually leaving the default compression parameters, and have spent a lot of time helping teams figure out the right settings for their cluster based on their workload. I finally managed to write some thoughts down along with

Re: TWCS Compaction backed up

2018-08-07 Thread Jonathan Haddad
What's your window size? When you say backed up, how are you measuring that? Are there pending tasks or do you just see more files than you expect? On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler wrote: > Hey guys, quick question: > > I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl,

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Jonathan Haddad
By default Cassandra is set to generate a heap dump on OOM. It can be a bit tricky to figure out what’s going on exactly but it’s the best evidence you can work with. On Tue, Aug 7, 2018 at 6:30 AM Laszlo Szabo wrote: > Hi, > > Thanks for the fast response! > > We are not using any materialized

Re: Apache Cassandra 3.11.3 Question

2018-08-04 Thread Jonathan Haddad
This strategy is a lot more work than just replacing nodes one at a time. For a large cluster it would be months of work instead of a couple days. On Sat, Aug 4, 2018 at 7:04 AM R1 J1 wrote: > Can a cluster having 3.11.0 node(s) accept a 3.11.3 node as a new node > for eventual migration and

Re: Secure data

2018-08-01 Thread Jonathan Haddad
ear: > https://www.instaclustr.com/securing-apache-cassandra-with-application-level-encryption/ > > We also use encrypted GP2 EBS pretty widely without issue. > > Cheers > Ben > > On Thu, 2 Aug 2018 at 05:38 Jonathan Haddad wrote: > >> You can also get full disk

Re: Secure data

2018-08-01 Thread Jonathan Haddad
You can also get full disk encryption with LUKS, which I've used before. On Wed, Aug 1, 2018 at 12:36 PM Jeff Jirsa wrote: > EBS encryption worked well on gp2 volumes (never tried it on any others) > > -- > Jeff Jirsa > > > On Aug 1, 2018, at 7:57 AM, Rahul Reddy wrote: > > Hello, > > Any one

Reaper 1.2 released

2018-07-24 Thread Jonathan Haddad
Hey folks, Just wanted to share with the list that after a bit of a long wait, we've released Reaper 1.2. We have a short blog post here outlining the new features: https://twitter.com/TheLastPickle/status/1021830663605870592 With each release we've worked on performance improvements and

Re: Timeout for only one keyspace in cluster

2018-07-23 Thread Jonathan Haddad
You don’t get this guarantee with counters. Do not use them for unique values. Use a UUID instead. On Mon, Jul 23, 2018 at 9:11 AM learner dba wrote: > James, > > Yes, counter is implemented due to valid reasons. We need this value > column to have unique values being used at the time of
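A minimal sketch of generating the identifier client side instead of relying on a counter. The helper name is hypothetical; `uuid.uuid4()` is the standard-library call for a random (version 4) UUID.

```python
import uuid


def new_unique_id() -> uuid.UUID:
    """Return a random UUID; no coordination between writers is needed."""
    return uuid.uuid4()


first = new_unique_id()
second = new_unique_id()
print(first)
print(second)
```

The value can be generated by any client independently and stored in a `uuid` column, so there is no read-modify-write on a shared counter.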

Re: Incremental Backup Hardlinks

2018-07-19 Thread Jonathan Haddad
The hard links are created after the SSTables have finished writing. On Thu, Jul 19, 2018 at 9:51 AM David Payne wrote: > Hello Cassandra Experts and Committers, > > > > Hopefully this is just a dumb question, but without the skill set to read > the source code, I must ask. > > > > Consider

Re: Cassandra Client Program not Working with NettySSLOptions

2018-06-19 Thread Jonathan Haddad
Is the server configured to use encryption? On Tue, Jun 19, 2018 at 3:59 AM Jahar Tyagi wrote: > Hi, > > I referred to this link > https://docs.datastax.com/en/developer/java-driver/3.0/manual/ssl/ > to > implement a simple

Re: Compaction strategy for update heavy workload

2018-06-13 Thread Jonathan Haddad
I wouldn't use TWCS if there's updates, you're going to risk having data that's never deleted and really small sstables sticking around forever. If you use really large buckets, what's the point of TWCS? Honestly this is such a small workload you could easily use STCS or LCS and you'd likely

Re: Mongo DB vs Cassandra

2018-05-31 Thread Jonathan Haddad
I haven’t seen any query requirements, which is going to be the thing that makes Cassandra difficult. If you can’t define your queries beforehand, Cassandra is a no-go. If you just want to store data somewhere, and it’s just CSV, I’d go with a simple blob store like S3 and pick a DB later when

Re: Time Series schema performance

2018-05-29 Thread Jonathan Haddad
I wrote a post on this topic a while ago, might be worth reading over: http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html On Tue, May 29, 2018 at 8:02 AM Jeff Jirsa wrote: > There’s a third option which is doing bucketing by time instead of by hash, which tends

Re: cassandra update vs insert + delete

2018-05-27 Thread Jonathan Haddad
What is a “soft delete”? My 2 cents, if you want to update some information just update it. There’s no need to overthink it. Batches are good if they’re constrained to a single partition, not so hot otherwise. On Sun, May 27, 2018 at 8:19 AM Rahul Singh wrote: >

Re: Reading from big partitions

2018-05-19 Thread Jonathan Haddad
What disks are you using? How many sstables are you hitting? Did you try tracing the request? On Sat, May 19, 2018 at 8:43 PM onmstester onmstester wrote: > Hi, > Due to some unpredictable behavior in input data i end up with some > hundred partitions having more than 300MB

Re: Solve Busy pool at Cassandra side

2018-05-13 Thread Jonathan Haddad
This error comes from com.datastax.driver.core.HostConnectionPool#enqueue, which is the client side pool. Cassandra can handle more requests, the application needs to be fixed. As per the java docs: /** * Indicates that a connection pool has run out of available connections. * * This

Re: Switching to TWCS

2018-04-27 Thread Jonathan Haddad
TWCS uses the max timestamp in an sstable to determine what to compact together, it won't anti-compact your data. The goal is to minimize I/O. You'll have to wait for all your mixed-timestamp sstable data to TTL out before TWCS's windowing kicks in optimally.
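A simplified model of that grouping rule, assuming windows are fixed-size slices of time and each SSTable is assigned by its maximum timestamp only. This is an illustration, not Cassandra's implementation; the function names and SSTable labels are made up.

```python
from collections import defaultdict
from typing import Dict, List


def window_start(max_timestamp_s: int, window_size_s: int) -> int:
    """Round an SSTable's max timestamp down to the start of its window."""
    return max_timestamp_s - (max_timestamp_s % window_size_s)


def group_by_window(sstable_max_ts: Dict[str, int],
                    window_size_s: int) -> Dict[int, List[str]]:
    """Group SSTables that would be compacted together in this model."""
    windows = defaultdict(list)
    for name, max_ts in sorted(sstable_max_ts.items()):
        windows[window_start(max_ts, window_size_s)].append(name)
    return dict(windows)


DAY = 86_400  # one-day windows, in seconds
sstables = {
    "old-only": 1 * DAY + 100,
    "mixed-old-and-new": 5 * DAY + 10,  # also holds day-1 data, but max ts is day 5
    "new-only": 5 * DAY + 500,
}
groups = group_by_window(sstables, DAY)
print(groups)
```

The mixed SSTable lands in the newest window because only its max timestamp counts, which is why old data inside it stays put until it expires.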

Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Jonathan Haddad
Your compaction time won't improve immediately simply by adding nodes because the old data still needs to be cleaned up. What's your end goal? Why is having a spike in pending compaction tasks following a massive write an issue? Are you seeing a dip in performance, violating an SLA, or do you

Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Jonathan Haddad
I can't say for sure, because I haven't measured it, but I've seen a combination of readahead + large chunk size with compression cause serious issues with read amplification, although I'm not sure if or how it would apply here. Likely depends on the size of your partitions and the fragmentation

Re: Version Upgrade

2018-04-25 Thread Jonathan Haddad
There's no harm in running it during any upgrade, and I always recommend doing it just to be in the habit. My 2 cents. On Wed, Apr 25, 2018 at 3:39 PM Christophe Schmitz < christo...@instaclustr.com> wrote: > Hi Pranay, > > You only need to upgrade your SSTables when you perform a major

Re: Reading Cassandra's Blob from Apache Ignite

2018-04-25 Thread Jonathan Haddad
I think you’ll have better luck with the ignite list, as this looks like an ignite configuration problem. On Wed, Apr 25, 2018 at 3:09 AM wrote: > Dear Community, > > > > I'm trying to read the contents of Cassandra table from Ignite(acting as > cache). The table is given

Re: 答复: Time serial column family design

2018-04-17 Thread Jonathan Haddad
To add to what Nate suggested, we have an entire blog post on scaling time series data models: http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html Jon On Tue, Apr 17, 2018 at 7:39 PM Nate McCall wrote: > I disagree. Create date as a

Re: Cassandra datastax cerrification

2018-04-14 Thread Jonathan Haddad
The original question was about prepping. I think that might be a question best suited for datastax, since you’re paying them for the cert. On Sat, Apr 14, 2018 at 9:02 AM Ben Bromhead wrote: > Certification is only as good as the organizations that recognize it. > Identify

Re: Latest version and Features

2018-04-11 Thread Jonathan Haddad
you this link instead : >>>>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt >>>>> >>>>> You'll find everything you need IMHO >>>>> >>>>> On 11 April 2018 at 17:05, Abdul Patel <abd786...@gmail.com> wrote:

Re: JVM Tuning post

2018-04-11 Thread Jonathan Haddad
Re G1GC in Java 9, yes it's the default, but we explicitly specify the collector when we start Cassandra. Regarding load testing, some folks like cassandra-stress, but personally I think second to production itself, there's nothing better than an environment running the full applications stack

Re: Latest version and Features

2018-04-11 Thread Jonathan Haddad
Move to the latest 3.0, or if you're feeling a little more adventurous, 3.11.2. 4.0 discussion is happening now, nothing is decided. On Wed, Apr 11, 2018 at 7:35 AM Abdul Patel wrote: > Hi All, > > Our company is planning for upgrading cassandra to maitain the audit >

Re: Is Cassandra used in Medical industry?

2018-03-29 Thread Jonathan Haddad
If you require a full audit trail then you'll need to do this in your data model. I recommend looking to event sourcing, which is a way of tracking all changes to an entity over its lifetime. https://martinfowler.com/eaaDev/EventSourcing.html Instead of thinking of data as global mutable state,
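A small sketch of the event sourcing idea, assuming a made-up event shape of (event_type, field, value). Nothing here is specific to Cassandra; in a table, the events would typically be rows appended under the entity's partition.

```python
from typing import Any, Dict, List, Tuple

# An event is (event_type, field, value); this shape is an illustrative assumption.
Event = Tuple[str, str, Any]


def apply_event(state: Dict[str, Any], event: Event) -> Dict[str, Any]:
    """Return a new state with one event applied; the input is not mutated."""
    event_type, field, value = event
    new_state = dict(state)
    if event_type == "set":
        new_state[field] = value
    elif event_type == "unset":
        new_state.pop(field, None)
    else:
        raise ValueError(f"unknown event type: {event_type}")
    return new_state


def replay(events: List[Event]) -> Dict[str, Any]:
    """Rebuild the current state by folding over the full event history."""
    state: Dict[str, Any] = {}
    for event in events:
        state = apply_event(state, event)
    return state


audit_log: List[Event] = [
    ("set", "name", "Alice"),
    ("set", "ward", "3B"),
    ("set", "ward", "5A"),
    ("unset", "name", None),
]
current = replay(audit_log)
print(current)
```

Events are only ever appended, so the log doubles as the audit trail, and the state at any earlier point can be recovered by replaying a prefix of it.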

Re: Is Cassandra used in Medical industry?

2018-03-29 Thread Jonathan Haddad
I haven't used Vormetric, but have worked with a couple teams doing disk encryption using LUKS: https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md I haven't read through that FDA guideline, and tbh I'm not going to - if there's a specific question you have it would be better to ask it

Re: Can "data_file_directories" make use of multiple disks?

2018-03-27 Thread Jonathan Haddad
In Cassandra 3.2 and later, data is partitioned by token range, which should give you even distribution of data. If you're going to go into 3.x, please use the latest 3.11, which at this time is 3.11.2. On Tue, Mar 27, 2018 at 8:05 AM Venkata Hari Krishna Nukala <

Re: Update to C* 3.0.14 from 3.0.10

2018-03-23 Thread Jonathan Haddad
3.0.16 is the latest, I recommend going all the way up. About a hundred bug fixes: https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt Jon On Fri, Mar 23, 2018 at 2:22 PM Dmitry Saprykin wrote: > Hi, > > I successfully used 3.0.14 more than a year in

Re: Using Spark to delete from Transactional Cluster

2018-03-23 Thread Jonathan Haddad
I'm confused as to what the difference is between deleting with prepared statements and deleting through Spark. To the best of my knowledge either way it's the same thing - normal deletion with tombstones replicated. Is it that you're doing deletes in the analytics DC instead of your real time

Re: replace dead node vs remove node

2018-03-22 Thread Jonathan Haddad
Ah sorry - I misread the original post - for some reason I had it in my head the question was about bootstrap. Carry on. On Thu, Mar 22, 2018 at 8:35 PM Jonathan Haddad <j...@jonhaddad.com> wrote: > Under normal circumstances this is not true. >

Re: replace dead node vs remove node

2018-03-22 Thread Jonathan Haddad
Under normal circumstances this is not true. Take a look at org.apache.cassandra.service.StorageProxy#performWrite, it grabs both the natural endpoints and the pending endpoints (new nodes). They're eventually passed through to

Re: Fast Writes to Cassandra Failing Through Python Script

2018-03-15 Thread Jonathan Haddad
Generally speaking, you don't need to. I almost never do. I've only set it in situations where I've had a large number of tables and I want to avoid a lot of flushing when commit log segments are removed. Setting it to 128 milliseconds means it's flushing 8 times per second, which gives no

Re: What versions should the documentation support now?

2018-03-13 Thread Jonathan Haddad
Yes, I agree, we should host versioned docs. I don't think anyone is against it, it's a matter of someone having the time to do it. On Tue, Mar 13, 2018 at 6:14 PM kurt greaves wrote: > I’ve never heard of anyone shipping docs for multiple versions, I don’t >> know why

Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
> > Kenneth Brotman > > > > *From:* Jonathan Haddad [mailto:j...@jonhaddad.com] > *Sent:* Monday, March 12, 2018 8:40 AM > > > *To:* user@cassandra.apache.org > *Subject:* Re: What versions should the documentation support now? > > > > The docs are in tree, m

Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
The docs are in tree, meaning they are versioned, and should be written for the version they correspond to. Trunk docs should reflect the current state of trunk, and shouldn’t have caveats for other versions. On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman wrote: >

Re: Jon Haddad on Diagnosing Performance Problems in Production

2018-02-27 Thread Jonathan Haddad
There isn't a ton from that talk I'd consider "wrong" at this point, but some of it is a little stale. I always start off looking at system metrics. For a very thorough discussion on the matter check out Brendan Gregg's USE [1] method. I did a blog post on my own about the talk [2] that has

Re: How to Parse raw CQL text?

2018-02-25 Thread Jonathan Haddad
I had to do something similar recently. Take a look at org.apache.cassandra.cql3.QueryProcessor.parseStatement(). I've got some sample code here [1] as well as a blog post [2] that explains how to access the private variables, since there's no access provided. It wasn't really designed to be

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
> wrote: > > > > Hi jonathan > > > > Thank you for the answer. Do you know where to look to understand why this > works. As i understood all the node then will chose ramdoms tokens. How can > i assure the correctness of the ring? > > > > So as you said.

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
If it's a new cluster, there's no need to disable auto_bootstrap. That setting prevents the first node in the second DC from being a replica for all the data in the first DC. If there's no data in the first DC, you can skip a couple steps and just leave it on. Leave it on, and enjoy your

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Jonathan Haddad
The easiest way to do this is replacing one node at a time by using rsync. I don't know why it has to be more complicated than copying data to a new machine and replacing it in the cluster. Bringing up a new DC with snapshots is going to be a nightmare in comparison. On Wed, Feb 21, 2018 at

Re: LWT broken?

2018-02-09 Thread Jonathan Haddad
If you want consistent reads you have to use the CL that enforces it. There’s no way around it. On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida wrote: > In this case, we only write using CAS (code guarantees that). We also > never update, just insert if not exist. Once a hash

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread Jonathan Haddad
That might be fine for a one off but is totally impractical at scale or when using TWCS. On Fri, Feb 9, 2018 at 8:39 AM DuyHai Doan wrote: > Or use the new user-defined compaction option recently introduced, > provided you can determine over which SSTables a partition is

Re: index_interval

2018-02-03 Thread Jonathan Haddad
I would also optimize for your worst case, which is hitting zero caches. If you're using the default settings when creating a table, you're going to get compression settings that are terrible for reads. If you've got memory to spare, I suggest changing your chunk_length_in_kb to 4 and disabling
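As a rough illustration of the chunk size change suggested above, the sketch below only builds the CQL text; it does not connect to a cluster. The helper name, the `my_ks.my_table` names, and the choice of `LZ4Compressor` are assumptions for the example, and the map keys follow the Cassandra 3.0+ compression syntax. New settings apply to SSTables written after the change.

```python
def alter_compression_cql(keyspace, table, chunk_length_in_kb=4,
                          compressor="LZ4Compressor"):
    """Build an ALTER TABLE statement that sets the compression chunk size.

    Hypothetical helper: keyspace, table and compressor are placeholders.
    """
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compression = "
        f"{{'class': '{compressor}', "
        f"'chunk_length_in_kb': {chunk_length_in_kb}}};"
    )


# Hypothetical keyspace and table names.
statement = alter_compression_cql("my_ks", "my_table")
print(statement)
```

The resulting string can be pasted into cqlsh or passed to a driver session.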

Re: Old tombstones not being cleaned up

2018-02-01 Thread Jonathan Haddad
Changing the default TTL doesn’t change the TTL on the existing data, only new data. It’s only set if you don’t supply one yourself. On Wed, Jan 31, 2018 at 11:35 PM Bo Finnerup Madsen wrote: > Hi, > > We are running a small 9 node Cassandra v2.1.17 cluster. The cluster >
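The behaviour described here can be pictured with a toy model. This is not Cassandra code, only a minimal sketch assuming the expiry is computed once, at insert time, from either the TTL supplied with the write or the table default in force at that moment. The `Table` class and its fields are invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Table:
    """Toy stand-in for a table with a default TTL (in seconds; 0 = none)."""
    default_ttl: int = 0
    # key -> absolute expiry time, or None if the cell never expires
    cells: Dict[str, Optional[int]] = field(default_factory=dict)

    def insert(self, key: str, now: int, ttl: Optional[int] = None) -> None:
        # The default is only used when the write does not supply a TTL.
        effective = self.default_ttl if ttl is None else ttl
        self.cells[key] = now + effective if effective > 0 else None

    def is_live(self, key: str, now: int) -> bool:
        expires = self.cells[key]
        return expires is None or now < expires


t = Table(default_ttl=100)
t.insert("old", now=0)               # picks up the default of 100
t.default_ttl = 10                   # analogous to altering the table default
t.insert("new", now=0)               # picks up the new default of 10
t.insert("explicit", now=0, ttl=50)  # supplied TTL wins over the default
print(t.cells)
```

In this model the "old" cell keeps its original expiry after the default changes, which is the point made in the reply.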

Re: Problem adding a new node to a cluster

2017-12-18 Thread Jonathan Haddad
Definitely upgrade to 3.11.1. On Sun, Dec 17, 2017 at 8:54 PM Pradeep Chhetri wrote: > Hello Kurt, > > I realized it was because of RAM shortage which caused the issue. I bumped > up the memory of the machine and node bootstrap started but this time i hit > this bug of

Re: Stress test cassandr

2017-11-26 Thread Jonathan Haddad
Have you read through the docs for stress? You can have it use your own queries and data model. http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html On Sun, Nov 26, 2017 at 1:02 AM Akshit Jain wrote: > Hi, > What is the best way to stress test the

Re: Full repair use case

2017-11-21 Thread Jonathan Haddad
I wouldn’t recommend using incremental repair at all at this time due to some bugs that can cause massive overstreaming. Our advice at TLP is to do subrange repair, and we maintain Reaper to help with that: http://cassandra-reaper.io Jon On Wed, Nov 22, 2017 at 2:18 AM Akshit Jain
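To give a feel for what subrange repair means, here is a minimal sketch that splits a token range into equal slices and prints one `nodetool repair -st ... -et ...` command per slice. It assumes the Murmur3 partitioner's token bounds, ignores wrap-around ranges, and uses a hypothetical keyspace name. It is not how Reaper schedules its segments, only an illustration of the idea.

```python
MIN_TOKEN = -(2 ** 63)     # Murmur3Partitioner lower bound
MAX_TOKEN = 2 ** 63 - 1    # Murmur3Partitioner upper bound


def split_range(start, end, parts):
    """Split (start, end] into up to `parts` contiguous subranges.

    Wrap-around ranges (start >= end) are not handled in this sketch.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if not start < end:
        raise ValueError("start must be less than end")
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    # Drop empty slices, which can occur when parts exceeds the range width.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def repair_commands(keyspace, start, end, parts):
    """Render one nodetool invocation per subrange (keyspace is a placeholder)."""
    return [
        f"nodetool repair -st {lo} -et {hi} {keyspace}"
        for lo, hi in split_range(start, end, parts)
    ]


for command in repair_commands("my_ks", MIN_TOKEN, MAX_TOKEN, 4):
    print(command)
```

In practice each node would be repaired over the ranges it owns rather than the whole ring; the full ring is used here only to keep the example short.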

Re: Reaper 1.0

2017-11-17 Thread Jonathan Haddad
It should work with DSE, but we don’t explicitly test it. Mind testing it and posting your results? If you could include the DSE version it would be great. On Thu, Nov 16, 2017 at 11:57 PM Anshu Vajpayee wrote: > Thanks John for your efforts and nicely putting it on

Re: Node Failure Scenario

2017-11-14 Thread Jonathan Haddad
Anthony’s suggestion of using replace_address_first_boot lets you avoid that requirement, and it’s specifically why it was added in 2.2. On Tue, Nov 14, 2017 at 1:02 AM Anshu Vajpayee wrote: > Thanks guys, > > I think better to pass replace_address on command line

Re: Alter table gc_grace_seconds

2017-10-01 Thread Jonathan Haddad
The TTL is applied to the cells on insert. Changing it doesn't change the TTL on data that was inserted previously. On Sun, Oct 1, 2017 at 6:23 AM Gábor Auth wrote: > Hi, > > The `alter table number_item with gc_grace_seconds = 3600;` sets the > grace seconds of

Re: Help in c* Data modelling

2017-07-23 Thread Jonathan Haddad
Using a different table to answer each query is the correct answer here assuming there's a significant amount of data. If you don't have that much data, maybe you should consider using a database like Postgres which gives you query flexibility instead of horizontal scalability. On Sun, Jul 23,

Re: write time for nulls is not consistent

2017-07-18 Thread Jonathan Haddad
Kainth <ni...@bamlabs.com> wrote: > Jonathan, > > Please notice last rows with partition key values (w,v and t). they were > inserted same way and has write time values > > On Jul 18, 2017, at 2:22 PM, Jonathan Haddad <j...@jonhaddad.com> wrote: > > This l
