Re: JBOD disk failure - just say no
Cassandra JBOD has a bunch of issues, so I don't recommend it for production:
1) Disks fill up with load (data) unevenly, meaning you can run out of space on one disk while others are half-full.
2) One bad disk can take out the whole node.
3) Instead of a small failure probability on an LVM/RAID volume, with JBOD you end up near a 100% chance of failure after 3 years or so.
4) Generally you will not have enough warning of a looming failure with JBOD compared to LVM/RAID. (Some companies take a week or two to replace a failed disk.)
JBOD is easy to set up, but hard to manage. Thanks, James.
From: kurt greaves To: User Sent: Friday, August 17, 2018 5:42 AM Subject: Re: JBOD disk failure As far as I'm aware, yes. I recall hearing someone mention tying system tables to a particular disk, but at the moment that doesn't exist. On Fri., 17 Aug. 2018, 01:04 Eric Evans, wrote: On Wed, Aug 15, 2018 at 3:23 AM kurt greaves wrote: > Yep. It might require a full node replace depending on what data is lost from > the system tables. In some cases you might be able to recover from partially > lost system info, but it's not a sure thing. Ugh, does it really just boil down to what part of `system` happens to be on the disk in question? In my mind, that makes the only sane operational procedure for a failed disk: "replace the entire node". IOW, I don't think we can realistically claim you can survive a failed JBOD device if it relies on happenstance. > On Wed., 15 Aug. 2018, 17:55 Christian Lorenz, > wrote: >> >> Thank you for the answers. We are using the current version 3.11.3, so this >> one includes CASSANDRA-6696. >> >> So if I get this right, losing system tables will need a full node rebuild. >> Otherwise repair will get the node consistent again. > > [ ... ] -- Eric Evans john.eric.ev...@gmail.com
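The arithmetic behind point 3 can be sketched as follows. This is an illustrative calculation only: the 5% annual failure rate is an assumed figure, not a vendor spec, and disk failures are assumed independent.

```python
# Illustrative sketch: probability that a JBOD node loses at least one disk.
# With JBOD, losing ANY single disk effectively takes out the node, so the
# per-disk risk compounds across the whole disk set.
# Assumption: 5% annual failure rate (AFR) per disk, independent failures.
def node_disk_failure_probability(num_disks: int, annual_failure_rate: float,
                                  years: float) -> float:
    # Probability a single disk survives the whole period
    disk_survives = (1.0 - annual_failure_rate) ** years
    # Probability at least one of the disks fails in that period
    return 1.0 - disk_survives ** num_disks

# A 12-disk JBOD node over 3 years at a 5% AFR:
p = node_disk_failure_probability(12, 0.05, 3)
print(f"{p:.0%}")  # roughly 84% chance of losing at least one disk
```

With larger disk counts or higher failure rates the result does approach certainty, which is the point being made; a single RAID volume only fails when enough member disks die at once, so its effective failure probability stays far lower.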
Re: Configuration parameter to reject incremental repair?
Yeah I meant 2.2. Keep telling myself it was 3.0 for some reason. On 20 August 2018 at 19:29, Oleksandr Shulgin wrote: > On Mon, Aug 13, 2018 at 1:31 PM kurt greaves wrote: > >> No flag currently exists. Probably a good idea considering the serious >> issues with incremental repairs since forever, and the change of defaults >> since 3.0. >> > > Hi Kurt, > > Did you mean since 2.2 (when incremental became the default one)? Or was > there more to it that I'm not aware of? > > Thanks, > -- > Alex > >
JMX for row cache churn
Is there a JMX property somewhere that I could monitor to see how old the oldest row cache item is? I want to see how much churn there is. Thanks in advance, John...
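As far as I know there is no direct "age of the oldest entry" attribute; the `org.apache.cassandra.metrics:type=Cache,scope=RowCache` MBeans expose counters such as `Entries`, `Hits`, and `Requests`. One hedged way to approximate churn is to sample those counters over an interval and derive an average residency time. The sketch below shows only the arithmetic (the sampling itself would go through JMX), and assumes a full cache where each miss inserts a new entry and evicts the oldest one, which is an approximation rather than the exact eviction model:

```python
# Rough churn estimate from two samples of the RowCache metrics
# (Entries, plus deltas of Hits and Requests) taken interval_s apart.
# Assumption: cache is full, so every miss causes one eviction.
def estimated_residency_seconds(entries: int, hits_delta: int,
                                requests_delta: int, interval_s: float) -> float:
    misses = requests_delta - hits_delta
    if misses <= 0:
        return float("inf")  # no turnover observed in this window
    evictions_per_second = misses / interval_s
    # Average time an entry stays cached before being pushed out
    return entries / evictions_per_second

# Example: 100k entries; 50k requests in 60s, of which 48k hit the cache
print(estimated_residency_seconds(100_000, 48_000, 50_000, 60))  # ~3000 s
```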
Re: Incremental repair
Hi Prachi, Incremental has been the default since C* 2.2. You can run a full repair by adding the "--full" flag to your nodetool command. Cheers, On Mon, 20 Aug 2018 at 19:50, Prachi Rath wrote: > Hi Community, > > I am currently creating a new cluster with Cassandra 3.11.2; while > enabling repair I noticed that incremental repair is true in the logfile. > > > (parallelism: parallel, primary range: true, incremental: true, job > threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: > 20, pull repair: false) > > I was running repair with the -pr option only. > > Question: Is incremental repair the default repair for the Cassandra 3.11.2 > version? > > Thanks, > Prachi > > > -- - Alexander Dejanovski France @alexanderdeja Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Incremental repair
Hi Community, I am currently creating a new cluster with Cassandra 3.11.2; while enabling repair I noticed that incremental repair is true in the logfile. (parallelism: parallel, primary range: true, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 20, pull repair: false) I was running repair with the -pr option only. Question: Is incremental repair the default repair for the Cassandra 3.11.2 version? Thanks, Prachi
Re: Extending Cassandra on AWS from single Region to Multi-Region
On Thu, Aug 9, 2018 at 3:46 AM srinivasarao daruna wrote: > Hi All, > > We have built Cassandra on AWS EC2 instances. Initially when creating the > cluster we did not consider multi-region deployment and we used the AWS > EC2Snitch. > > We used EBS Volumes to save our data and each of those disks was > filled to around 350G. > We want to extend it to Multi Region and wanted to know the better > approach and recommendations to achieve this. > > I agree that we made a mistake by not using EC2MultiRegionSnitch, but > it's past now, and if anyone has faced or implemented a similar thing I would like > to get some guidance. > > Any help would be very much appreciated. > Hello, As we did this successfully in the past, here are some notes from the field: - configure the client applications to use address translation specific to the EC2 setup: https://docs.datastax.com/en/developer/java-driver/3.3/manual/address_resolution/#ec2-multi-region - either specify the 'datacenter' name the client should consider as local in the DCAwareRoundRobinPolicy() or provide private IP addresses of the local DC as contact points. This should ensure that the clients don't try to connect to the new DC, which doesn't have the data yet. - review the consistency levels the client uses: use LOCAL_ONE and LOCAL_QUORUM instead of ONE/QUORUM for reads and writes; use EACH_QUORUM for writes when you want to ensure stronger consistency cross-region. - switching from plain EC2Snitch to EC2MultiRegionSnitch will change the node's broadcast address to its public IP. Make sure that other nodes (in the same region and the remote region) can connect on the public IP. Hope this helps, -- Alex
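The address-translation step in the first note can be illustrated with a minimal sketch. This is not the Java driver's actual EC2 translator (which, as I understand it, resolves public addresses back to private ones via reverse DNS inside the region); the static lookup table here is a hypothetical stand-in showing what the translation has to achieve: clients in the same region keep using private IPs even though nodes broadcast their public ones.

```python
# Hypothetical sketch of EC2 multi-region address translation.
# A static table stands in for the region-local reverse-DNS lookup
# the real driver performs. All IPs below are made-up examples.
PUBLIC_TO_PRIVATE = {
    # broadcast (public) IP  ->  private IP reachable inside the local region
    "54.210.1.10": "10.0.1.10",
    "54.210.1.11": "10.0.1.11",
}

def translate(broadcast_ip: str) -> str:
    # Nodes in the local region are contacted on their private IP;
    # nodes in remote regions fall back to the public (broadcast) IP.
    return PUBLIC_TO_PRIVATE.get(broadcast_ip, broadcast_ip)

print(translate("54.210.1.10"))  # local node -> 10.0.1.10
print(translate("52.31.5.77"))   # remote-region node -> unchanged
```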
Re: Adding new datacenter to the cluster
On Mon, Aug 13, 2018 at 3:50 PM Vitali Dyachuk wrote: > Hello, > I'm going to follow this documentation to add a new datacenter to the C* > cluster > > https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html > > The main step is to run nodetool rebuild, which will sync data to the new > datacenter; > this will load the cluster heavily since the main keyspace size is 2TB. > 1) What are the best practices to add a new datacenter with a lot of data? > Hi, If you fear overloading the source DC during rebuild, you can try starting rebuild one node at a time on the target DC. Better options exist for throttling, see below. > 2) How is it possible to stop rebuild? > You can stop rebuild on a single node by restarting the Cassandra server process. Rebuild can be resumed by running `nodetool rebuild ...` again. > 3) What are the throttling possibilities? > nodetool setstreamingthroughput Cheers, -- Alex
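As a back-of-the-envelope illustration of why the streaming cap matters for a 2TB keyspace: `nodetool setstreamingthroughput` takes a value in megabits per second, and 200 Mbit/s is the commonly cited default. The sketch below assumes the cap is the only bottleneck, which ignores compaction, network variance, and per-node parallelism.

```python
# Rough rebuild-time estimate for streaming a keyspace to a new DC,
# assuming the streaming throughput cap (in Mbit/s) is the bottleneck.
def rebuild_hours(keyspace_bytes: float, throughput_mbit_s: float) -> float:
    bytes_per_second = throughput_mbit_s * 1_000_000 / 8
    return keyspace_bytes / bytes_per_second / 3600

# 2 TB keyspace at an assumed 200 Mbit/s cap:
print(f"{rebuild_hours(2e12, 200):.1f} hours")  # ~22.2 hours
```

Doubling the cap halves the estimate, which is the trade-off being weighed: rebuild speed against load on the source DC.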
Re: Configuration parameter to reject incremental repair?
On Mon, Aug 13, 2018 at 1:31 PM kurt greaves wrote: > No flag currently exists. Probably a good idea considering the serious > issues with incremental repairs since forever, and the change of defaults > since 3.0. > Hi Kurt, Did you mean since 2.2 (when incremental became the default one)? Or was there more to it that I'm not aware of? Thanks, -- Alex
RE: Repair daily refreshed table
Hi Maxim. Assuming all your update operations are successful and that you only delete data by TTL in that table, then you shouldn't have to do repairs on it. You may also consider lowering the gc_grace_seconds value on that table, but you should be aware of how this impacts hints and logged batches: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableGc_grace_seconds /pelle From: Maxim Parkachov Sent: 20 August 2018 08:29 To: user@cassandra.apache.org Subject: Re: Repair daily refreshed table Hi Raul, I cannot afford delete and then load as this will create downtime for the record; that's why I'm upserting with TTL today()+7days as I mentioned in my original question. And at the moment I don't have an issue either with loading or with access times. My question is: should I repair such a table or not, and if yes, before load or after (or does it not matter)? Thanks, Maxim. On Sun, Aug 19, 2018 at 8:52 AM Rahul Singh <rahul.xavier.si...@gmail.com> wrote: If you wanted to be certain that all replicas were acknowledging receipt of the data, then you could use ALL or EACH_QUORUM (if you have multiple DCs), but you must really want high consistency if you do that. You should avoid consciously creating tombstones if possible — it ends up making reads slower because they need to be accounted for until they are compacted / garbage collected out. Tombstones are created when data is either deleted, or nulled. When marking data with a TTL, the actual delete is not done until after the TTL has expired. When you say you are overwriting, are you deleting and then loading? That's the only way you should see tombstones — or maybe you are setting nulls? Rahul On Aug 18, 2018, 11:16 PM -0700, Maxim Parkachov <lazy.gop...@gmail.com>, wrote: Hi Rahul, I'm already using LOCAL_QUORUM in the batch process and it runs every day. As far as I understand, because I'm overwriting the whole table with new TTL, the process creates tons of tombstones, and I'm more concerned with them. Regards, Maxim. On Sun, Aug 19, 2018 at 3:02 AM Rahul Singh <rahul.xavier.si...@gmail.com> wrote: Are you loading using a batch process? What's the frequency of the data ingest, and does it have to be very fast? If not too frequent and it can be a little slower, you may consider a higher consistency to ensure data is on replicas. Rahul On Aug 18, 2018, 2:29 AM -0700, Maxim Parkachov <lazy.gop...@gmail.com>, wrote: Hi community, I'm currently puzzled with the following challenge. I have a CF with 7 days TTL on all rows. Daily there is a process which loads actual data with +7 days TTL. Thus records which are not present in the last 7 days of load expire. The amount of these expired records is very small, < 1%. I have a daily repair process, which takes a considerable amount of time and resources, and a snapshot after that. Obviously I'm concerned only with the last loaded data. Basically, my question: should I run repair before load, after load, or maybe I don't need to repair such a table at all? Regards, Maxim.
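The interplay between the 7-day TTL and gc_grace_seconds discussed above comes down to date arithmetic: an expired TTL'd cell turns into a tombstone, and compaction can only purge it once gc_grace_seconds has also elapsed. The sketch below uses the 10-day (864000 s) default for gc_grace_seconds; lowering it on the table, as suggested, shortens the purge date accordingly.

```python
from datetime import datetime, timedelta

# Sketch of a TTL'd cell's lifecycle: TTL expiry produces a tombstone,
# which becomes purgeable by compaction only after gc_grace_seconds.
# 864000 s (10 days) is the Cassandra default for gc_grace_seconds.
def purge_eligible_at(written_at: datetime, ttl_days: int = 7,
                      gc_grace_seconds: int = 864_000) -> datetime:
    expires_at = written_at + timedelta(days=ttl_days)
    return expires_at + timedelta(seconds=gc_grace_seconds)

written = datetime(2018, 8, 1)
print(purge_eligible_at(written))  # 7 + 10 days later: 2018-08-18 00:00:00
```

So with daily re-upserts, each day's overwritten cells linger as expired data for roughly TTL + gc_grace before compaction can drop them, which is why the tombstone volume builds up.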
Re: Repair daily refreshed table
Hi Raul, I cannot afford delete and then load as this will create downtime for the record; that's why I'm upserting with TTL today()+7days as I mentioned in my original question. And at the moment I don't have an issue either with loading or with access times. My question is: should I repair such a table or not, and if yes, before load or after (or does it not matter)? Thanks, Maxim. On Sun, Aug 19, 2018 at 8:52 AM Rahul Singh wrote: > If you wanted to be certain that all replicas were acknowledging receipt > of the data, then you could use ALL or EACH_QUORUM (if you have multiple > DCs), but you must really want high consistency if you do that. > > You should avoid consciously creating tombstones if possible — it ends up > making reads slower because they need to be accounted for until they are > compacted / garbage collected out. > > Tombstones are created when data is either deleted, or nulled. When > marking data with a TTL, the actual delete is not done until after the TTL > has expired. > > When you say you are overwriting, are you deleting and then loading? > That's the only way you should see tombstones — or maybe you are setting > nulls? > > Rahul > On Aug 18, 2018, 11:16 PM -0700, Maxim Parkachov , > wrote: > > Hi Rahul, > > I'm already using LOCAL_QUORUM in the batch process and it runs every day. As > far as I understand, because I'm overwriting the whole table with new TTL, the > process creates tons of tombstones, and I'm more concerned with them. > > Regards, > Maxim. > > On Sun, Aug 19, 2018 at 3:02 AM Rahul Singh > wrote: > >> Are you loading using a batch process? What's the frequency of the data >> ingest, and does it have to be very fast? If not too frequent and it can be a >> little slower, you may consider a higher consistency to ensure data is on >> replicas. >> >> Rahul >> On Aug 18, 2018, 2:29 AM -0700, Maxim Parkachov , >> wrote: >> >> Hi community, >> >> I'm currently puzzled with the following challenge. I have a CF with 7 days >> TTL on all rows. Daily there is a process which loads actual data with +7 >> days TTL. Thus records which are not present in the last 7 days of load >> expire. The amount of these expired records is very small, < 1%. I have a daily >> repair process, which takes a considerable amount of time and resources, and a >> snapshot after that. Obviously I'm concerned only with the last loaded >> data. Basically, my question: should I run repair before load, after load, >> or maybe I don't need to repair such a table at all? >> >> Regards, >> Maxim.