Re: JBOD disk failure - just say no

2018-08-20 Thread James Briggs
Cassandra JBOD has a bunch of issues, so I don't recommend it for production:
1) disks fill up with load (data) unevenly, meaning you can run out of space on
one disk while others are still half-full
2) one bad disk can take out the whole node
3) instead of a small failure probability on an LVM/RAID volume, with JBOD you
end up with a near-100% chance of a disk failure after 3 years or so
4) generally you will not have enough warning of a looming failure with JBOD
compared to LVM/RAID. (Some companies take a week or two to replace a failed
disk.)
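(As a rough illustration of point 3, using assumed numbers rather than anything
from this thread: if each disk has an annualized failure rate of about 5% and a
node has 12 JBOD disks, the chance that at least one of them fails within 3
years is roughly 1 - (1 - 0.05)^(12 * 3) ≈ 0.84, and it keeps climbing toward
100% with more disks or a longer horizon.)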
JBOD is easy to set up, but hard to manage. Thanks, James.


  From: kurt greaves 
 To: User  
 Sent: Friday, August 17, 2018 5:42 AM
 Subject: Re: JBOD disk failure
   
As far as I'm aware, yes. I recall hearing someone mention tying system tables 
to a particular disk but at the moment that doesn't exist.
On Fri., 17 Aug. 2018, 01:04 Eric Evans,  wrote:

On Wed, Aug 15, 2018 at 3:23 AM kurt greaves  wrote:
> Yep. It might require a full node replace depending on what data is lost from 
> the system tables. In some cases you might be able to recover from partially 
> lost system info, but it's not a sure thing.

Ugh, does it really just boil down to what part of `system` happens to
be on the disk in question?  In my mind, that makes the only sane
operational procedure for a failed disk to be: "replace the entire
node".  IOW, I don't think we can realistically claim you can survive
a failed JBOD device if it relies on happenstance.

> On Wed., 15 Aug. 2018, 17:55 Christian Lorenz,  wrote:
>>
>> Thank you for the answers. We are using the current version 3.11.3, so
>> this one includes CASSANDRA-6696.
>>
>> So if I get this right, losing system tables will need a full node rebuild. 
>> Otherwise repair will get the node consistent again.
>
> [ ... ]

-- 
Eric Evans
john.eric.ev...@gmail.com


Re: Configuration parameter to reject incremental repair?

2018-08-20 Thread kurt greaves
Yeah I meant 2.2. Keep telling myself it was 3.0 for some reason.

On 20 August 2018 at 19:29, Oleksandr Shulgin 
wrote:

> On Mon, Aug 13, 2018 at 1:31 PM kurt greaves  wrote:
>
>> No flag currently exists. Probably a good idea considering the serious
>> issues with incremental repairs since forever, and the change of defaults
>> since 3.0.
>>
>
> Hi Kurt,
>
> Did you mean since 2.2 (when incremental became the default one)?  Or was
> there more to it that I'm not aware of?
>
> Thanks,
> --
> Alex
>
>


JMX for row cache churn

2018-08-20 Thread John Sumsion
Is there a JMX property somewhere that I could monitor to see how old the 
oldest row cache item is?


I want to see how much churn there is.


Thanks in advance,

John...
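(As far as I can tell there is no metric for the age of the oldest row cache
entry; churn is usually approximated from the standard cache metrics exposed
under org.apache.cassandra.metrics:type=Cache,scope=RowCache, such as Entries,
Size, Hits, Requests and HitRate. A minimal Java sketch for polling two of them
over JMX, assuming the default JMX port 7199 on localhost:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RowCacheChurn {
    public static void main(String[] args) throws Exception {
        // Default Cassandra JMX port is 7199; adjust host/port for your node.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Gauge metrics expose a single "Value" attribute.
            ObjectName entries = new ObjectName(
                "org.apache.cassandra.metrics:type=Cache,scope=RowCache,name=Entries");
            ObjectName hitRate = new ObjectName(
                "org.apache.cassandra.metrics:type=Cache,scope=RowCache,name=HitRate");
            System.out.println("RowCache entries:  " + mbs.getAttribute(entries, "Value"));
            System.out.println("RowCache hit rate: " + mbs.getAttribute(hitRate, "Value"));
        }
    }
}

Sampling Entries and HitRate at intervals gives a rough picture of how fast the
cache is turning over, even without a per-entry age.)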


Re: Incremental repair

2018-08-20 Thread Alexander Dejanovski
Hi Prachi,

Incremental has been the default since C* 2.2.

You can run a full repair by adding the "--full" flag to your nodetool
command.
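For example, a full primary-range repair of one keyspace would look something
like this (the keyspace name is a placeholder):

nodetool repair --full -pr my_keyspace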

Cheers,


On Mon, 20 Aug 2018 at 19:50, Prachi Rath  wrote:

> Hi Community,
>
> I am currently creating a new cluster with Cassandra 3.11.2. While enabling
> repair, I noticed that incremental repair is true in the logfile.
>
>
> (parallelism: parallel, primary range: true, incremental: true, job
> threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges:
> 20, pull repair: false)
>
> I was running repair with the -pr option only.
>
> Question: Is incremental repair the default repair for Cassandra 3.11.2?
>
> Thanks,
> Prachi
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Incremental repair

2018-08-20 Thread Prachi Rath
Hi Community,

I am currently creating a new cluster with Cassandra 3.11.2. While enabling
repair, I noticed that incremental repair is true in the logfile.


(parallelism: parallel, primary range: true, incremental: true, job
threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges:
20, pull repair: false)

I was running repair with the -pr option only.

Question: Is incremental repair the default repair for Cassandra 3.11.2?

Thanks,
Prachi


Re: Extending Cassandra on AWS from single Region to Multi-Region

2018-08-20 Thread Oleksandr Shulgin
On Thu, Aug 9, 2018 at 3:46 AM srinivasarao daruna 
wrote:

> Hi All,
>
> We have built Cassandra on AWS EC2 instances. Initially, when creating the
> cluster, we did not consider multi-region deployment, and we used the AWS
> EC2Snitch.
>
> We have used EBS volumes to store our data, and each of those disks is
> filled to around 350G.
> We want to extend it to multi-region and would like to know the best
> approach and recommendations for achieving this.
>
> I agree that we made a mistake by not using EC2MultiRegionSnitch, but it's
> in the past now, and if anyone has faced or implemented a similar thing I
> would like to get some guidance.
>
> Any help would be very much appreciated.
>

Hello,

As we did this successfully in the past, here are some notes from the field:

- configure the client applications to use address translation specific to
EC2 setup:
https://docs.datastax.com/en/developer/java-driver/3.3/manual/address_resolution/#ec2-multi-region

- either specify the 'datacenter' name the client should consider as local
in the DCAwareRoundRobinPolicy(), or provide private IP addresses of the
local DC as contact points.  This should ensure that the clients don't try
to connect to the new DC, which doesn't have the data yet (see the sketch
after this list).

- review the consistency levels the client uses: use LOCAL_ONE and
LOCAL_QUORUM instead of ONE/QUORUM for reads and writes, use EACH_QUORUM
for writes when you want to ensure stronger consistency cross-region.

- switching from plain EC2Snitch to EC2MultiRegionSnitch will change the
node's broadcast address to its public IP.  Make sure that other nodes (in
the same region and in the remote region) can connect on the public IP.
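Putting the first three points together, here is a minimal sketch with the Java
driver 3.x. The contact point 10.0.0.10 and the DC name "eu-west" are
placeholders; this only illustrates the options mentioned above and is not a
drop-in configuration:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.EC2MultiRegionAddressTranslator;

public class MultiRegionClientSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                // Private IP of a node in the local DC (placeholder value).
                .addContactPoint("10.0.0.10")
                // Translate the public broadcast addresses advertised by
                // EC2MultiRegionSnitch back to private IPs within the region.
                .withAddressTranslator(new EC2MultiRegionAddressTranslator())
                // Pin this client to its local datacenter (placeholder name).
                .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("eu-west")
                        .build())
                // Default to DC-local consistency for reads and writes.
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
        try (Session session = cluster.connect()) {
            session.execute("SELECT release_version FROM system.local");
        } finally {
            cluster.close();
        }
    }
}

Statements that must be strongly consistent across regions can override the
default with EACH_QUORUM on a per-query basis.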

Hope this helps,
--
Alex


Re: Adding new datacenter to the cluster

2018-08-20 Thread Oleksandr Shulgin
On Mon, Aug 13, 2018 at 3:50 PM Vitali Dyachuk  wrote:

> Hello,
> I'm going to follow this documentation to add a new datacenter to the C*
> cluster
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
>
> The main step is to run nodetool rebuild, which will sync data to the new
> datacenter; this will put a heavy load on the cluster, since the main
> keyspace size is 2TB.
> 1) What are the best practices to add a new datacenter with a lot of data?
>

Hi,

If you fear overloading the source DC for rebuild, you can try starting
rebuild one node at a time on the target DC.  Better options exist for
throttling; see below.


> 2) How is it possible to stop rebuild?
>

You can stop rebuild on a single node by restarting the Cassandra server
process.  Rebuild can be resumed by running `nodetool rebuild ...` again.
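For example, if the existing source datacenter is named DC1 (a placeholder),
the resumed rebuild on a node in the new DC would look like:

nodetool rebuild -- DC1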


> 3) What are the throttling possibilities?
>

nodetool setstreamingthroughput
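For example, to cap streaming at 100 megabits per second per node while the
rebuild runs (the value corresponds to
stream_throughput_outbound_megabits_per_sec, and setting it to 0 disables the
throttle):

nodetool setstreamingthroughput 100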

Cheers,
--
Alex


Re: Configuration parameter to reject incremental repair?

2018-08-20 Thread Oleksandr Shulgin
On Mon, Aug 13, 2018 at 1:31 PM kurt greaves  wrote:

> No flag currently exists. Probably a good idea considering the serious
> issues with incremental repairs since forever, and the change of defaults
> since 3.0.
>

Hi Kurt,

Did you mean since 2.2 (when incremental became the default one)?  Or was
there more to it that I'm not aware of?

Thanks,
--
Alex


RE: Repair daily refreshed table

2018-08-20 Thread Per Otterström
Hi Maxim.

Assuming all your update operations are successful and that you only delete
data by TTL in that table, you shouldn't have to do repairs on it.

You may also consider lowering the gc_grace_seconds value on that table, but 
you should be aware of how this impacts hints and logged batches: 
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableGc_grace_seconds
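For example, something like this would reduce it to one day (keyspace and table
names are placeholders; pick a value with your hint window and batchlog replay
in mind):

ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 86400;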

/pelle

From: Maxim Parkachov 
Sent: den 20 augusti 2018 08:29
To: user@cassandra.apache.org
Subject: Re: Repair daily refreshed table

Hi Raul,

I cannot afford to delete and then load, as this would create downtime for the
record; that's why I'm upserting with a TTL of today()+7days, as I mentioned in
my original question. At the moment I don't have an issue with either loading
or access times. My question is: should I repair such a table or not, and if
yes, before or after the load (or does it not matter)?

Thanks,
Maxim.

On Sun, Aug 19, 2018 at 8:52 AM Rahul Singh wrote:
If you wanted to be certain that all replicas were acknowledging receipt of the 
data, then you could use ALL or EACH_QUORUM (if you have multiple DCs), but you
must really want high consistency if you do that.

You should avoid consciously creating tombstones if possible — it ends up 
making reads slower because they need to be accounted for until they are 
compacted / garbage collected out.

Tombstones are created when data is either deleted, or nulled. When marking 
data with a TTL, the actual delete is not done until after the TTL has expired.

When you say you are overwriting, are you deleting and then loading? That’s the 
only way you should see tombstones — or maybe you are setting nulls?

Rahul
On Aug 18, 2018, 11:16 PM -0700, Maxim Parkachov wrote:
Hi Rahul,

I'm already using LOCAL_QUORUM in the batch process and it runs every day. As
far as I understand, because I'm overwriting the whole table with a new TTL,
the process creates tons of tombstones and I'm more concerned with them.

Regards,
Maxim.
On Sun, Aug 19, 2018 at 3:02 AM Rahul Singh wrote:
Are you loading using a batch process? What's the frequency of the data ingest,
and does it have to be very fast? If it's not too frequent and can be a little
slower, you may consider a higher consistency level to ensure data is on the
replicas.

Rahul
On Aug 18, 2018, 2:29 AM -0700, Maxim Parkachov wrote:
Hi community,

I'm currently puzzled by the following challenge. I have a CF with a 7-day TTL
on all rows. Daily, there is a process which loads the actual data with a +7
days TTL; thus records which were not present in the last 7 days of loads
expire. The amount of these expired records is very small, < 1%. I have a daily
repair process, which takes a considerable amount of time and resources, and a
snapshot after that. Obviously I'm concerned only with the last loaded data.
Basically, my question: should I run repair before the load, after the load, or
maybe I don't need to repair such a table at all?

Regards,
Maxim.


Re: Repair daily refreshed table

2018-08-20 Thread Maxim Parkachov
Hi Raul,

I cannot afford to delete and then load, as this would create downtime for the
record; that's why I'm upserting with a TTL of today()+7days, as I mentioned in
my original question. At the moment I don't have an issue with either loading
or access times. My question is: should I repair such a table or not, and if
yes, before or after the load (or does it not matter)?
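(For illustration, an upsert with a 7-day TTL looks roughly like the following;
the table and column names are placeholders, and 604800 seconds = 7 days.)

INSERT INTO my_keyspace.my_table (id, payload) VALUES (42, 'data') USING TTL 604800;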

Thanks,
Maxim.

On Sun, Aug 19, 2018 at 8:52 AM Rahul Singh 
wrote:

> If you wanted to be certain that all replicas were acknowledging receipt
> of the data, then you could use ALL or EACH_QUORUM (if you have multiple
> DCs), but you must really want high consistency if you do that.
>
> You should avoid consciously creating tombstones if possible — it ends up
> making reads slower because they need to be accounted for until they are
> compacted / garbage collected out.
>
> Tombstones are created when data is either deleted, or nulled. When
> marking data with a TTL, the actual delete is not done until after the TTL
> has expired.
>
> When you say you are overwriting, are you deleting and then loading?
> That’s the only way you should see tombstones — or maybe you are setting
> nulls?
>
> Rahul
> On Aug 18, 2018, 11:16 PM -0700, Maxim Parkachov ,
> wrote:
>
> Hi Rahul,
>
> I'm already using LOCAL_QUORUM in the batch process and it runs every day.
> As far as I understand, because I'm overwriting the whole table with a new
> TTL, the process creates tons of tombstones and I'm more concerned with them.
>
> Regards,
> Maxim.
>
> On Sun, Aug 19, 2018 at 3:02 AM Rahul Singh 
> wrote:
>
>> Are you loading using a batch process? What's the frequency of the data
>> ingest, and does it have to be very fast? If it's not too frequent and can
>> be a little slower, you may consider a higher consistency level to ensure
>> data is on the replicas.
>>
>> Rahul
>> On Aug 18, 2018, 2:29 AM -0700, Maxim Parkachov ,
>> wrote:
>>
>> Hi community,
>>
>> I'm currently puzzled by the following challenge. I have a CF with a 7-day
>> TTL on all rows. Daily, there is a process which loads the actual data with
>> a +7 days TTL; thus records which were not present in the last 7 days of
>> loads expire. The amount of these expired records is very small, < 1%. I
>> have a daily repair process, which takes a considerable amount of time and
>> resources, and a snapshot after that. Obviously I'm concerned only with the
>> last loaded data. Basically, my question: should I run repair before the
>> load, after the load, or maybe I don't need to repair such a table at all?
>>
>> Regards,
>> Maxim.
>>
>>