Re: Cassandra stable version for production

2016-07-20 Thread Jonathan Haddad
If I were starting a new project today, I'd go with 3.0.  It's gotten over
half a year of bug fixes.

I personally would have a hard time putting a tick-tock release into
production: you're either getting new features or putting in a version
that won't receive any further bug fixes. So unless you're ready to dive
into the code or backport patches (or upgrade every month thereafter to get
bug fixes in addition to new features), 3.0 is probably your best bet.

Julien Anguenot talks a little bit about migrating his cluster to 3.0 here:
http://planetcassandra.org/blog/this-week-in-cassandra-3-0-in-the-wild-5132016/

On Wed, Jul 20, 2016 at 4:33 PM Farzad Panahi 
wrote:

> Hi,
>
> I am new to Cassandra and a bit confused about how versioning works here.
>
> I have recently started working on Cassandra clusters. I want to create a
> Cassandra cluster and make it ready for production but I am not sure which
> version is the right one. Some people say that the most stable version
> is 2.2, but Cassandra's download page says that version 2.2 will only be
> supported until November 2016, which does not make it look like a good
> candidate for a new cluster.
>
> So is the tick-tock release 3.7 stable enough for production or should I
> go with 3.0 or 2.2? Which one is the most robust and reliable one for
> production?
>
> I appreciate if the community can share their thoughts.
>
> Cheers
>
> Farzad
>


Cassandra stable version for production

2016-07-20 Thread Farzad Panahi
Hi,

I am new to Cassandra and a bit confused about how versioning works here.

I have recently started working on Cassandra clusters. I want to create a
Cassandra cluster and make it ready for production but I am not sure which
version is the right one. Some people say that the most stable version
is 2.2, but Cassandra's download page says that version 2.2 will only be
supported until November 2016, which does not make it look like a good
candidate for a new cluster.

So is the tick-tock release 3.7 stable enough for production or should I go
with 3.0 or 2.2? Which one is the most robust and reliable one for
production?

I appreciate if the community can share their thoughts.

Cheers

Farzad


Re: Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-20 Thread Nate McCall
If you migrate to the latest 2.1 first, you can make this a non-issue as
2.1.12 and above support simultaneous SSL and plain on the same port for
exactly this use case:
https://issues.apache.org/jira/browse/CASSANDRA-10559
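
For reference, the transition then becomes a normal rolling restart with the
internode encryption settings in place. A rough sketch of the cassandra.yaml
block involved (keystore paths and passwords below are placeholders, not real
values):

server_encryption_options:
    internode_encryption: dc    # or 'all'; 'dc' encrypts only cross-DC traffic
    keystore: /etc/cassandra/conf/server-keystore.jks
    keystore_password: <keystore-password>
    truststore: /etc/cassandra/conf/server-truststore.jks
    truststore_password: <truststore-password>

Because 2.1.12+ accepts both SSL and plain connections on the same port,
already-restarted nodes keep talking to not-yet-restarted ones, so no DC
should show as down during the rollout.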

On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> Hi,
>  If possible, could someone shed some light on this? I followed a
> post from The Last Pickle which was very informative, but we had some
> concerns when it came to enabling SSL on a live production cluster.
>
>
> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
>
> 1 : We generally remove application traffic from a DC which has ongoing
> changes, just not to affect end customers if things go south during the
> update.
>
> 2 : So once DC-A has been restarted after enabling SSL, this would be
> missing writes during that period, as the DC-A would be shown as down by
> the other DC's. We will not be able to put back application traffic on DC-A
> until we run inter-dc repairs, which will happen only  when SSL has been
> enabled on all DC's.
>
> 3 : Repeating the procedure for every DC will lead to some missed writes
> across all DC's.
>
> 4 : We could do the rolling restart of a DC-A with application traffic on,
> but we are concerned if for any infrastructure related reason we have an
> issue, we will have to serve traffic from another DC-B, which might be
> missing on writes to the DC-A during that period.
>
> We have 4 DCs with 50 nodes each.
>
>
> thanks
> Sai
>
> -- Forwarded message --
> From: sai krishnam raju potturi 
> Date: Mon, Jul 18, 2016 at 11:06 AM
> Subject: Re : Recommended procedure for enabling SSL on a live production
> cluster
> To: user@cassandra.apache.org
>
>
> Hi;
>   We have a Cassandra cluster ( version 2.0.14 ) spanning across 4
> datacenters with 50 nodes each. We are planning to enable SSL between the
> datacenters. We are following the standard procedure for enabling SSL (
> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html)
> . We were planning to enable SSL for each datacenter at a time.
>
> During the rolling restart, it's expected that the nodes in the
> datacenter that had the service restarted will be shown as down by the nodes
> in other datacenters that have not restarted the service. This would lead
> to missed writes among various nodes during this procedure.
>
> What would be the recommended procedure for enabling SSL on a live
> production cluster without the chaos.
>
> thanks
> Sai
>
>


-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: OpsCenter sending alert emails or posting to a url never succeeded.

2016-07-20 Thread Ryan Springer
Yuan,

If you add a static key=value pair, you need to add it using the "fields="
section.  However, when the "fields=" section is used, you must list all of
the fields that you want to appear in the POST data.

Here is an example that sends message_type and the message itself:

fields=message_type=CRITICAL
   message={message}


Try placing this at the end of your posturl.conf where you currently only
have "message_type=CRITICAL"

Thank you,

Ryan Springer
Opscenter Provisioning Team

On Tue, Jul 19, 2016 at 6:56 PM, Yuan Fang  wrote:

> Anyone succeeded?
>
>
>
> Here is my setting for postUrl.
> ==
> ubuntu@ip-172-31-55-130:/etc/opscenter/event-plugins$ more posturl.conf
>
> [posturl]
> enabled=1
>
> # levels can be comma delimited list of any of the following:
> # DEBUG,INFO,WARN,ERROR,CRITICAL,ALERT
> # If left empty, will listen for all levels
> levels=
>
> # clusters is a comma delimited list of cluster names for which
> # this alert config will be eligible to run.
> # If left empty, this alert will be called for events on all clusters
> clusters=
>
> # the URL to send a HTTP POST to
> url=https://alert.victorops.com/integrations/generic*
>
> # Set a username for basic HTTP authorization
> #username=foo
>
> # Set a password for basic HTTP authorization
> #password=bar
>
> # Set the type of posted data. Available options are 'json' or 'form'
> post_type=json
>
> # Fields specified here will override the default event data fields.
> #
> # They must be formatted as key-value pair, with key and value separated by
> # an equals (=). Each pair after the first must be on its own line,
> # indented beyond the first line
> #
> # You may use tokens found within the default event data for or in
> # values. For example, some available keys are:
> #   cluster, time, level_str, message, target_node, event_source, success,
> api_source_ip, user, source_node
> # Keys must be encapsulated in {brackets}.
> #
> #fields=textKey=value
> #mixedKey=cluster-{cluster}
> #event-msg={message}
> message_type=CRITICAL
>


Re: Are counters faster than CAS or vice versa?

2016-07-20 Thread Jonathan Haddad
Just to make sure I understand: you've got a queue where you can tolerate
some of the items in it not being processed?

On Wed, Jul 20, 2016 at 1:13 PM Kevin Burton  wrote:

> On Wed, Jul 20, 2016 at 11:53 AM, Jeff Jirsa 
> wrote:
>
>> Can you tolerate the value being “close, but not perfectly accurate”? If
>> not, don’t use a counter.
>>
>>
>>
>
> yeah.. agreed.. this is a problem and something I was considering.  I
> guess it depends on whether counters are 10x faster..
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>


Re: Are counters faster than CAS or vice versa?

2016-07-20 Thread Kevin Burton
On Wed, Jul 20, 2016 at 11:53 AM, Jeff Jirsa 
wrote:

> Can you tolerate the value being “close, but not perfectly accurate”? If
> not, don’t use a counter.
>
>
>

yeah.. agreed.. this is a problem and something I was considering.  I
guess it depends on whether counters are 10x faster..

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Ring connection timeouts with 2.2.6

2016-07-20 Thread Juho Mäkinen
Just to pick this up: Did you see any system load spikes? I'm tracing a
problem on 2.2.7 where my cluster sees load spikes up to 20-30, when the
normal average load is around 3-4. So far I haven't found any good reason,
but I'm going to try otc_coalescing_strategy: disabled tomorrow.
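
For reference, that is a one-line cassandra.yaml change followed by a rolling
restart; a sketch (TIMEHORIZON is what Mike reported as the previous default):

# cassandra.yaml
otc_coalescing_strategy: DISABLED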

 - Garo

On Fri, Jul 15, 2016 at 6:16 PM, Mike Heffner  wrote:

> Just to followup on this post with a couple of more data points:
>
> 1)
>
> We upgraded to 2.2.7 and did not see any change in behavior.
>
> 2)
>
> However, what *has* fixed this issue for us was disabling msg coalescing
> by setting:
>
> otc_coalescing_strategy: DISABLED
>
> We were using the default setting before (time horizon I believe).
>
> We see periodic timeouts on the ring (once every few hours), but they are
> brief and don't impact latency. With msg coalescing turned on we would see
> these timeouts persist consistently after an initial spike. My guess is
> that something in the coalescing logic is disturbed by the initial timeout
> spike which leads to dropping all / high-percentage of all subsequent
> traffic.
>
> We are planning to continue production use with msg coalescing disabled
> for now and may run tests in our staging environments to identify where the
> coalescing is breaking this.
>
> Mike
>
> On Tue, Jul 5, 2016 at 12:14 PM, Mike Heffner  wrote:
>
>> Jeff,
>>
>> Thanks, yeah we updated to the 2.16.4 driver version from source. I don't
>> believe we've hit the bugs mentioned in earlier driver versions.
>>
>> Mike
>>
>> On Mon, Jul 4, 2016 at 11:16 PM, Jeff Jirsa 
>> wrote:
>>
>>> AWS ubuntu 14.04 AMI ships with buggy enhanced networking driver –
>>> depending on your instance types / hypervisor choice, you may want to
>>> ensure you’re not seeing that bug.
>>>
>>>
>>>
>>> *From: *Mike Heffner 
>>> *Reply-To: *"user@cassandra.apache.org" 
>>> *Date: *Friday, July 1, 2016 at 1:10 PM
>>> *To: *"user@cassandra.apache.org" 
>>> *Cc: *Peter Norton 
>>> *Subject: *Re: Ring connection timeouts with 2.2.6
>>>
>>>
>>>
>>> Jens,
>>>
>>>
>>>
>>> We haven't noticed any particular large GC operations or even
>>> persistently high GC times.
>>>
>>>
>>>
>>> Mike
>>>
>>>
>>>
>>> On Thu, Jun 30, 2016 at 3:20 AM, Jens Rantil 
>>> wrote:
>>>
>>> Hi,
>>>
>>> Could it be garbage collection occurring on nodes that are more heavily
>>> loaded?
>>>
>>> Cheers,
>>> Jens
>>>
>>>
>>>
>>> Den sön 26 juni 2016 05:22Mike Heffner  skrev:
>>>
>>> One thing to add, if we do a rolling restart of the ring the timeouts
>>> disappear entirely for several hours and performance returns to normal.
>>> It's as if something is leaking over time, but we haven't seen any
>>> noticeable change in heap.
>>>
>>>
>>>
>>> On Thu, Jun 23, 2016 at 10:38 AM, Mike Heffner  wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> We have a 12 node 2.2.6 ring running in AWS, single DC with RF=3, that
>>> is sitting at <25% CPU, doing mostly writes, and not showing any particular
>>> long GC times/pauses. By all observed metrics the ring is healthy and
>>> performing well.
>>>
>>>
>>>
>>> However, we are noticing a pretty consistent number of connection
>>> timeouts coming from the messaging service between various pairs of nodes
>>> in the ring. The "Connection.TotalTimeouts" meter metric show 100k's of
>>> timeouts per minute, usually between two pairs of nodes for several hours
>>> at a time. It seems to occur for several hours at a time, then may stop or
>>> move to other pairs of nodes in the ring. The metric
>>> "Connection.SmallMessageDroppedTasks." will also grow for one pair of
>>> the nodes in the TotalTimeouts metric.
>>>
>>>
>>>
>>> Looking at the debug log typically shows a large number of messages like
>>> the following on one of the nodes:
>>>
>>>
>>>
>>> StorageProxy.java:1033 - Skipped writing hint for /172.26.33.177
>>> 
>>> (ttl 0)
>>>
>>> We have cross node timeouts enabled, but ntp is running on all nodes and
>>> no node appears to have time drift.
>>>
>>>
>>>
>>> The network appears to be fine between nodes, with iperf tests showing
>>> that we have a lot of headroom.
>>>
>>>
>>>
>>> Any thoughts on what to look for? Can we increase thread count/pool
>>> sizes for the messaging service?
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Mike
>>>
>>>
>>>
>>> --
>>>
>>>
>>>   Mike Heffner 
>>>
>>>   Librato, Inc.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>>   Mike Heffner 
>>>
>>>   Librato, Inc.
>>>
>>>
>>>
>>> --
>>>
>>> Jens Rantil
>>> Backend Developer @ Tink
>>>
>>> Tink AB, Wallingatan 5, 111 60 Stockholm, 

Re: Are counters faster than CAS or vice versa?

2016-07-20 Thread Jeff Jirsa
Can you tolerate the value being “close, but not perfectly accurate”? If not, 
don’t use a counter.
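
To make the trade-off concrete, the two options look roughly like this in CQL
(table and column names below are invented for illustration, not taken from
the original thread):

-- counter: fast, not linearizable, can over- or under-count on retries
UPDATE task_pointer_counter SET value = value + 1 WHERE queue = 'tasks';

-- CAS / lightweight transaction: linearizable, but a Paxos round per update
SELECT value FROM task_pointer WHERE queue = 'tasks';   -- say it returns 42
UPDATE task_pointer SET value = 43 WHERE queue = 'tasks' IF value = 42;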

 

 

From:  on behalf of Kevin Burton 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, July 20, 2016 at 11:48 AM
To: "user@cassandra.apache.org" 
Subject: Are counters faster than CAS or vice versa?

 

We ended up implementing a task/queue system which uses a global pointer. 

 

Basically the pointer just increments ... so we have thousands of tasks that 
just increment this one pointer.

 

The problem is that we're seeing contention on it and not being able to write 
this record properly.

 

We're just doing a CAS operation now to read the existing value, then increment 
it.

 

I think it might have been better to implement this as a counter.  Would that 
be inherently faster or would a CAS be about the same?

 

I can't really test it without deploying it so I figured I would just ask here 
first.

 

Kevin

 

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

 

Founder/CEO Spinn3r.com

Location: San Francisco, CA

blog: http://burtonator.wordpress.com

… or check out my Google+ profile






Are counters faster than CAS or vice versa?

2016-07-20 Thread Kevin Burton
We ended up implementing a task/queue system which uses a global pointer.

Basically the pointer just increments ... so we have thousands of tasks
that just increment this one pointer.

The problem is that we're seeing contention on it and not being able to
write this record properly.

We're just doing a CAS operation now to read the existing value, then
increment it.

I think it might have been better to implement this as a counter.  Would
that be inherently faster or would a CAS be about the same?

I can't really test it without deploying it so I figured I would just ask
here first.

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



My cluster shows high system load without any apparent reason

2016-07-20 Thread Juho Mäkinen
I just recently upgraded our cluster to 2.2.7 and after turning the cluster
under production load the instances started to show high load (as shown by
uptime) without any apparent reason and I'm not quite sure what could be
causing it.

We are running on i2.4xlarge, so we have 16 cores, 120GB of ram, four 800GB
SSDs (set as lvm stripe into one big lvol). Running 3.13.0-87-generic on
HVM virtualisation. Cluster has 26 TiB of data stored in two tables.

Symptoms:
 - High load, sometimes up to 30 for a short duration of few minutes, then
the load drops back to the cluster average: 3-4
 - Instances might have one compaction running, but might not have any
compactions.
 - Each node is serving around 250-300 reads per second and around 200
writes per second.
 - Restarting node fixes the problem for around 18-24 hours.
 - No or very little IO-wait.
 - top shows that around 3-10 threads are running on high cpu, but that
alone should not cause a load of 20-30.
 - Doesn't seem to be GC load: a system starts to show symptoms when it has
run only one CMS sweep, so it's not doing constant stop-the-world GCs.
 - top shows that the C* processes use 100G of RSS memory. I assume this is
because Cassandra opens all SSTables with mmap(), so they show up in the RSS
count.

What I've done so far:
 - Rolling restart. Helped for about one day.
 - Tried doing manual GC to the cluster.
 - Increased heap from 8 GiB with CMS to 16 GiB with G1GC.
 - sjk-plus shows bunch of SharedPool workers. Not sure what to make of
this.
 - Browsed over
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html but
didn't find anything apparent.

I know that the general symptom of "system shows high load" is not very
good and informative, but I don't know how to better describe what's going
on. I appreciate all ideas what to try and how to debug this further.
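
In case it helps the digging, one way to see what those few hot threads
actually are is to map them back to Java stack traces with plain JDK tools
(pid 12345 and tid 54321 below are placeholders):

top -H -p 12345                          # note the TIDs of the busiest threads
printf '%x\n' 54321                      # convert a hot TID to hex -> d431
jstack 12345 | grep -A 20 'nid=0xd431'   # find that thread's stack in the dump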

 - Garo


Re: Questions about anti-entropy repair

2016-07-20 Thread daemeon reiydelle
I don't know if my perspective on this will assist, so YMMV:

Summary

   1. Nodetool repairs are required when a node has issues and can't get
   its (e.g. hinted handoff) resync done: culprit: usually network, sometimes
   container/vm, rarely disk.
   2. Scripts to do partition range are a pain to maintain, and you have to
   be CONSTANTLY checking for new keyspaces, parsing them, etc. Git hub
   project?
   3. Monitor/monitor/monitor: if you do a best practices job of actually
   monitoring the FULL stack, you only need to do repairs when the world goes
   south.
   4. Are you alerted when errors show up in the logs, network goes wacky,
   etc? No? then you have to CYA by doing hail mary passes with periodic
   nodetool repairs.
   5. Nodetool repair is a CYA for a cluster whose status is not well
   monitored.

Daemeon's thoughts:

Nodetool repair is not required for a cluster that is and "always has been"
in a known good state. Monitoring of the relevant logs/network/disk/etc. is
the only way that I know of to assure this state. Because (e.g. AWS, and
EVERY ONE OF my clients' infrastructures: screwed up networks) nodes can
disappear then the cluster *can* get overloaded (network traffic) causing
hinted handoffs to have all of the worst case corner cases you can never
hope to see.

So, if you have good monitoring in place to assure that there is known good
cluster behaviour (network, disk, etc.), repairs are not required until you
are alerted that a cluster health problem has occurred. Partition range
repair is a pain in various parts of the anatomy because one has to
CONSTANTLY be updating the scripts that generate the commands (I have not
seen a git hub project around this, would love to see responses that point
them out!).



Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Jul 20, 2016 at 4:33 AM, Alain RODRIGUEZ  wrote:

> Hi Satoshi,
>
>
>> Q1:
>> According to the DataStax document, it's recommended to run full repair
>> weekly or monthly. Is it needed even if repair with partitioner range
>> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
>> every node in the cluster?
>>
>
> More accurately you need to run a repair for each node and each table
> within the gc_grace_seconds value defined at the table level to ensure no
> deleted data will return. Also running this on a regular basis ensure a
> constantly low entropy in your cluster, allowing better consistency (if not
> using a strong consistency like with CL.R = quorum).
>
> A full repair means every piece of data has been repaired. On a 3 node
> cluster with RF=3, running 'nodetool repair -pr' on the 3 nodes or
> 'nodetool repair' on one node are an equivalent "full repair". The best
> approach is often to run repair with '-pr' on all the nodes indeed. This is
> a full repair.
>
> Is it a good practice to repair a node without using non-repaired
>> snapshots when I want to restore a node because repair process is too slow?
>
>
> I am sorry, this is unclear to me. But from this "actually 1GB data is
> updated because the snapshot is already repaired" I understand you are
> using incremental repairs (or that you think that Cassandra repair uses it
> by default, which is not the case in your version).
> http://www.datastax.com/dev/blog/more-efficient-repairs
>
> Also, be aware that repair is a PITA for all the operators using
> Cassandra, which has led to many attempts to improve things:
>
> Range repair: https://github.com/BrianGallew/cassandra_range_repair
> Reaper: https://github.com/spotify/cassandra-reaper
> Ticket to automatically schedule / handle repairs in Cassandra:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Ticket to switch to Mutation Based Repairs (MBR):
> https://issues.apache.org/jira/browse/CASSANDRA-8911
>
> And probably many more... There is a lot to read and try, repair is an
> important yet non trivial topic for any Cassandra operator.
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
>
> 2016-07-14 9:41 GMT+02:00 Satoshi Hikida :
>
>> Hi,
>>
>> I have two questions about anti-entropy repair.
>>
>> Q1:
>> According to the DataStax document, it's recommended to run full repair
>> weekly or monthly. Is it needed even if repair with partitioner range
>> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
>> every node in the cluster?
>>
>> References:
>> - DataStax, "When to run anti-entropy repair",
>> http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
>>
>>
>> Q2:
>> Is it a good practice to repair a node without using non-repaired
>> snapshots when I want to restore a node because repair process is too slow?
>>
>> I've done some simple verifications for anti-entropy repair and found out
>> that the repair process spends 

Re: Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-20 Thread sai krishnam raju potturi
Hi,
 If possible, could someone shed some light on this? I followed a
post from The Last Pickle which was very informative, but we had some
concerns when it came to enabling SSL on a live production cluster.

http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html

1 : We generally remove application traffic from a DC which has ongoing
changes, just not to affect end customers if things go south during the
update.

2 : So once DC-A has been restarted after enabling SSL, this would be
missing writes during that period, as the DC-A would be shown as down by
the other DC's. We will not be able to put back application traffic on DC-A
until we run inter-dc repairs, which will happen only  when SSL has been
enabled on all DC's.

3 : Repeating the procedure for every DC will lead to some missed writes
across all DC's.

4 : We could do the rolling restart of a DC-A with application traffic on,
but we are concerned if for any infrastructure related reason we have an
issue, we will have to serve traffic from another DC-B, which might be
missing on writes to the DC-A during that period.

We have 4 DCs with 50 nodes each.


thanks
Sai

-- Forwarded message --
From: sai krishnam raju potturi 
Date: Mon, Jul 18, 2016 at 11:06 AM
Subject: Re : Recommended procedure for enabling SSL on a live production
cluster
To: user@cassandra.apache.org


Hi;
  We have a Cassandra cluster ( version 2.0.14 ) spanning across 4
datacenters with 50 nodes each. We are planning to enable SSL between the
datacenters. We are following the standard procedure for enabling SSL (
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html)
. We were planning to enable SSL for each datacenter at a time.

During the rolling restart, it's expected that the nodes in the
datacenter that had the service restarted will be shown as down by the nodes
in other datacenters that have not restarted the service. This would lead
to missed writes among various nodes during this procedure.

What would be the recommended procedure for enabling SSL on a live
production cluster without the chaos?

thanks
Sai


Re: Repair and LCS tables

2016-07-20 Thread Alain RODRIGUEZ
Hi Cyril,

What Cassandra version are you using?

There a lot of recent and ongoing work around LCS. Maybe one of these
tickets will be of interest to you:

https://issues.apache.org/jira/browse/CASSANDRA-10979
https://issues.apache.org/jira/browse/CASSANDRA-10862


C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-07-20 14:48 GMT+02:00 Cyril Scetbon :

> Hi,
>
> It seems that when LCS tables are used, sstables sent are not sent with
> the level information which generates a lot of compaction tasks in some
> cases. Don't you think that the level information should also be sent to
> avoid having desynchronized replicas compact the sstables sent?
>
> Thanks
> --
> Cyril


Repair and LCS tables

2016-07-20 Thread Cyril Scetbon
Hi,

It seems that when LCS tables are used, sstables sent are not sent with the 
level information which generates a lot of compaction tasks in some cases. 
Don't you think that the level information should also be sent to avoid having
desynchronized replicas compact the sstables sent?

Thanks
-- 
Cyril 

Re: Questions about anti-entropy repair

2016-07-20 Thread Alain RODRIGUEZ
Hi Satoshi,


> Q1:
> According to the DataStax document, it's recommended to run full repair
> weekly or monthly. Is it needed even if repair with partitioner range
> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
> every node in the cluster?
>

More accurately you need to run a repair for each node and each table
within the gc_grace_seconds value defined at the table level to ensure no
deleted data will return. Also running this on a regular basis ensure a
constantly low entropy in your cluster, allowing better consistency (if not
using a strong consistency like with CL.R = quorum).

A full repair means every piece of data has been repaired. On a 3 node
cluster with RF=3, running 'nodetool repair -pr' on the 3 nodes or
'nodetool repair' on one node are an equivalent "full repair". The best
approach is often to run repair with '-pr' on all the nodes indeed. This is
a full repair.
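
In practice that means running, on every node and often enough that each node
gets through it within gc_grace_seconds, something like (keyspace name is just
an example):

nodetool repair -pr my_keyspace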

Is it a good practice to repair a node without using non-repaired snapshots
> when I want to restore a node because repair process is too slow?


I am sorry, this is unclear to me. But from this "actually 1GB data is
updated because the snapshot is already repaired" I understand you are
using incremental repairs (or that you think that Cassandra repair uses it
by default, which is not the case in your version).
http://www.datastax.com/dev/blog/more-efficient-repairs

Also, be aware that repair is a PITA for all the operators using Cassandra,
which has led to many attempts to improve things:

Range repair: https://github.com/BrianGallew/cassandra_range_repair
Reaper: https://github.com/spotify/cassandra-reaper
Ticket to automatically schedule / handle repairs in Cassandra:
https://issues.apache.org/jira/browse/CASSANDRA-10070
Ticket to switch to Mutation Based Repairs (MBR):
https://issues.apache.org/jira/browse/CASSANDRA-8911

And probably many more... There is a lot to read and try, repair is an
important yet non trivial topic for any Cassandra operator.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com




2016-07-14 9:41 GMT+02:00 Satoshi Hikida :

> Hi,
>
> I have two questions about anti-entropy repair.
>
> Q1:
> According to the DataStax document, it's recommended to run full repair
> weekly or monthly. Is it needed even if repair with partitioner range
> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
> every node in the cluster?
>
> References:
> - DataStax, "When to run anti-entropy repair",
> http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
>
>
> Q2:
> Is it a good practice to repair a node without using non-repaired
> snapshots when I want to restore a node because repair process is too slow?
>
> I've done some simple verifications for anti-entropy repair and found out
> that the repair process spends too much time than simply transferring the
> replica data from existing nodes to restoring node.
>
> My verification settings are as following:
>
> - 3 node cluster (N1, N2, N3)
> - 2 CPUs, 8GB memory, 500GB HDD for each node
> - Replication Factor is 3
> - C* version is 2.2.6
> - CS is LCS
>
> And I prepared test data as following:
>
> - a snapshot (10GB, full repaired) for N1, N2, N3.
> - 1GB SSTables (by using incremental backup) for N1, N2, N3.
> - another 1GB SSTables for N1, N2
>
> I've measured repair time for two cases.
>
> - Case 1: repair N3 with the snapshot and 1GB SStables
> - Case 2: repair N3 with the snapshot only
>
> In case 1, N3 is needed to repair 12GB (actually 1GB data is updated
> because the snapshot is already repaired) and received 1GB data from N1 or
> N2. Whereas in case 2, N3 is needed to repair 12GB (actually just compare
> merkle tree for 10GB) and received 2GB data from N1 or N2.
>
> The result showed that case 2 was faster than case 1 (case 1: 6889sec,
> case 2: 4535sec). I guess the repair process is very slow and it would be
> better to repair a node without (non repaired) backed up (snapshot or
> incremental backup) files if the other replica nodes exists.
>
> So... I guess if I just have non-repaired backups, what's the point of
> using them? Looks like there's no merit... Am I missing something?
>
> Regards,
> Satoshi
>


RE: Exclude a host from the repair process

2016-07-20 Thread Amit Singh F
Hi Jean,

This option is available in C* version 2.1.x and above, where you can specify
hosts in the nodetool repair command. For more detail, please visit the link
below:

https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
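
For example, to restrict the repair to the replicas that are actually up, the
invocation is along these lines (the exact flag syntax is on the linked page;
the IPs and keyspace below are placeholders):

nodetool repair -hosts 10.0.0.1 -hosts 10.0.0.2 my_keyspace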

Regards
Amit Singh

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Wednesday, July 20, 2016 4:42 PM
To: user@cassandra.apache.org
Subject: Re: Exclude a host from the repair process

Hi Jean,

All the nodes are not necessarily involved in a repair, depending on vnodes being 
enabled or not, on your topology, on the racks you are using, etc.

This being said, if a node was supposed to be part of a repair process, the 
repair of all the subranges including the down node will fail. That's what I 
have seen happening so far. @Stone Fang, not sure who is right on this (I might 
have missed some information about this topic), but there is a ticket about 
this topic: https://issues.apache.org/jira/browse/CASSANDRA-10446. You 
apparently can specify which nodes to repair, but a down node is not 
automatically ignored as far as I can tell.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-07-14 9:16 GMT+02:00 Stone Fang 
>:
I don't think it is necessary to remove the down node.
The repair will continue comparing with the other up nodes and ignore the down node.

On Wed, Jul 13, 2016 at 9:44 PM, Jean Carlo 
> wrote:
If a node is down in my cluster.

Is it possible to exclude him from the repair process in order to continue with 
the repair?
If not
Is the repair continue reparing the other replicas even if one is down?
Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay




Re: Exclude a host from the repair process

2016-07-20 Thread Alain RODRIGUEZ
Hi Jean,

All the nodes are not necessarily involved in a repair, depending on vnodes
being enabled or not, on your topology, on the racks you are using, etc.

This being said, if a node was supposed to be part of a repair process, the
repair of all the subranges including the down node will fail. That's what
I have seen happening so far. @Stone Fang, not sure who is right on this (I
might have missed some information about this topic), but there is a ticket
about this topic: https://issues.apache.org/jira/browse/CASSANDRA-10446.
You apparently can specify which nodes to repair, but a down node is not
automatically ignored as far as I can tell.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-07-14 9:16 GMT+02:00 Stone Fang :

> I don't think it is necessary to remove the down node.
> The repair will continue comparing with the other up nodes and ignore the down node.
>
> On Wed, Jul 13, 2016 at 9:44 PM, Jean Carlo 
> wrote:
>
>> If a node is down in my cluster.
>>
>> Is it possible to exclude him from the repair process in order to
>> continue with the repair?
>> If not
>> Is the repair continue reparing the other replicas even if one is down?
>>
>> Best regards
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>
>


Re: CPU high load

2016-07-20 Thread Alain RODRIGUEZ
Hi Aoi,


> since few weeks
> ago, all of the cluster nodes are hitting avg. 15-20 cpu load.
> These nodes are running on VMs(VMware vSphere) that have 8vcpu
> (1core/socket)-16 vRAM.(JVM options : -Xms8G -Xmx8G -Xmn800M)


I take my chance, a few ideas / questions below:

   - What Cassandra version are you running?
   - How is your GC doing?
  - Run something like: grep "GC" /var/log/cassandra/system.log
  - If you have a lot of long CMS pauses you might not be keeping
  things in the new gen long enough: Xmn800M looks too small to me. It has
  been a default, but I never saw a case where this setting worked better
  than a higher value (let's say 2G). The tenuring threshold also gives
  better results if set a bit higher than the default (let's say 16).
  Those options are in cassandra-env.sh (see the sketch after this list).
   - Do you have other warnings or errors? Anything about tombstones or
   compacting wide rows incrementally?
   - What compaction strategy are you using?
   - How many concurrent compactors do you use (if you have 8 cores, this
   value should probably be between 2 and 6; 4 is a good starting point)
   - If your compaction is not fast enough and the disks are doing fine,
   consider increasing the compaction throughput from the default 16 to 32
   or 64 Mbps to mitigate the impact of the point above.
   - Do you use compression ? What kind ?
   - Did the request count increased recently? Do you consider adding
   capacity or do you think you're hitting a new bug / issue that is worth it
   investigating / solving?
   - Are you using default configuration? What did you change?
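
To make the new gen / compaction items above concrete, the knobs involved look
roughly like this (the values are just the examples quoted in the list, not a
general recommendation):

# cassandra-env.sh (edit the existing lines)
HEAP_NEWSIZE="2G"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"   # ships as 1 by default

# compaction throughput can be changed live, then persisted in cassandra.yaml
# (compaction_throughput_mb_per_sec, concurrent_compactors)
nodetool setcompactionthroughput 32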

No matter what you try, do it as much as possible on one canary node first,
and incrementally (one change at a time - using NEWHEAP = 2GB +
tenuringThreshold = 16 would count as one change, as it makes sense to move
those 2 values together).


> I have enabled a auto repair service on opscenter and it's running behind


Also, when did you start running repairs? Repair is an expensive operation
that consumes a lot of resources; it is often needed, but it is hard to tune
correctly. Are you sure you have enough CPU power to handle the load +
repairs?

Some other comments probably not directly related:


> I also realized that my cluster isn't well balanced


Well, your cluster looks balanced to me: 7 GB isn't that far from 11 GB. To
get more accurate information, use 'nodetool status mykeyspace'. This way
ownership will be displayed, replacing (?) with ownership (xx %). Total
ownership = 300 % in your case (RF=3).


> I am running 6 nodes vnode cluster with DSE 4.8.1, and since few weeks
> ago, all of the cluster nodes are hitting avg. 15-20 cpu load.


By the way, from
https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html
:

"Warning: DataStax does not recommend 4.8.1 or 4.8.2 versions for
production, see warning. Use 4.8.3 instead.".

I am not sure what happened there but I would move to 4.8.3+ asap, datastax
people know their products and I don't like this kind of orange and bold
warnings :-).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-07-14 4:36 GMT+02:00 Aoi Kadoya :

> Hi Romain,
>
> No, I don't think we upgraded cassandra version or changed any of
> those schema elements. After I realized this high load issue, I found
> that some of the tables have a shorter gc_grace_seconds(1day) than the
> rest and because it seemed causing constant compaction cycles, I have
> changed them to 10days. but again, that's after load hit this high
> number.
> some of nodes got eased a little bit after changing gc_grace_seconds
> values and repairing nodes, but since few days ago, all of nodes are
> constantly reporting load 15-20.
>
> Thank you for the suggestion about logging, let me try to change the
> log level to see what I can get from it.
>
> Thanks,
> Aoi
>
>
> 2016-07-13 13:28 GMT-07:00 Romain Hardouin :
> > Did you upgrade from a previous version? DId you make some schema changes
> > like compaction strategy, compression, bloom filter, etc.?
> > What about the R/W requests?
> > SharedPool Workers are... shared ;-) Put logs in debug to see some
> examples
> > of what services are using this pool (many actually).
> >
> > Best,
> >
> > Romain
> >
> >
> > Le Mercredi 13 juillet 2016 18h15, Patrick McFadin 
> a
> > écrit :
> >
> >
> > Might be more clear looking at nodetool tpstats
> >
> > From there you can see all the thread pools and if there are any blocks.
> > Could be something subtle like network.
> >
> > On Tue, Jul 12, 2016 at 3:23 PM, Aoi Kadoya 
> wrote:
> >
> > Hi,
> >
> > I am running 6 nodes vnode cluster with DSE 4.8.1, and since few weeks
> > ago, all of the cluster nodes are hitting avg. 15-20 cpu load.
> > These nodes are running on VMs(VMware vSphere) that have 8vcpu
> > (1core/socket)-16 vRAM.(JVM 

Re: cqlsh doesn't make response in LCS setting

2016-07-20 Thread Alain RODRIGUEZ
Hi Yuji Ito,

I don't know Cassandra 2.2 that much and I try to avoid using indexes, but
I imagine that what happened there is that creating the index took some
time, and all the newly created data went to L0 and compaction was
intensive as this node suddenly had a lot of pending compactions.

From
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_index_r.html:
"If data already exists for the column, Cassandra indexes the data during
the execution of this statement."

Why does cqlsh sometimes make no response in LCS setting for compaction?


So I would say it is basically still doing work. What happens if you run it
in a screen and come back later to see if the query completed successfully?
Again, I don't really know secondary indexes, I am just guessing.
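
Something along these lines (the screen session name is arbitrary; the cqlsh
command is the one from your mail):

screen -S index_build
cqlsh -e "create index new_index on keyspace.table (sub_column);"
# detach with Ctrl-a d; while it runs, 'nodetool compactionstats' shows the
# pending compactions (and the index build itself) from another shell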

When this problem happened, Cassandra process used 100% CPU
> and debug.log was filled by "Choosing candidates for L0".
>

This would make sense if the newly created index is using LCS internally (I am
not sure how much this is configurable, I imagine it is like a standard
table). As creating an index starts by handling existing columns, I imagine
this burst is expected. Yet there has been a lot of work done around LCS
to put data in the right level during some streaming operations, so maybe
you are hitting an issue or at least something we could improve: a good
question would be, "is it normal / good for every sstable to go to L0 while
creating an index using LCS?".

This problem hasn't occurred in STCS setting


Which problem, cqlsh hanging or tons of compactions + 100% CPU?

I imagine STCS would have used fewer resources to compact the newly created
index data, but the command would probably have hung in about the same way
as with LCS, since compaction is asynchronous work and the indexes for
existing data need to be created anyway. Was the STCS test done against
a table holding as much data as the LCS one?

I am just guessing, I hope this makes sense.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-07-19 7:22 GMT+02:00 Yuji Ito :

> Hi all,
>
> I have a question.
> I use Cassandra 2.2.6.
>
> Why does cqlsh sometimes make no response in LCS setting for compaction?
>
> I requested as below:
>   cqlsh -e "create index new_index on keyspace.table (sub_column);"
>
> When this problem happened, Cassandra process used 100% CPU
> and debug.log was filled by "Choosing candidates for L0".
> This problem hasn't occurred in STCS setting.
>
> Thanks,
> Yuji Ito
>


Re: Cassandra thrift frameSize issue

2016-07-20 Thread Oleksandr Petrov
The problem you're seeing is because of the thrift max_message_length,
which is set to 16mb and is not configurable from outside / in the yaml
file.

If the JDBC wrapper supports paging, you might want to look into
configuring it.

On Tue, Jul 19, 2016 at 8:27 PM Saurabh Kumar  wrote:

> Hi ,
>
> I am trying to run a query over a Cassandra cluster using a JDBC connection
> via the RJdbc library of the R language.  I am
> getting an org.apache.thrift.transport.TTransportException exception as
> mentioned in the attached pic.
>
> I increased frame size into cassandra.yaml like :
>
> thrift_framed_transport_size_in_mb: 700
> thrift_max_message_length_in_mb: 730
>
>
> Still getting same error.
>
> Please help me out to resolve this issue.
>
>
>
> Thanks in advance.
>
>
> Regards,
> Saurabh
>
-- 
Alex Petrov