Re: repair in C* 3.11.2 and anticompactions

2018-05-23 Thread Lerh Chuan Low
Hey Jean,

I think it still does anticompaction by default regardless; it will skip
anticompaction only if you do a subrange repair. TLP wrote a pretty good article on that:
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
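For reference, a subrange repair is just a full repair with explicit start/end tokens, which is roughly what tools like Reaper generate under the hood. A minimal sketch (the token values here are made-up placeholders, not from a real ring; in practice you'd take them from nodetool ring or the system tables):

```shell
# Full repair of a single token subrange -- no anticompaction is performed.
# The -st/-et token values below are placeholders.
nodetool -h 127.0.0.1 -p 7100 repair -full \
  -st -9223372036854775808 -et -3074457345618258603 \
  keyspace1 standard1
```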

On 24 May 2018 at 00:42, Jean Carlo  wrote:

> Hello
>
> I just want to understand why, if I run a non-incremental repair like this
>
> nodetool -h 127.0.0.1 -p 7100 repair -full -pr keyspace1 standard1
>
> Cassandra does anticompaction as the logs show
>
> INFO  [CompactionExecutor:20] 2018-05-23 16:36:27,598
> CompactionManager.java:1545 - Anticompacting [BigTableReader(path='/home/
> jriveraura/.ccm/test/node1/data0/keyspace1/standard1-
> 36a6ec405e9411e8b1d1b38a73559799/mc-2-big-Data.db')]
>
> As far as I understood, anticompactions are used to make incremental
> repairs possible, so I was not expecting any anticompactions when making
> repairs with the options -pr -full
>
> Does anyone know why Cassandra runs those anticompactions?
>
> Thanks
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Re: Cassandra downgrade version

2018-04-16 Thread Lerh Chuan Low
You should just be able to install 3.1.0 again if you need to, as both are
in the 3.x line. To be really safe you can also take a snapshot and back up
your existing SSTables first... and always remember to test before upgrading
in production :)
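For the snapshot step, a minimal sketch (the tag name is arbitrary):

```shell
# Snapshot every keyspace before upgrading; snapshots are hard links, so
# they are cheap until compaction removes the original files.
nodetool snapshot -t pre-upgrade
# To actually back them up, copy
#   <data_dir>/<keyspace>/<table>/snapshots/pre-upgrade/
# off the node.
```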

On 17 April 2018 at 07:48, Abdul Patel  wrote:

> Hi All,
>
> I am planning to upgrade my cassandra cluster from 3.1.0 to 3.11.2. Just
> in case something goes bad, do we have any rollback or downgrade
> option in cassandra to an older/previous version?
>
> Thanks
>
>


Re: Is node restart required to update yaml changes in 2.1x

2018-03-12 Thread Lerh Chuan Low
To my knowledge, for any version, updates to cassandra.yaml will only be
applied after you restart the node.
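A sketch of the usual per-node sequence (the service name assumes a package install; adjust for your environment):

```shell
nodetool drain                      # flush memtables, stop accepting writes
sudo systemctl restart cassandra    # picks up the new cassandra.yaml
nodetool status                     # wait for the node to report UN again
```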

On 13 March 2018 at 12:24, Kenneth Brotman 
wrote:

> Can you update changes to cassandra.yaml in version 2.1x without restarting
> the node?
>
>
>
> Kenneth Brotman
>


Re: What snitch to use with AWS and Google

2018-03-12 Thread Lerh Chuan Low
I would just go with GossipingPropertyFileSnitch; it will work across both
cloud providers (I once had a test cluster with 1 DC in Azure, 1 DC in AWS and
1 DC in GCP using GPFS). Even if it's solely AWS, I think GPFS is
superior because you can configure virtual racks if you ever need them, while
with EC2Snitch you are at the mercy of AWS. You have to put in a little extra
work to configure cassandra-rackdc.properties, but I think it's worth it :)
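For what it's worth, the per-node configuration is just two lines in cassandra-rackdc.properties; a sketch (the dc/rack names are arbitrary labels I've made up):

```
# cassandra-rackdc.properties on a node in the AWS DC
dc=aws-us-east
rack=rack1
```

Every node in the same physical rack/AZ should use the same rack value, and the names must match what your replication strategy expects.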



On 13 March 2018 at 10:40, Madhu-Nosql  wrote:

> Kenneth,
>
> For AWS -EC2Snitch(if DC in Single Region)
> For Google- Better go with GossipingPropertyFileSnitch
>
> Thanks,
> Madhu
>
> On Mon, Mar 12, 2018 at 6:31 PM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
>> Quick question:  If you have one cluster made of nodes of a datacenter in
>> AWS and a datacenter in Google, what snitch do you use?
>>
>>
>>
>> Kenneth Brotman
>>
>
>


Re: TWCS enabling tombstone compaction

2018-03-12 Thread Lerh Chuan Low
Dear Lucas,

Those properties that result in the log message you are seeing are
properties common to all compaction strategies. See
http://cassandra.apache.org/doc/latest/operating/compaction.html#common-options.
They are *tombstone_compaction_interval* and *tombstone_threshold*. If you
didn't define them when you created your table, then you will see the log
message. I'm not fully certain what the intent is, but in a TWCS setting you
should rely only on TTLs and not run checks for including SSTables in
tombstone-dropping compactions (so it may save a little bit of computation
there). DTCS has the same behaviour, which you can find detailed in this
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-9234.

You shouldn't be seeing it all the time though unless you are constantly
creating and dropping tables. Hope this helps in some way :)
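If you ever did want tombstone compactions under TWCS, per that JIRA they are only enabled when you set the sub-properties explicitly; a sketch (keyspace/table names are made up, and the tombstone values shown are just the documented defaults):

```sql
ALTER TABLE my_ks.my_table WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': 1,
  'tombstone_threshold': 0.2,
  'tombstone_compaction_interval': 86400
};
```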




On 10 March 2018 at 04:38, Lucas Benevides 
wrote:

> Dear community,
>
> I have been using TWCS in my lab, with TTL'd data.
> In the debug log there is always the sentence:
> "TimeWindowCompactionStrategy.java:65 Disabling tombstone compactions for
> TWCS". Indeed, the line is always repeated.
>
> What does it actually mean? If my data gets expired, the TWCS is already
> working and purging the SSTables that become expired. It surely sounds
> strange to me to disable tombstone compaction.
>
> In the TWCS compaction subproperties there are only two,
> compaction_window_unit and compaction_window_size. Jeff already told us
> that the STCS properties also apply to TWCS, although it is not in the
> documentation.
>
> Thanks in advance,
> Lucas Benevides Dias
>


Re: What kind of Automation you have for Cassandra related operations on AWS ?

2018-02-08 Thread Lerh Chuan Low
Terraform, Packer and Ansible do pretty decently; you may have to do some
smarts around replacing nodes and attaching the right volumes to replaced
nodes. If you could get Kubernetes working with Cassandra (beyond the
readily available guides) then I think you'll be a total baller.

On 9 February 2018 at 15:45, daemeon reiydelle  wrote:

> Terraform plus ansible. Put ok but messy. 5-30,000 nodes and infra
>
>
> Daemeon (Dæmœn) Reiydelle
> USA 1.415.501.0198
>
> On Thu, Feb 8, 2018, 15:57 Ben Wood  wrote:
>
>> Shameless plug of our (DC/OS) Apache Cassandra service:
>> https://docs.mesosphere.com/services/cassandra/2.0.3-3.0.14.
>>
>> You must run DC/OS, but it will handle:
>> Restarts
>> Replacement of nodes
>> Modification of configuration
>> Backups and Restores (to S3)
>>
>> On Thu, Feb 8, 2018 at 3:46 PM, Krish Donald 
>> wrote:
>>
>>> Hi All,
>>>
>>> What kind of Automation you have for Cassandra related operations on AWS
>>> like restacking, restart of the cluster , changing cassandra.yaml
>>> parameters etc ?
>>>
>>> Thanks
>>>
>>>
>>
>>
>> --
>> Ben Wood
>> Software Engineer - Data Agility
>> Mesosphere
>>
>


Re: Any Cassandra Backup and Restore tool like Cassandra Reaper?

2017-12-14 Thread Lerh Chuan Low
Tablesnap assumes S3, and tableslurp can set the stage for restoring by
downloading the relevant SSTables (but then it's up to the operator to
complete the restore from there). Restoring (especially point-in-time
restore) isn't easy to handle, so there aren't a lot of tools out there.

There's also Netflix's Priam (https://github.com/Netflix/Priam), but I think
it's a little old, and it's meant to run alongside C* as an agent handling
repairs, monitoring, backups and restores, configuring Cassandra YAMLs...

One other I've heard of is
https://github.com/tbarbugli/cassandra_snapshotter but there's no restore
yet.

On 15 December 2017 at 07:52, Rutvij Bhatt  wrote:

> There is tablesnap/tablechop/tableslurp -
> https://github.com/JeremyGrosser/tablesnap.
>
>
> On Thu, Dec 14, 2017 at 3:49 PM Roger Brown  wrote:
>
>> I've found nothing affordable that works with vnodes. If you have money,
>> you could use DataStax OpsCenter or Datos.io Recoverx.
>>
>> I ended up creating a cron job to make snapshots along with
>> incremental_backups: true in the cassandra.yaml. And I'm thinking of
>> setting up a replication strategy so that one rack contains 1  replica of
>> each keyspace and then using r1soft to image each of those servers to tape
>> for offsite backup.
>>
>>
>> On Thu, Dec 14, 2017 at 1:30 PM Harika Vangapelli -T (hvangape - AKRAYA
>> INC at Cisco)  wrote:
>>
>>> Any Cassandra Backup and Restore tool like Cassandra Reaper for Repairs?
>>>
>>>
>>>
>>


Re: Tablesnap with custom endpoint?

2017-12-14 Thread Lerh Chuan Low
Out of the box it assumes AWS S3 and is tailored to that using boto, so I
think you would have to check out the repository and make changes to specify
a different endpoint (and submit a PR back if you feel it's useful :))
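One thing that might be worth trying first: boto reads a config file, and I believe the default S3 host can be overridden there without touching the code. A sketch, untested with tablesnap specifically, and the hostname is made up:

```
# ~/.boto (or /etc/boto.cfg)
[s3]
host = s3.internal.example.com
```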

On 15 December 2017 at 08:03, Roger Brown  wrote:

> I wanted to use tablesnap for backups. Instead of s3.amazonaws.com, I
> wanted to use our own s3-compatible endpoint. I could never figure it out.
> Do you know how to override the S3 endpoint tablesnap uses?
>
> Roger
>
>


Re: Connection refused - 127.0.0.1-Gossip

2017-12-05 Thread Lerh Chuan Low
I think, as Jeff mentioned, it sounds like a configuration issue. Are you
sure you are using the same configmap (or however it's being passed in)?
Just throwing out ideas: maybe the pods are behind an HTTP proxy and you may
have forgotten to pass in the env vars?

On 6 December 2017 at 08:45, Jeff Jirsa  wrote:

> I don't have any k8 clusters to test with, but do you know how your yaml
> translates to cassandra.yaml ? What are the listen/broadcast addresses
> being set?
>
>
> On Tue, Dec 5, 2017 at 6:09 AM, Marek Kadek -T (mkadek - CONSOL PARTNERS
> LTD at Cisco)  wrote:
>
>> We are experiencing following issues with Cassandra on our kubernetes
>> clusters:
>>
>> ```
>>
>> @ kubectl exec -it cassandra-cassandra-0 -- tail
>> /var/log/cassandra/debug.log
>>
>> DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2017-12-05
>> 09:02:06,560 OutboundTcpConnection.java:545 - Unable to connect to
>> localhost/127.0.0.1
>>
>> java.net.ConnectException: Connection refused
>>
>> at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_131]
>>
>> at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_131]
>>
>> at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_131]
>>
>>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_131]
>>
>>         at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:433) [apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.0.jar:3.11.0]
>>
>> ```
>>
>>
>>
>> Basically, it’s tons and tons of the same message over and over (on all
>> clusters, all C* nodes). It tries roughly 4-5 times a second to open a tcp
>> connection to localhost (?) for gossiping.
>>
>>
>>
>> What we know:
>>
>> - does not happen on Cassandra 3.0.15, but happen on 3.11.1 (same
>> configuration).
>>
>> - does happen even on minikube-single-Cassandra “cluster”.
>>
>> - does not happen on docker-compose Cassandra cluster, only on kubernetes
>> one.
>>
>>
>>
>> Our configuration is pretty much this helm chart:
>> https://github.com/kubernetes/charts/blob/master/incubator/cassandra/values.yaml
>>
>>
>>
>> Do you have any idea what it could be related to?
>>
>>
>>
>
>


Re: Bootstrapping node on Cassandra 3.7 causes cluster-wide performance issues

2017-09-11 Thread Lerh Chuan Low
Hi Paul,

Agh, I don't have any experience with sstableofflinerelevel. Maybe Kurt
does, sorry.

Also, if it wasn't obvious: to add the node back to the cluster once it is
done, run the same 3 commands with enable substituted for disable. It feels
like it will take some time to get through all the compactions, likely more
than the hinted handoff window, so do make sure you are querying Cassandra
with strong consistency after you rejoin the node. Good luck!
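Concretely, the rejoin would look something like this (order reversed from the disable sequence; the repair is only needed if the node was out longer than the hint window):

```shell
nodetool enablegossip && nodetool enablethrift && nodetool enablebinary
# If it was offline for more than max_hint_window_in_ms (3h by default):
nodetool repair
```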

Lerh

On 12 September 2017 at 11:53, Aaron Wykoff 
wrote:

> Unsubscribe
>
> On Mon, Sep 11, 2017 at 4:48 PM, Paul Pollack 
> wrote:
>
>> Hi,
>>
>> We run a 48 node cluster that stores counts in wide rows. Each node is
>> using roughly 1TB space on a 2TB EBS gp2 drive for data directory and
>> LeveledCompactionStrategy. We have been trying to bootstrap new nodes that
>> use a raid0 configuration over 2 1TB EBS drives to increase I/O throughput
>> cap from 160 MB/s to 250 MB/s (AWS limits). Every time a node finishes
>> streaming it is bombarded by a large number of compactions. We see CPU load
>> on the new node spike extremely high and CPU load on all the other nodes in
>> the cluster drop unreasonably low. Meanwhile our app's latency for writes
>> to this cluster average 10 seconds or greater. We've already tried
>> throttling compaction throughput to 1 mbps and we've always had
>> concurrent_compactors set to 2 but the disk is still saturated. In every
>> case we have had to shut down the Cassandra process on the new node to
>> resume acceptable operations.
>>
>> We're currently upgrading all of our clients to use the 3.11.0 version of
>> the DataStax Python driver, which will allow us to add our next newly
>> bootstrapped node to a blacklist, hoping that if it doesn't accept writes
>> the rest of the cluster can serve them adequately (as is the case whenever
>> we turn down the bootstrapping node), and allow it to finish its
>> compactions.
>>
>> We were also interested in hearing if anyone has had much luck using the
>> sstableofflinerelevel tool, and if this is a reasonable approach for our
>> issue.
>>
>> One of my colleagues found a post where a user had a similar issue and
>> found that bloom filters had an extremely high false positive ratio, and
>> although I didn't check that during any of these attempts to bootstrap it
>> seems to me like if we have that many compactions to do we're likely to
>> observe that same thing.
>>
>> Would appreciate any guidance anyone can offer.
>>
>> Thanks,
>> Paul
>>
>
>


Re: Bootstrapping node on Cassandra 3.7 causes cluster-wide performance issues

2017-09-11 Thread Lerh Chuan Low
Hi Paul,

The new node will certainly have a lot of compactions to deal with being
LCS. Have you tried performing the following on the new node once it has
joined?

*nodetool disablebinary && nodetool disablethrift && nodetool disablegossip*

This will disconnect Cassandra from the cluster, but not stop Cassandra
itself. At this point you can unthrottle compactions and let it compact
away. When it is done compacting, you can re-add it to the cluster and run
a repair if it has been over 3 hours. I don't think adding a blacklist will
help much, because as long as the data you insert replicates to the node
(which is slow) it will slow down the whole cluster.

As long as you have that node in the cluster, it will slow down everything.

Hope this helps you in some way :)
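Putting it together, the sequence on the new node would look something like:

```shell
# Take the node out of coordination/replication without stopping the JVM
nodetool disablebinary && nodetool disablethrift && nodetool disablegossip
# Unthrottle compactions (0 = no limit) and watch them drain
nodetool setcompactionthroughput 0
nodetool compactionstats
```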

On 12 September 2017 at 09:48, Paul Pollack 
wrote:

> Hi,
>
> We run a 48 node cluster that stores counts in wide rows. Each node is using
> roughly 1TB space on a 2TB EBS gp2 drive for data directory and
> LeveledCompactionStrategy. We have been trying to bootstrap new nodes that
> use a raid0 configuration over 2 1TB EBS drives to increase I/O throughput
> cap from 160 MB/s to 250 MB/s (AWS limits). Every time a node finishes
> streaming it is bombarded by a large number of compactions. We see CPU load
> on the new node spike extremely high and CPU load on all the other nodes in
> the cluster drop unreasonably low. Meanwhile our app's latency for writes
> to this cluster average 10 seconds or greater. We've already tried
> throttling compaction throughput to 1 mbps and we've always had
> concurrent_compactors set to 2 but the disk is still saturated. In every
> case we have had to shut down the Cassandra process on the new node to
> resume acceptable operations.
>
> We're currently upgrading all of our clients to use the 3.11.0 version of
> the DataStax Python driver, which will allow us to add our next newly
> bootstrapped node to a blacklist, hoping that if it doesn't accept writes
> the rest of the cluster can serve them adequately (as is the case whenever
> we turn down the bootstrapping node), and allow it to finish its
> compactions.
>
> We were also interested in hearing if anyone has had much luck using the
> sstableofflinerelevel tool, and if this is a reasonable approach for our
> issue.
>
> One of my colleagues found a post where a user had a similar issue and
> found that bloom filters had an extremely high false positive ratio, and
> although I didn't check that during any of these attempts to bootstrap it
> seems to me like if we have that many compactions to do we're likely to
> observe that same thing.
>
> Would appreciate any guidance anyone can offer.
>
> Thanks,
> Paul
>


Re: read/write request counts and write size of each write

2017-07-26 Thread Lerh Chuan Low
Nitan,

I'm not really familiar with jmxterm but the error it's returning to you is
because that MBean doesn't exist. The full MBean matching your example is
org.apache.cassandra.metrics:type=ClientRequest,scope=CASWrite,name=MutationSizeHistogram.
You can then call the operation values on it to get the current snapshot of
metrics.

run -b
org.apache.cassandra.metrics:type=ClientRequest,scope=CASWrite,name=MutationSizeHistogram
values

Alternatively you can use 'get' to get an attribute instead of perform an
operation:

get -b
org.apache.cassandra.metrics:type=ClientRequest,scope=CASWrite,name=MutationSizeHistogram
Count

And instead of Count you can use 50thPercentile/75thPercentile etc. as
mentioned in http://cassandra.apache.org/doc/latest/operating/metrics.html
(it's a Histogram).

As Kurt mentioned, there are existing solutions to the JMX calls for you
(Graphana, Datadog). Hope that helps.



On 27 July 2017 at 07:07, Roger Warner  wrote:

> I think that is not the correct lib directory. You want it under
> $CASSANDRA_HOME/lib, i.e. wherever you deployed the Cassandra distro, under
> its lib directory.
>
>
>
> That directory should be loaded with *.jar files.   That is the directory
> you want.
>
>
>
> Roger
>
>
>
> *From: *Nitan Kainth 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Wednesday, July 26, 2017 at 1:42 PM
> *To: *"user@cassandra.apache.org" 
>
> *Subject: *Re: read/write request counts and write size of each write
>
>
>
> Hey Roger,
>
>
>
> I downloaded and saved the file in /var/lib/cassandra, but getting same
> error:
>
>
>
>  java -jar /tmp/jmxterm-1.0-alpha-4-uber.jar --url localhost:7199
>
> Welcome to JMX terminal. Type "help" for available commands.
>
> $>run -b org.apache.cassandra.metrics:type=ClientRequest scope=CASWrite
> name=MutationSizeHistogram
>
> #InstanceNotFoundException: org.apache.cassandra.metrics:
> type=ClientRequest
>
>
>
> I tried using the mx4j jar file:
>
>
>
>  java -jar /var/lib/cassandra/mx4j-3.0.1.jar url localhost:7199
>
> no main manifest attribute, in /var/lib/cassandra/mx4j-3.0.1.jar
>
>
>
>
>
>
>
> On Jul 26, 2017, at 11:50 AM, Roger Warner  wrote:
>
>
>
> You need to also have the mx4j jar in your Cassandra lib directory.
> Double checking you did that – its not included with the distro.You
> have to download it.
>
>
>
> http://mx4j.sourceforge.net/
> 
>
>
>
> R
>
>
>
> *From: *Nitan Kainth 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Wednesday, July 26, 2017 at 8:22 AM
> *To: *"User cassandra.apache.org
> "
> 
> *Subject: *Re: read/write request counts and write size of each write
>
>
>
> Thank you very much Kurt.
>
>
>
> I am not a java guy, need one small help. I initiated JMX connection but I
> am getting some exception:
>
>
>
> java -jar ~/jmxterm-1.0-alpha-4-uber.jar --url localhost:7199
>
> Welcome to JMX terminal. Type "help" for available commands.
>
> $>run -b org.apache.cassandra.metrics:type=ClientRequest scope=CASWrite
> name=MutationSizeHistogram
>
> #InstanceNotFoundException: org.apache.cassandra.metrics:
> type=ClientRequest
>
>
>
>
> I verified, Cassandra is running on my machine.
>
>
>
>
>
>
> On Tue, Jul 25, 2017 at 9:36 PM, kurt greaves 
> wrote:
>
> Looks like you can collect MutationSizeHistogram for each write as well
> from the coordinator, in regards to write request size. See the Write
> request section under
> https://cassandra.apache.org/doc/latest/operating/metrics.html#client-request-metrics
> 
>
>
>
>
>
>
>
>
>