Re: backup data with Cass-operator and Medusa

2020-10-09 Thread John Sanda
Hi Manu I have started doing the preliminary work. I have a PR[1] for Medusa that needs to be reviewed/merged. I have done some work[2] for initial, low-level integration directly against statefulsets. I hope to have medusa integration land within the next couple releases of cass-operator. [1]

Re: Connect java application to Cassandra in Kubernetes

2020-08-06 Thread John Sanda
the client container when the liveness probe fails. And if you configure the driver to connect via the headless service, you will get the updated endpoints. On Thu, Aug 6, 2020 at 11:00 PM John Sanda wrote: > Hi Pushpendra > > You should use the headless service, e.g., > > // Note

Re: Connect java application to Cassandra in Kubernetes

2020-08-06 Thread John Sanda
Hi Pushpendra You should use the headless service, e.g., // Note that this code snippet is using v3.x of the driver. // Assume the service is deployed in namespace dev and is // named cassandra-service. The FQDN of the service would then // be cassandra-service.dev.svc.cluster.local. If your
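A minimal sketch of why the headless service works as a contact point: a headless service has no cluster IP, so a DNS lookup of its name returns the addresses of the backing pods directly. The example uses only the JDK and no Cassandra driver; the class and method names are illustrative, and the FQDN in the comment is the one assumed in the message above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class HeadlessServiceLookup {

    // Returns every IP address that DNS reports for the given host name,
    // or an empty list if the name cannot be resolved.
    // For a Kubernetes headless service this is typically one address per ready pod.
    static List<String> resolveAll(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Leave the list empty; the caller decides how to react.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Inside the cluster you would pass something like
        // cassandra-service.dev.svc.cluster.local (assumed name from the message above).
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolveAll(host));
    }
}
```

Run inside a pod, the lookup of the service name should list the Cassandra pod IPs, and the list changes as pods are rescheduled.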

Re: Replacing a Cassandra node in K8S

2020-07-28 Thread John Sanda
The Cassandra pod will get scheduled to run on a different worker node, provided there is an available node that satisfies affinity rules, resource requirements, etc. And you are correct that the volume will get remounted. If however you are using a local or hostPath volume, then it will be lost

Re: What is the way to scale down Cassandra/Kubernetes cluster from 3 to 1 nodes using cass-operator

2020-07-07 Thread John Sanda
Cass Operator currently does not support scaling down. Thanks John On Thu, Jul 2, 2020 at 1:02 PM Manu Chadha wrote: > Hi > > > > I changed the file and applied it but the new configuration hasn’t got > applied. > > > > > > metadata: > > name: dc1 > > spec: > > clusterName: cluster1 > >

Re: Question on cass-operator

2020-07-06 Thread John Sanda
Hi Manu, The 2/2 indicates that there are two containers and each is in the ready state. As Vishal suggested, run kubectl describe pod to get more details. You can also use kubectl get pod -o yaml. The former will include events in the output. You can run nodetool commands like this: $ kubectl -n

Re: Minimum System Requirements

2020-03-30 Thread John Sanda
I recently had to set up an integration testing environment that involves running multiple C* instances in containers using docker-compose. I am able to do so with a total memory for the container set at 512 MB and a 256 MB heap for C*. This is with C* 3.11.4. Going below 512 MB causes the

Re: slurm for cluster job scheduling and coordination

2020-03-10 Thread John Sanda
> > I've been working towards organizing an effort around using Kubernetes for > cluster management. There is a lot of work to do but this could be > something really important to tackle as a community if you(or anyone else) > are interested in getting involved. > This is a big area of interest

Re: Cassandra on Kubernetes

2019-10-30 Thread John Sanda
One of the problems I have experienced in the past has more to do with Java than Cassandra in particular, and that is the JVM ignoring cgroups. With Cassandra in particular I would often see memory usage go higher than what was desired. This would lead to pods getting oom killed. This was fixed in

Re: Rebuilding a node without clients hitting it

2019-08-05 Thread John Sanda
Assuming the rebuild is happening on a node in another DC, then there should not be an issue if you are using LOCAL_ONE. If the node is in the local DC (i.e., same DC as the client), I am inclined to think repair would be more appropriate than rebuild but I am not 100% certain. On Mon, Aug 5,

Re: Cassandra DataStax Java Driver in combination with Java EE / EJBs

2019-06-11 Thread John Sanda
Hi Ralph, A session is intended to be a long-lived, i.e., application-scoped object. You only need one session per cluster. I think what you are doing with the @Singleton is fine. In my opinion though, EJB really does not offer much value when working with Cassandra. I would be inclined to just

Re: CassKop : a Cassandra operator for Kubernetes developped by Orange

2019-05-24 Thread John Sanda
There is also https://github.com/sky-uk/cassandra-operator On Fri, May 24, 2019 at 2:34 PM Rahul Singh wrote: > Fantastic! Now there are three teams making k8s operators for C*: > Datastax, Instaclustr, and now Orange. > > rahul.xavier.si...@gmail.com > > http://cassandra.link > > I'm speaking

Re: Commit Log sync problems

2019-03-09 Thread John Sanda
Hi Meg, I believe that the average duration reported is the total amount of time that exceeded the interval divided by the number of syncs that exceeded the interval. Cassandra is not complaining because commit log syncs took 0.66 ms but rather on average 4 commit log syncs 10060 ms. Cheers,
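The calculation described above can be sketched as follows. This is only an illustration of the stated reading (total time over the interval divided by the number of syncs that exceeded it), not Cassandra's actual code; the class name, method name, and sample durations are made up.

```java
class CommitLogSyncStats {

    // Average amount by which syncs exceeded the interval,
    // counting only the syncs that actually exceeded it.
    static double averageExcessMillis(double[] syncDurationsMillis, double intervalMillis) {
        double totalExcess = 0;
        int lagging = 0;
        for (double duration : syncDurationsMillis) {
            if (duration > intervalMillis) {
                totalExcess += duration - intervalMillis;
                lagging++;
            }
        }
        return lagging == 0 ? 0 : totalExcess / lagging;
    }

    public static void main(String[] args) {
        // Hypothetical durations: two fast syncs and two that overran a 10 s interval.
        double[] durations = {2, 3, 10050, 10070};
        System.out.println(averageExcessMillis(durations, 10000));
    }
}
```

The fast syncs do not pull the reported average down, because they are excluded from both the numerator and the denominator.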

multiple table directories for system_schema keyspace

2018-04-17 Thread John Sanda
On a couple different occasions I have run into this exception at start up: Exception (org.apache.cassandra.exceptions.InvalidRequestException) encountered during startup: Unknown type org.apache.cassandra.exceptions.InvalidRequestException: Unknown type at

Re: Understanding Blocked and All Time Blocked columns in tpstats

2018-03-23 Thread John Sanda
> > Chris > > > On Mar 23, 2018, at 11:42 AM, John Sanda <john.sa...@gmail.com> wrote: > > Thanks for the explanation. In the past when I have run into problems > related to CASSANDRA-11363, I have increased the queue size via the > cassandra.max_queued_native_trans

Re: Understanding Blocked and All Time Blocked columns in tpstats

2018-03-23 Thread John Sanda
ative transport pool > (sep pool) last I checked. Since 2.1 at least, before that there were a few > others. That changes version to version. For (basically) all other thread > pools the queue is limited by memory. > > Chris > > > On Mar 22, 2018, at 10:44 PM, John Sanda <j

Understanding Blocked and All Time Blocked columns in tpstats

2018-03-22 Thread John Sanda
I have been doing some work on a cluster that is impacted by https://issues.apache.org/jira/browse/CASSANDRA-11363. Reading through the ticket prompted me to take a closer look at org.apache.cassandra.concurrent.SEPExecutor. I am looking at the 3.0.14 code. I am a little confused about the Blocked

Not marking node down due to local pause

2017-10-19 Thread John Sanda
I have a small, two-node cluster running Cassandra 2.2.1. I am seeing a lot of these messages in both logs: WARN 07:23:16 Not marking nodes down due to local pause of 7219277694 > 50 I am fairly certain that they are not due to GC. I am not seeing a whole lot of GC being logged and nothing

Re: Cassandra - Nodes can't restart due to java.lang.OutOfMemoryError: Direct buffer memory

2017-08-31 Thread John Sanda
I am not sure which version of Netty is in 3.9, but maybe you are hitting https://issues.apache.org/jira/browse/CASSANDRA-13114. I hit this in Cassandra 3.0.9 which uses Netty 4.0.23. Here is the upstream netty ticket https://github.com/netty/netty/issues/3057. On Thu, Aug 31, 2017 at 10:15 AM,

Netty SSL memory leak

2017-05-30 Thread John Sanda
I have a Cassandra 3.0.9 cluster that is hitting OutOfMemoryErrors with byte buffer allocation. The stack trace looks like: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_131] at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)

Re: manual deletes with TWCS

2017-05-05 Thread John Sanda
dows fully expire. > > > On May 5, 2017, at 1:54 PM, John Sanda <john.sa...@gmail.com> wrote: > > How problematic is it to perform deletes when using TWCS? I am currently > using TWCS and have some new use cases for performing deletes. So far I > have avoided perfo

manual deletes with TWCS

2017-05-05 Thread John Sanda
How problematic is it to perform deletes when using TWCS? I am currently using TWCS and have some new use cases for performing deletes. So far I have avoided performing deletes, but I am wondering what issues I might run into. - John

partition sizes reported by nodetool tablehistograms

2017-02-24 Thread John Sanda
I am working on some issues involving really big partitions. I have been making extensive use of nodetool tablehistograms. What exactly is the partition size being reported? I have a table for which the max value reported is about 3.5 GB, but running du -h against the table data directory reports

Problems with large partitions and compaction

2017-02-14 Thread John Sanda
I have a table that uses LCS and has wound up with partitions upwards of 700 MB. I am seeing lots of the large partition warnings. Client requests are subsequently failing. The driver is not reporting timeout exceptions, just NoHostAvailableExceptions (in the logs I have reviewed so far). I know

compaction falling behind

2017-02-13 Thread John Sanda
What is a good way to determine whether or not compaction is falling behind? I read a couple things earlier that suggest nodetool compactionstats might not be the most reliable thing to use. - John

Re: Time series data model and tombstones

2017-02-08 Thread John Sanda
chunks of > xxx kilobytes worth of data (don't remember the exact value of xxx, maybe > 64k or far less) so you may end up reading tombstones. > > On Sun, Jan 29, 2017 at 9:24 PM, John Sanda <john.sa...@gmail.com> wrote: > >> Thanks for the clarification. Let's say I have a p

Questions about TWCS

2017-02-06 Thread John Sanda
In Jeff Jirsa's C* 2016 summit presentation, TimeWindowCompactionStrategy for Time Series Workloads, there is a slide which talks about optimizations. It says to align partition keys to your TWCS windows. Is it generally the case that calendar/date based partitions would align nicely with TWCS
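One way to reason about the alignment question is to model a time window as an epoch-aligned bucket and check whether all timestamps of a date-based partition land in the same bucket. The sketch below is a simplified model under that assumption, not the TWCS implementation; the class and method names are illustrative.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.concurrent.TimeUnit;

class WindowAlignment {

    // Start of the epoch-aligned window that contains the timestamp.
    static long windowStart(long timestampMillis, long windowMillis) {
        return Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
    }

    // True if the first and last millisecond of the given UTC day
    // fall into the same window, i.e. the day partition is not split.
    static boolean dayFitsOneWindow(LocalDate day, long windowMillis) {
        long first = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long last = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli() - 1;
        return windowStart(first, windowMillis) == windowStart(last, windowMillis);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2017, 2, 6);
        System.out.println("1 day window:   " + dayFitsOneWindow(day, TimeUnit.DAYS.toMillis(1)));
        System.out.println("12 hour window: " + dayFitsOneWindow(day, TimeUnit.HOURS.toMillis(12)));
    }
}
```

In this model a UTC day partition stays within a single one-day window but is split across two twelve-hour windows. Partitions keyed on a local-time calendar day would need the same check with the relevant offset.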

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
d "chunk" of expired data in SSTABLE-12 may be > compacted together with a new chunk of SSTABLE-2 containing fresh data so > in the new resulting SSTable will contain tombstones AND fresh data inside > the same partition, but of course sorted by clustering column "time"

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
equest for fresh data, Cassandra has > to scan over a lot of tombstones to fetch the correct range of data thus your > issue > > On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote: > >> It was with STCS. It was on a 2.x version before TWCS was availab

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
vior and tricky > configuration. > > On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com> wrote: > > Your partitioning key is text. If you have multiple entries per id you are > likely hitting older cells that have expired. Descending only affects how > the data is sto

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
> > Your partitioning key is text. If you have multiple entries per id you are > likely hitting older cells that have expired. Descending only affects how > the data is stored on disk, if you have to read the whole partition to find > whichever time you are querying for you could potentially hit

Re: Time series data model and tombstones

2017-01-28 Thread John Sanda
> On Sat, Jan 28, 2017 at 8:30 AM John Sanda <john.sa...@gmail.com> wrote: > >> I have a time series data model that is basically: >> >> CREATE TABLE metrics ( >> id text, >> time timeuuid, >> value double, >> PRIMARY KEY (id, time) >

Time series data model and tombstones

2017-01-28 Thread John Sanda
I have a time series data model that is basically: CREATE TABLE metrics ( id text, time timeuuid, value double, PRIMARY KEY (id, time) ) WITH CLUSTERING ORDER BY (time DESC); I do append-only writes, no deletes, and use a TTL of seven days. Data points are written every seconds.

empty buckets with STCS

2016-12-01 Thread John Sanda
I have a Cassandra 2.2.1 node that does not appear to be compacting SSTables. The table is currently configured with STCS. I turned on some debug logging and when the compaction checks run, they log: Compaction buckets are [] I have been going over SizeTieredCompactionStrategy.java and looking in

Re: commit log on NFS volume

2016-11-01 Thread John Sanda
n case of network problem writer thread >> can be blocked, also in case of failure loss of data can occur. >> >> Best regards, Vladimir Yudovin, >> >> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud >> Cassandra Launch your cluster in minutes.* >

commit log on NFS volume

2016-11-01 Thread John Sanda
I know that using NFS is discouraged, particularly for the commit log. Can anyone shed some light on what kinds of problems I might encounter aside from performance? I ask because I have some deployments with Cassandra 2.2.1 that use NFS and are experiencing some problems

Re: Client-side timeouts after dropping table

2016-09-21 Thread John Sanda
there are multiple apps running, but it does happen fairly consistently almost immediately after the table is dropped. I don't see any indication of a server side timeout or any dropped mutations being reported in the log. On Tue, Sep 20, 2016 at 11:07 PM, John Sanda <john.sa...@gmail.com>

Re: Client-side timeouts after dropping table

2016-09-20 Thread John Sanda
l commit log size of like 32mb with 4mb > segments (or even lower depending on test data volume) so they basically > flush constantly and don't try to hold any tables open. Also lower > concurrent_writes substantially while you are at it to add some write > throttling. > > On Wed, Sep 21,

Re: Client-side timeouts after dropping table

2016-09-20 Thread John Sanda
e are you using? Outside of a handful of highly > experienced experts using EBS in very specific ways, it usually ends in > failure. > > On Tue, Sep 20, 2016 at 3:30 PM John Sanda <john.sa...@gmail.com> wrote: > >> I am deploying multiple Java web apps that connect to a Cassandra

Client-side timeouts after dropping table

2016-09-20 Thread John Sanda
I am deploying multiple Java web apps that connect to a Cassandra 3.7 instance. Each app creates its own schema at start up. One of the schema changes involves dropping a table. I am seeing frequent client-side timeouts reported by the DataStax driver after the DROP TABLE statement is executed. I

mutation checksum failure during commit log replay

2016-08-31 Thread John Sanda
What could cause an error like: ERROR 07:11:56 Exiting due to error while processing commit log during initialization. org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 818339 in CommitLog-5-1470234746867.log This is with Cassandra 2.2.4.

Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread John Sanda
14, 2016 at 2:52 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > What motivated the use of an embedded instance for development - as > opposed to simply spawning a process for Cassandra? > > > > -- Jack Krupansky > > On Sun, Feb 14, 2016 at 2:05 PM, John Sand

Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread John Sanda
The project I work on day to day uses an embedded instance of Cassandra, but it is intended primarily for development. We embed Cassandra in a WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I personally do not do this. I use and recommend ccm

Re: Schema Versioning

2016-02-12 Thread John Sanda
If you are interested in a solution that maintains scripts, there are at least a few projects available, https://github.com/comeara/pillar - Runs on the JVM and written in Scala. Scripts are CQL files. https://github.com/Contrast-Security-OSS/cassandra-migration - Runs on JVM and I believe a port

Consistent reads and first write wins

2015-07-07 Thread John Sanda
Suppose I have the following schema, CREATE TABLE foo ( id text, time timeuuid, prop1 text, PRIMARY KEY (id, time) ) WITH CLUSTERING ORDER BY (time ASC); And I have two clients who execute quorum writes, e.g., // client 1 INSERT INTO FOO (id, time, prop1) VALUES ('test',

Re: Example Data Modelling

2015-07-07 Thread John Sanda
25 MB seems very specific. Is there a reason why? On Tuesday, July 7, 2015, Peer, Oded oded.p...@rsa.com wrote: The data model suggested isn’t optimal for the “end of month” query you want to run since you are not querying by partition key. The query would look like “select EmpID, FN, LN,

Re: Migrate table data to another table

2015-06-30 Thread John Sanda
You might want to take a look at CQLSSTableWriter[1] in the Cassandra source tree. [1] http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated On Tue, Jun 30, 2015 at 1:18 PM, Umut Kocasaraç ukocasa...@gmail.com wrote: Hi, I want to change clustering order column of my table. As

Large SSTable not compacted with size tiered compaction

2014-07-07 Thread John Sanda
I have a write-heavy table that is using size tiered compaction. I am running C* 1.2.9. There is an SSTable that is not getting compacted. It is disproportionately larger than the other SSTables. The data file sizes are, 1.70 GB 0.18 GB 0.16 GB 0.05 GB 8.61 GB If I set the bucket_high compaction
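To see why the large file is left alone, it can help to model how size-tiered compaction groups SSTables. The sketch below is a simplified model, not the Cassandra source: each SSTable joins the first bucket whose average size puts it between bucket_low and bucket_high times that average, otherwise it starts a new bucket. It ignores min_sstable_size and everything else the real strategy considers, and the values 0.5 and 1.5 are assumed to be the defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SizeTieredBuckets {

    // Groups sizes into buckets of "similar" size, smallest first.
    static List<List<Double>> buckets(double[] sizes, double bucketLow, double bucketHigh) {
        double[] sorted = sizes.clone();
        Arrays.sort(sorted);
        List<List<Double>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>(); // running average size per bucket
        for (double size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                double avg = averages.get(i);
                if (size > avg * bucketLow && size < avg * bucketHigh) {
                    List<Double> bucket = buckets.get(i);
                    averages.set(i, (avg * bucket.size() + size) / (bucket.size() + 1));
                    bucket.add(size);
                    placed = true;
                }
            }
            if (!placed) {
                buckets.add(new ArrayList<>(List.of(size)));
                averages.add(size);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Data file sizes in GB from the message above.
        double[] sizes = {1.70, 0.18, 0.16, 0.05, 8.61};
        System.out.println(buckets(sizes, 0.5, 1.5));
    }
}
```

With these sizes the model puts only the 0.16 GB and 0.18 GB files together, so no bucket comes near a minimum threshold of four SSTables (assumed default), and the 8.61 GB file sits alone in its own bucket.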

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread John Sanda
From a quick glance at your code, it looks like you are preparing your insert statement multiple times. You only need to prepare it once. I would expect to see some improvement with that change. On Wed, Feb 19, 2014 at 5:27 AM, Rüdiger Klaehn rkla...@gmail.com wrote: Hi all, I am evaluating

Re: user / password authentication advice

2013-12-12 Thread John Sanda
You could use CassandraAuthorizer and PasswordAuthenticator, which ship with Cassandra. See this article[1] for a good overview. [1] http://www.datastax.com/dev/blog/a-quick-tour-of-internal-authentication-and-authorization-security-in-datastax-enterprise-and-apache-cassandra On Thursday,

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread John Sanda
The session.execute call blocks until C* returns the response. Use the async version, but do so with caution. If you don't throttle the requests, you will start seeing timeouts on the client side pretty quickly. For throttling I've used a Semaphore, but I think Guava's RateLimiter is better suited.
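The Semaphore approach mentioned above can be sketched with JDK classes alone. This is an illustration, not driver code: fakeWrite is a hypothetical stand-in for an asynchronous call such as session.executeAsync, and the pool size and limits are arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledWrites {

    // Hypothetical stand-in for one asynchronous write to the database.
    static void fakeWrite() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Issues `total` async writes with at most `maxInFlight` outstanding at a time.
    // Returns the highest number of writes observed in flight at once.
    static int run(int total, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int i = 0; i < total; i++) {
                permits.acquireUninterruptibly(); // blocks the producer at the limit
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                CompletableFuture.runAsync(ThrottledWrites::fakeWrite, pool)
                        .whenComplete((result, error) -> {
                            inFlight.decrementAndGet();
                            permits.release(); // free the slot on success or failure
                        });
            }
            permits.acquireUninterruptibly(maxInFlight); // wait for the stragglers
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + run(50, 4));
    }
}
```

The permit is released in the completion callback rather than right after submission, which is what ties the limit to outstanding requests instead of to the submission rate. A rate limiter bounds requests per second instead, which is a different trade-off.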

Re: calculating sizes on disk

2013-12-07 Thread John Sanda
SSTable. On Fri, Dec 6, 2013 at 3:53 PM, John Sanda john.sa...@gmail.com wrote: I have done that, but it only gets me so far because the cluster and app that manages it is run by 3rd parties. Ideally, I would like to provide my end users with a formula or heuristic for establishing some sort

calculating sizes on disk

2013-12-06 Thread John Sanda
I am trying to do some disk capacity planning. I have been referring to the DataStax docs[1] and this older blog post[2]. I have a column family with the following, row key - 4 bytes column name - 8 bytes column value - 8 bytes max number of non-deleted columns per row - 20160 Is there an effective
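As a starting point for the arithmetic, the raw payload of one full row can be computed from the figures above, with storage overhead left as explicit parameters. The overhead values used in main (15 bytes per column, 23 bytes per row) are assumptions for illustration only and should be checked against the documentation for the version in use; the sketch ignores compression, indexes, bloom filters, and replication.

```java
class RowSizeEstimate {

    // Bytes of key, column names, and column values for one row, with no overhead.
    static long rawRowBytes(int keyBytes, int nameBytes, int valueBytes, int columns) {
        return keyBytes + (long) columns * (nameBytes + valueBytes);
    }

    // Same as above, plus caller-supplied per-column and per-row overhead.
    static long estimatedRowBytes(int keyBytes, int nameBytes, int valueBytes, int columns,
                                  int perColumnOverhead, int perRowOverhead) {
        return keyBytes + perRowOverhead
                + (long) columns * (nameBytes + valueBytes + perColumnOverhead);
    }

    public static void main(String[] args) {
        // Figures from the message above: 4-byte key, 8-byte name, 8-byte value, 20160 columns.
        System.out.println("raw bytes per row:       " + rawRowBytes(4, 8, 8, 20160));
        // Overheads below are assumed placeholder values, not measured ones.
        System.out.println("estimated bytes per row: " + estimatedRowBytes(4, 8, 8, 20160, 15, 23));
    }
}
```

Multiplying the per-row estimate by the expected number of rows and the replication factor gives a rough upper bound before compression, which can then be compared with a measured sample.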

Re: calculating sizes on disk

2013-12-06 Thread John Sanda
I should have also mentioned that I have tried using the calculations from the storage sizing post. My lack of success may be due to the post basing things off of Cassandra 0.8 as well as a lack of understanding in how to do some of the calculations. On Fri, Dec 6, 2013 at 3:08 PM, John Sanda

Re: calculating sizes on disk

2013-12-06 Thread John Sanda
to and measure the size on disk. __ Sent from iPhone On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote: I am trying to do some disk capacity planning. I have been referring to the DataStax docs[1] and this older blog post[2]. I have a column family

reads and compression

2013-11-28 Thread John Sanda
This article[1] claims that gains in read performance can be achieved when compression is enabled. The more I thought about it, even after reading the DataStax docs about reads[2], I realized I do not understand how compression improves read performance. Can someone provide some details on this? Is the

Re: Cassandra crashes

2013-09-09 Thread John Sanda
Check your file limits - http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html On Friday, September 6, 2013, Jan Algermissen wrote: On 06.09.2013, at 13:12, Alex Major

Re: system.peers and decommissioned nodes

2013-09-04 Thread John Sanda
and there have been a few fixes there, so an upgrade might not hurt). On Tue, Aug 27, 2013 at 5:53 PM, John Sanda john.sa...@gmail.com wrote: Forgot to mention before, the host_id column is null for one of the rows. Running nodetool removenode on the other one failed. StorageService threw

system.peers and decommissioned nodes

2013-08-27 Thread John Sanda
I had a 4 node cluster running C* 1.2.4. I am testing some client code for adding/removing nodes to/from the cluster. I decommissioned 3 nodes. I only have one node now; however, the system.peers table still has rows for two of the nodes that were decommissioned. nodetool status only reports the

Re: system.peers and decommissioned nodes

2013-08-27 Thread John Sanda
, 2013, John Sanda wrote: I had a 4 node cluster running C* 1.2.4. I am testing some client code for adding/removing nodes to/from the cluster. I decommissioned 3 nodes. I only have one node now; however, the system.peers table still has rows for two of the nodes that were decommissioned. nodetool

Re: insert performance (1.2.8)

2013-08-19 Thread John Sanda
I'd suggest using prepared statements that you initialize at application start up and switching to Session.executeAsync coupled with Google Guava's Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com wrote: Sure, I've

sstable_compression for system tables

2013-05-03 Thread John Sanda
Is there a way to change the sstable_compression for system tables? I am trying to deploy Cassandra 1.2.2 on a platform with IBM Java and 32 bit arch where the snappy-java native library fails to load. The error I get looks like, ERROR [SSTableBatchOpen:1] 2013-05-02 14:42:42,485

Re: sstable_compression for system tables

2013-05-03 Thread John Sanda
to read the SSTables on disk If IBM's JRE was used from the get go, there would have been no SSTable compression and hence no error. On Fri, May 3, 2013 at 5:28 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, May 3, 2013 at 11:07 AM, John Sanda john.sa...@gmail.com wrote: The machine where

question about internode_compression

2013-04-28 Thread John Sanda
When internode_compression is enabled, will the compression algorithm used be the same as whatever I am using for sstable_compression? - John

Re: DB Change management tools for Cassandra?

2013-04-25 Thread John Sanda
I had cobbled together a solution using Liquibase and the Cassandra JDBC driver. I started implementing it before the CQL driver was announced. The solution involved a patch and some Liquibase extensions which live at https://github.com/jsanda/cassandra-liquibase-ext. The patch will go into the 3.0

Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread John Sanda
You might want to take a look at org.apache.cassandra.transport.SimpleClient and org.apache.cassandra.transport.messages.ResultMessage. On Mon, Nov 19, 2012 at 9:48 AM, Timmy Turner timm.t...@gmail.com wrote: What I meant was the method that the Cassandra-jars give you when you include them in

Re: Datastax Java Driver

2012-11-19 Thread John Sanda
Fantastic! As for the object mapping API, has there been any discussion/consideration of http://www.hibernate.org/subprojects/ogm.html? On Mon, Nov 19, 2012 at 1:50 PM, Sylvain Lebresne sylv...@datastax.com wrote: Everyone, We've just open-sourced a new Java driver we have been working on

distribution of token ranges with virtual nodes

2012-10-31 Thread John Sanda
I am not entirely clear on what http://wiki.apache.org/cassandra/VirtualNodes/Balance#imbalance is saying with respect to random vs. manual token selection. Can/should I assume that I will get even range distribution or close to it with random token selection? For the sake of discussion, what is a

CQL load balancing

2012-10-15 Thread John Sanda
Hector provides load balancing so that requests can be distributed across cluster nodes based on a specified policy, like round robin. Is there anything similar planned for CQL? I see that there is an open issue ( http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/issues/detail?id=41) to

question about where clause of CQL update statement

2012-10-05 Thread John Sanda
I am using CQL 3 and trying to execute the following, UPDATE CHANGELOGLOCK SET LOCKED = 'true', LOCKEDBY = '10.11.8.242 (10.11.8.242)', LOCKGRANTED = '2012-10-05 16:58:01' WHERE ID = 1 AND LOCKED = 'false'; It gives me the error, Bad Request: PRIMARY KEY part locked found in SET part. The

schema change management tools

2012-10-04 Thread John Sanda
I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point me to something before I start trying to implement something on my own. I have used liquibase ( http://www.liquibase.org)

Re: schema change management tools

2012-10-04 Thread John Sanda
it. Even with MySQL I never bothered. Jon On Thu, Oct 4, 2012 at 6:27 PM, John Sanda john.sa...@gmail.com wrote: I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point