Re: trouble setting up initial cluster: Host ID collision between active endpoint
Hi Tim,

If you want to check out Cassandra on AWS you should also have a look at www.instaclustr.com. We are still very much in beta (so if you come across anything, please let us know), but if you have a few minutes and want to deploy a cluster in just a few clicks I highly recommend trying Instaclustr out.

Cheers
Ben Bromhead
*Instaclustr*

On Fri, Jan 25, 2013 at 12:35 AM, Tim Dunphy bluethu...@gmail.com wrote:

Cool! Thanks for the advice, Aaron. I actually did get this working before I read your reply. The trick for me was apparently to use the IP of the first node in the seeds setting of each successive node. But I like the idea of using larges for an hour or so of basic experimentation and terminating them afterwards. Also, thanks for pointing me to the DataStax AMIs, I'll be sure to check them out.

Tim

On Thu, Jan 24, 2013 at 3:45 AM, aaron morton aa...@thelastpickle.com wrote:

They both have 0 for their token, and this is stored in their System keyspace. Scrub them and start again.

> But I found that the tokens that were being generated would require way too much memory

Token assignments have nothing to do with memory usage.

> m1.micro instances

You are better off using your laptop than micro instances. For playing around try m1.large and terminate the instances when not in use. To make life easier, use this to create the cluster for you: http://www.datastax.com/docs/1.2/install/install_ami

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

Hello list,

I really do appreciate the advice I've gotten here as I start building familiarity with Cassandra. Aside from the single-node instance I set up for a developer friend, I've just been playing with a single node in a VM on my laptop, experimenting with cassandra-cli and PHP. Well, I've decided to set up my first cluster on my Amazon EC2 account and I'm running into an issue getting the nodes to gossip.
I've set the IPs of the 'node01' and 'node02' EC2 instances in their respective listen_address and rpc_address settings, and made sure that the 'cluster_name' on both was in agreement. I believe the problem may be in one of two places: either the seeds or the initial_token setting.

For the seeds I have it set up as follows. I put the IPs of both machines in the 'seeds' setting on each, thinking this is how the nodes would discover each other:

- seeds: 10.xxx.xxx.248,10.xxx.xxx.123

Initially I tried the tokengen script that I found in the documentation. But I found that the tokens being generated would require way too much memory for the m1.micro instances I'm experimenting with on the Amazon free tier. And according to the comments in the config it is in some cases OK to leave that field blank, so that's what I did on both instances.

Not sure how much (or if) this matters, but I am using the setting:

- endpoint_snitch: Ec2Snitch

Finally, when I start up the first node all goes well. But when I start up the second node I see this exception on both hosts:

node01:

INFO 11:02:32,231 Listening for thrift clients...
INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
java.lang.RuntimeException: Host ID collision between active endpoint /10.xxx.xxx.248 and /10.xxx.xxx.123 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
	at org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
	at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
	at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
	at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

And on node02 I see:

INFO 11:02:58,817 Starting Messaging Service on port 7000
INFO 11:02:58,835 Using saved token [0]
INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84 serialized/live bytes, 4 ops)
INFO 11:02:58,838 Writing Memtable-local@672636645(84/84 serialized/live bytes, 4 ops)
INFO 11:02:58,912 Completed flushing /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes) for commitlog position ReplayPosition(segmentId
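For what it's worth, the collision above pairs with the `Using saved token [0]` line: both nodes came up holding token 0. If you do want explicit, balanced tokens instead of leaving initial_token blank, the arithmetic the documentation's token generator performs is tiny. The following is a sketch of that calculation for the RandomPartitioner (token space 0 to 2^127), not the exact script from the docs:

```python
# Evenly spaced initial_token values for the RandomPartitioner,
# whose token space is [0, 2**127). Assign tokens[i] to node i+1.
def generate_tokens(node_count):
    return [i * (2 ** 127) // node_count for i in range(node_count)]

for i, token in enumerate(generate_tokens(2)):
    print("node%02d initial_token: %d" % (i + 1, token))
```

For two nodes this yields 0 and 85070591730234615865843651857942052864. The tokens themselves cost nothing memory-wise, which is Aaron's point above.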
Re: no other nodes seen on priam cluster
Hi Marcelo,

A few questions: have you added the Priam java agent to Cassandra's JVM arguments (e.g. -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar), and does the web container running Priam have permission to write to the Cassandra config directory? Also, what do the Priam logs say?

If you want to get up and running quickly with Cassandra, AWS and Priam, check out www.instaclustr.com. We deploy Cassandra under your AWS account and you have full root access to the nodes if you want to explore and play around, plus there is a free tier which is great for experimenting and trying Cassandra out.

Cheers
Ben

On Wed, Feb 27, 2013 at 6:09 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

Hello,

I am using Cassandra 1.2.1 and I am trying to set up a Priam cluster on AWS with two nodes. However, I can't get both nodes up and running because of a weird error (at least to me). When I start both nodes, they are both able to connect to each other and do some communication. However, after some seconds, I just see "java.lang.RuntimeException: No other nodes seen!", so they disconnect and die. I tried to test all ports (7000, 9160 and 7199) between both nodes and there is no firewall. On the second node, before the above exception, I get a broken pipe, as shown below. Any hint?

DEBUG 18:54:31,776 attempting to connect to /10.224.238.170
DEBUG 18:54:32,402 Reseting version for /10.224.238.170
DEBUG 18:54:32,778 Connection version 6 from /10.224.238.170
DEBUG 18:54:32,779 Upgrading incoming connection to be compressed
DEBUG 18:54:32,779 Max version for /10.224.238.170 is 6
DEBUG 18:54:32,779 Setting version 6 for /10.224.238.170
DEBUG 18:54:32,780 set version for /10.224.238.170 to 6
DEBUG 18:54:33,455 Disseminating load info ...
DEBUG 18:54:59,082 Reseting version for /10.224.238.170
DEBUG 18:55:00,405 error writing to /10.224.238.170
java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcher.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
	at sun.nio.ch.IOUtil.write(IOUtil.java:43)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
	at java.nio.channels.Channels.writeFullyImpl(Channels.java:59)
	at java.nio.channels.Channels.writeFully(Channels.java:81)
	at java.nio.channels.Channels.access$000(Channels.java:47)
	at java.nio.channels.Channels$1.write(Channels.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:272)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:189)
	at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:143)
DEBUG 18:55:01,405 attempting to connect to /10.224.238.170
DEBUG 18:55:01,461 Started replayAllFailedBatches
DEBUG 18:55:01,462 forceFlush requested but everything is clean in batchlog
DEBUG 18:55:01,463 Finished replayAllFailedBatches
INFO 18:55:01,472 JOINING: schema complete, ready to bootstrap
DEBUG 18:55:01,473 ... got ring + schema info
INFO 18:55:01,473 JOINING: getting bootstrap token
ERROR 18:55:01,475 Exception encountered during startup
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
and on the first node:

DEBUG 18:54:30,833 Disseminating load info ...
DEBUG 18:54:31,532 Connection version 6 from /10.242.139.159
DEBUG 18:54:31,533 Upgrading incoming connection to be compressed
DEBUG 18:54:31,534 Max version for /10.242.139.159 is 6
DEBUG 18:54:31,534 Setting version 6 for /10.242.139.159
DEBUG 18:54:31,534 set version for /10.242.139.159 to 6
DEBUG 18:54:31,542 Reseting version for /10.242.139.159
DEBUG 18:54:31,791 Connection version 6 from /10.242.139.159
DEBUG 18:54:31,792 Upgrading incoming connection to be compressed
DEBUG 18:54:31,792 Max version for /10.242.139.159 is 6
DEBUG 18:54:31,792 Setting version 6 for /10.242.139.159
DEBUG 18:54:31,793 set version for /10.242.139.159 to 6
INFO 18:54:32,414 Node /10.242.139.159 is now part of the cluster
DEBUG 18:54:32,415 Resetting pool for /10.242.139.159
DEBUG 18:54:32,415 removing expire time for endpoint : /10.242.139.159
INFO 18:54:32,415 InetAddress /10.242.139.159 is now UP
DEBUG 18:54:32,789 attempting to connect to ec2-75-101-233-115.compute-1.amazonaws.com/10.242.139.159
DEBUG 18:54:58,840 Started replayAllFailedBatches
DEBUG
Re: no other nodes seen on priam cluster
Off the top of my head I would check to make sure the Autoscaling Group you created is restricted to a single Availability Zone. Also, Priam sets the number of EC2 instances it expects based on the maximum instance count you set on your scaling group (it did this last time I checked a few months ago; its behaviour may have changed).

So I would make sure the desired, min and max instance counts for your scaling group are all the same, make sure your ASG is restricted to a single availability zone (e.g. us-east-1b), and then (if you are able to and there is no data in your cluster) delete all the SimpleDB entries Priam has created, and possibly also clear out the Cassandra data directory.

Other than that, I see you've raised it as an issue on the Priam project page, so see what they say ;)

Cheers
Ben

On Thu, Feb 28, 2013 at 3:40 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

One additional important piece of info: I checked here and the seeds seem really different on each node. The command

echo `curl http://127.0.0.1:8080/Priam/REST/v1/cassconfig/get_seeds`

returns ip2 on the first node and ip1,ip1 on the second node. Any idea why? It's probably what is causing Cassandra to die, right?

2013/2/27 Marcelo Elias Del Valle mvall...@gmail.com

Hello Ben,

Thanks for the willingness to help,

2013/2/27 Ben Bromhead b...@instaclustr.com

> Have you added the Priam java agent to Cassandra's JVM arguments (e.g. -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar), and does the web container running Priam have permission to write to the Cassandra config directory? Also, what do the Priam logs say?

I put the Priam log of the first node below. Yes, I have added priam-cass-extensions to the java args, and Priam IS actually writing to the Cassandra dir.

> If you want to get up and running quickly with Cassandra, AWS and Priam, check out www.instaclustr.com.
> We deploy Cassandra under your AWS account and you have full root access to the nodes if you want to explore and play around, plus there is a free tier which is great for experimenting and trying Cassandra out.

That sounded really great. I am not sure if it would apply to our case (will consider it though), but some partners would have a great benefit from it, for sure! I will send your link to them.

What Priam says:

2013-02-27 14:14:58.0614 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-hostname returns: ec2-174-129-59-107.compute-1.amazonaws.com
2013-02-27 14:14:58.0615 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-ipv4 returns: 174.129.59.107
2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-id returns: i-88b32bfb
2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-type returns: c1.medium
2013-02-27 14:14:59.0614 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration REGION set to us-east-1, ASG Name set to dmp_cluster-useast1b
2013-02-27 14:14:59.0746 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration appid used to fetch properties is: dmp_cluster
2013-02-27 14:14:59.0843 INFO pool-2-thread-1 org.quartz.simpl.SimpleThreadPool Job execution threads will use class loader of thread: pool-2-thread-1
2013-02-27 14:14:59.0861 INFO pool-2-thread-1 org.quartz.core.SchedulerSignalerImpl Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2013-02-27 14:14:59.0862 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler Quartz Scheduler v.1.7.3 created.
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.simpl.RAMJobStore RAMJobStore initialized.
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler version: 1.7.3
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler JobFactory set to: com.netflix.priam.scheduler.GuiceJobFactory@1b6a1c4
2013-02-27 14:15:00.0239 INFO pool-2-thread-1 com.netflix.priam.aws.AWSMembership Querying Amazon returned following instance in the ASG: us-east-1b -- i-8eb32bfd,i-88b32bfb
2013-02-27 14:15:01.0470 INFO Timer-0 org.quartz.utils.UpdateChecker New update(s) found: 1.8.5 [http://www.terracotta.org/kit/reflector?kitID=defaultpageID=QuartzChangeLog]
2013-02-27 14:15:10.0925 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Found dead instances: i-d49a0da7
2013-02-27 14:15:11.0397
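The mismatch Marcelo found (ip2 on one node, ip1,ip1 on the other) is exactly the invariant worth scripting a check for: every node must report the same seed list. A minimal sketch, assuming you have already fetched each node's seed string from the Priam get_seeds endpoint shown above (the HTTP fetching itself is left out so the check stays self-contained):

```python
# Compare comma-separated seed lists collected from each node.
# Order and duplicates don't matter, so compare them as sets.
def seeds_agree(seed_strings):
    seed_sets = [frozenset(p.strip() for p in s.split(","))
                 for s in seed_strings]
    return len(set(seed_sets)) == 1

print(seeds_agree(["ip2", "ip1,ip1"]))      # the broken case from this thread
print(seeds_agree(["ip1,ip2", "ip2,ip1"]))  # healthy: same set everywhere
```

If this returns False, fixing the seed lists (or whatever is populating them, SimpleDB in Priam's case) comes before anything else.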
Re: no other nodes seen on priam cluster
Glad you got it going!

There is a REST call you can make to Priam telling it to double the cluster size (/v1/cassconfig/double_ring); it will pre-fill all the SimpleDB entries for when the nodes come online, and you then change the number of nodes on the autoscale group. Now that Priam supports C* 1.2 with vnodes, increasing the cluster size in an ad-hoc manner might be just around the corner.

Instaclustr has some predefined cluster sizes (Free, Basic, Professional and Enterprise); these are loosely based on estimated performance and storage capacity. You can also create a custom cluster where you define the number of nodes (minimum of 4) and the instance type according to your requirements. For pricing on those check out https://www.instaclustr.com/pricing/per-instance; we base our pricing on estimated support and throughput requirements.

Cheers
Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 02/03/2013, at 3:59 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

Thanks a lot Ben. Actually, I managed to make it work by erasing the SimpleDB entries Priam uses to keep track of instances... I had pulled the last commit from the repo; not sure if it helped or not.

But your message made me curious about something... How do you add more Cassandra nodes on the fly? Just update the autoscale properties? I saw instaclustr.com changes the instance type as the number of nodes increases (not sure why the price also becomes higher per instance in this case). I am guessing Priam uses the data backed up to S3 to restore a node's data on another instance, right?

[]s

2013/2/28 Ben Bromhead b...@relational.io

Off the top of my head I would check to make sure the Autoscaling Group you created is restricted to a single Availability Zone. Also, Priam sets the number of EC2 instances it expects based on the maximum instance count you set on your scaling group (it did this last time I checked a few months ago; its behaviour may have changed).
Re: Cassandra instead of memcached
Check out http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

Netflix used Cassandra with SSDs and were able to drop their memcached layer. Mind you, they were not using it purely as an in-memory KV store.

Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:

Hi Guys,

I'm thinking about using Cassandra as an in-memory key/value store instead of memcached for a new project (just to get rid of a dependency, if possible). I was thinking about setting the replication factor to 1, enabling the off-heap row cache, and setting gc_grace_period to zero for the CF that will be used for the key/value store. Has anyone tried this? Any comments?

Thanks,
Drew
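For anyone trying Drew's setup, the relevant knobs in Cassandra 1.2 look roughly like the following. Treat this as a sketch: key names should be checked against the cassandra.yaml that ships with your version, and the grace period is set per column family rather than globally (the table name is made up).

```yaml
# cassandra.yaml -- row cache sizing; the serializing provider keeps
# cached rows off-heap (requires JNA to be available)
row_cache_size_in_mb: 512
row_cache_provider: SerializingCacheProvider

# gc_grace is per-CF, e.g. in cqlsh:
#   ALTER TABLE kvstore WITH gc_grace_seconds = 0;
```

Note that with RF=1 any node outage means a slice of the keyspace simply disappears, which is a very different failure mode from losing one memcached shard's cache.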
Re: Using an EC2 cluster from the outside.
Depending on your client, disable automatic node discovery and just specify a list of all your nodes in your client configuration. For more details check out http://xzheng.net/blogs/problem-when-connecting-to-cassandra-with-ruby/ ; obviously this deals specifically with a Ruby client, but it should be applicable to others.

Cheers
Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 18/04/2013, at 5:43 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Apr 17, 2013 at 12:07 PM, maillis...@gmail.com wrote:

I have a working 3-node cluster in a single EC2 region and I need to hit it from our datacenter. As you'd expect, the client gets the internal addresses of the nodes back. Someone on IRC mentioned using the public IP for rpc and binding that address to the box. I see that mentioned in an old list mail, but I don't get exactly how this is supposed to work. I could really use either a link to something with explicit directions or a detailed explanation. Should Cassandra use the public IPs for everything (listen, b'cast, and rpc)? What should cassandra.yaml look like? Is the idea to use the public addresses for Cassandra but route the requests between nodes over the LAN using NAT? Any help or suggestion is appreciated.

Google EC2MultiRegionSnitch.

=Rob
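The cassandra.yaml pattern Rob is pointing at usually looks something like this. The addresses below are placeholders, and with the EC2MultiRegionSnitch the snitch takes care of switching back to private IPs for traffic that stays inside the region:

```yaml
# cassandra.yaml on each node (addresses are placeholders)
listen_address: 10.0.0.5        # private IP; node-to-node traffic inside EC2
broadcast_address: 54.0.0.5     # public/elastic IP that gets gossiped out
rpc_address: 0.0.0.0            # accept client connections on all interfaces
endpoint_snitch: Ec2MultiRegionSnitch
```

External clients then connect to the public IPs, while intra-region replication stays on the LAN.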
Re: Installing specific version
On Ubuntu it is:

apt-get install cassandra=1.2.4

so it should be similar for Debian.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 05/07/2013, at 10:59 PM, Kais Ahmed k...@neteck-fr.com wrote:

Hi Ben,

You can get it from http://archive.apache.org/dist/cassandra/

2013/7/5 Ben Gambley ben.gamb...@intoscience.com

Hi all,

Can anyone point me in the right direction for installing a specific version from the DataStax repo? We need 1.2.4 to keep consistent with our QA environment. It's for a new prod cluster, on Debian 6. I thought it may be a value in /etc/apt/sources.list? The latest 1.2.6 does not appear compatible with our phpcassa Thrift drivers. After many late nights my google ability seems to have evaporated!

Cheers
Ben
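To keep a routine apt-get upgrade from later dragging the cluster up to 1.2.6 anyway, the package can also be pinned. This is a sketch of an apt preferences entry (the file name under preferences.d is just a convention):

```
# /etc/apt/preferences.d/cassandra
Package: cassandra
Pin: version 1.2.4
Pin-Priority: 1001
```

A priority above 1000 makes apt hold (or even downgrade to) that version; `apt-cache policy cassandra` shows whether the pin took effect.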
Re: Which of these VPS configurations would perform better for Cassandra?
If you want to get a rough idea of how things will perform, fire up YCSB (https://github.com/brianfrankcooper/YCSB/wiki) and run the tests that most closely match what you think your workload will be (run the test clients from a couple of beefy AWS spot instances for less than a dollar). As you are a new startup without any existing load/traffic patterns, benchmarking will be your best bet.

Also, have a look at running Cassandra with SmartOS on Joyent. When you run SmartOS on Joyent, virtualisation is done using Solaris zones, an OS-based virtualisation, which is at least a quadrillion times better than KVM, Xen, etc. Ok, maybe not that much… but it is pretty cool and has the following benefits:

- No hardware emulation.
- Shared kernel with the host (you don't have to waste precious memory running a guest OS).
- ZFS :)

Have a read of http://wiki.smartos.org/display/DOC/SmartOS+Virtualization for more info. There are some downsides as well: the version of Cassandra that comes with the SmartOS package management system is old and busted, so you will want to build from source, and you will want to be technically confident running something a little outside the norm (SmartOS is based on Solaris).

Just make sure you test and benchmark all your options; a few days of testing now will save you weeks of pain. Good luck!

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr

On 05/08/2013, at 12:34 AM, David Schairer dschai...@humbaba.net wrote:

Of course -- my point is simply that if you're looking for speed, SSD+KVM, especially in a shared-tenant situation, is unlikely to perform the way you want. If you're building a pure proof of concept that never stresses the system, it doesn't matter, but if you plan an MVP with any sort of scale, you'll want a plan to be on something more robust.
I'll also say that it's really important (IMHO) to do even your dev work in a config with consistency conditions like your eventual production environment -- so make sure you're writing to both nodes and can hit cases where eventual-consistency delays kick in, or it'll come back to bite you later. I've seen this force people to redesign their whole data model when they didn't plan for it initially.

As I said, I haven't tested DO. I've tested very similar configurations at other providers and they were all terrible under load -- and certainly took away most of the benefits of SSD once you stressed writes a bit. Xen+SSD, on modern kernels, should work better, but I didn't test it (Linode doesn't offer this, though, and they've had lots of other challenges of late).

--DRS

On Aug 3, 2013, at 11:40 PM, Ertio Lew ertio...@gmail.com wrote:

@David: Like all other start-ups, we too cannot start with all dedicated servers for Cassandra. So right now we have no better choice except using a VPS :), but we can definitely choose one from amongst a suitable set of VPS configurations. As of now, since we are starting out, could we initiate our cluster with 2 nodes (RF=2), (KVM, 2GB RAM, 2 cores, 30GB SSD)? Right now we won't be having a very heavy load on Cassandra for the next few months, till we grow our user base. So this choice is mainly based on pricing vs. configuration, as well as Digital Ocean's good reputation in the community.

On Sun, Aug 4, 2013 at 12:53 AM, David Schairer dschai...@humbaba.net wrote:

I've run several lab configurations on Linodes; I wouldn't run Cassandra on any shared virtual platform for large-scale production, just because your IO performance is going to be really hard to predict. Lots of people do, though -- it depends on your Cassandra loads and how consistent you need performance to be, as well as how much of your working set will fit into memory. Remember that Linode significantly oversells their CPU as well.
The release version of KVM, at least as of a few months ago, still doesn't support TRIM on SSD; that, plus the fact that you don't know how others will use the SSDs or whether their file systems will keep the SSDs healthy, means that SSD performance on KVM is going to be highly unpredictable. I have not tested DigitalOcean, but I did test several other KVM+SSD shared-tenant hosting providers aggressively for Cassandra a couple of months ago; they all failed badly. Your mileage will vary considerably based on what you need out of Cassandra, what your data patterns look like, and how you configure your system. That said, I would use Xen before KVM for high-performance IO.

I have not run Cassandra in any volume on Amazon -- lots of folks have, and may have recommendations (including SSD) there for where it falls on the price/performance curve.

--DRS

On Aug 3, 2013, at 11:33 AM, Ertio Lew ertio...@gmail.com wrote:

I am building a cluster (initially starting with a 2-3 node cluster). I have come across two seemingly good options for hosting: Linode and Digital Ocean.
Re: Recommendation for hosting multi tenant clusters
http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html sums it up pretty well. Optimised images and provisioned IOPS may help, but whichever way you spin it your reads and writes are still going out on the network somewhere. EBS is like a giant SAN which will drop out at any second, take almost everything in your region down with it, and simultaneously open up a gate to hell that lets all sorts of unimaginable horrors into the world. Ok, maybe not that bad, but network issues between EBS and your instances are painful, whereas network issues within a single AZ can be dealt with in the course of normal cluster operations.

On a slight tangent, have a read of http://thelastpickle.com/2011/06/13/Down-For-Me/ which does an awesome job of explaining what will happen to your quorum reads and writes when an AWS AZ goes down (and you use ephemeral storage).

Cheers
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 14/08/2013, at 10:42 AM, Jon Haddad j...@jonhaddad.com wrote:

I strongly recommend against EBS, even with EBS-optimized instances and provisioned IOPS. The throughput you'll get from local drives is significantly better than what you'll get with EBS (even with 4K IOPS provisioned).

On Aug 13, 2013, at 2:10 PM, Rahul Gupta rgu...@dekaresearch.com wrote:

I am working on a requirement to host a multi-tenant Cassandra cluster (or set of clusters) on Amazon EC2 (AWS). With everything else sorted out, I have the below question, where I am looking for recommendations: does Amazon's recent support of EBS-optimized images change the whole discussion around EBS vs. ephemeral drives and image size?
- Option 1: reserved m1.xlarge (4x420GB drives) is $0.187/hr
- Option 2: reserved m1.large EBS-optimized is $0.119/hr (~$50/month less than m1.xlarge, but $168/month for 4x420GB standard EBS volumes): costs $120/month more, but additional recovery options

Given Cassandra is designed to survive failures, I think combining replication factor 3 and backing up to S3 should be enough for backup. Please advise.

Thanks,
Rahul Gupta
DEKA Research & Development
340 Commercial St, Manchester, NH 03101
P: 603.666.3908 extn. 6504 | C: 603.718.9676

This e-mail and the information, including any attachments, it contains are intended to be a confidential communication only to the person or entity to whom it is addressed and may contain information that is privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender and destroy the original message. Thank you.

Please consider the environment before printing this email.
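Plugging the two options' numbers into a quick back-of-the-envelope check (the hourly rates are the reserved prices quoted above; standard EBS is assumed at $0.10/GB-month, which is what produces the quoted $168 for 4x420GB; none of these are current prices):

```python
HOURS_PER_MONTH = 730  # AWS's usual monthly approximation

# Option 1: m1.xlarge with 4x420GB local (ephemeral) drives included
xlarge_monthly = 0.187 * HOURS_PER_MONTH

# Option 2: m1.large EBS-optimized plus 4x420GB of standard EBS
large_ebs_monthly = 0.119 * HOURS_PER_MONTH + 4 * 420 * 0.10

print(round(xlarge_monthly))     # 137
print(round(large_ebs_monthly))  # 255
```

The difference works out to roughly the $120/month premium quoted for the EBS option, before any provisioned-IOPS or I/O-request charges.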
Re: PropertiesFileSnitch
Look at the GossipingPropertyFileSnitch (http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/architecture/architectureSnitchesAbout_c.html) and just use the simple seed provider as described in the DataStax multi-DC documentation. That way, for each new node you only need to define its DC / rack, and it will use gossip to discover this information about the other nodes.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 10 Dec 2013, at 5:31 am, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

Hello everyone,

I have a Cassandra cluster running at Amazon. I am trying to add a new datacenter to this cluster now, outside AWS. I know I could use multi-region, but I would like to be vendor-free in terms of cloud. Reading the article http://www.datastax.com/docs/datastax_enterprise3.2/deploy/multi_dc_install, it seems I will need to start using the PropertyFileSnitch instead of Ec2Snitch to do what I want. So here comes my question: if I set all the seeds in my property file, what will happen if I need to add more machines and/or seeds to the cluster? Will I need to change the property files on all the nodes of my cluster, or just on the new node?

Best regards,
Marcelo Valle.
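With the GossipingPropertyFileSnitch, the per-node configuration shrinks to a two-line cassandra-rackdc.properties; the DC and rack names below are illustrative:

```
# conf/cassandra-rackdc.properties -- the only file that differs per node;
# every other node learns this DC/rack assignment via gossip
dc=DC1
rack=RAC1
```

This is what answers Marcelo's question: adding a node means setting these two lines on the new node only, rather than editing a topology file on every existing node.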
Re: in AWS is it worth trying to talk to a server in the same zone as your client?
$0.01/GB between zones irrespective of IP is correct. As for your original question, depending on the driver you are using you could write a custom co-ordinator node selection policy. For example, if you are using the Datastax driver you would extend http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html … and set the distance based on which zone the node is in. An alternate method would be to define the zones as data centres and then you could leverage existing DC aware policies (We've never tried this though). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/02/2014, at 8:00 AM, Andrey Ilinykh ailin...@gmail.com wrote: I think you are mistaken. It is true for the same zone. Between zones it is $0.01/GB. On Wed, Feb 12, 2014 at 12:17 PM, Russell Bradberry rbradbe...@gmail.com wrote: Not when using private IP addresses. That pricing ONLY applies if you are using the public interface or EIP/ENI. If you use the private IP addresses there is no cost associated. On February 12, 2014 at 3:13:58 PM, William Oberman (ober...@civicscience.com) wrote: Same region, cross zone transfer is $0.01 / GB (see http://aws.amazon.com/ec2/pricing/, Data Transfer section). On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry rbradbe...@gmail.com wrote: Cross zone data transfer does not cost any extra money. LOCAL_QUORUM = QUORUM if all 6 servers are located in the same logical datacenter. Ensure your clients are connecting to either the local IP or the AWS hostname that is a CNAME to the local IP from within AWS. If you connect to the public IP you will get charged for outbound data transfer. On February 12, 2014 at 2:58:07 PM, Yogi Nerella (ynerella...@gmail.com) wrote: Also, maybe you need to set the read consistency to LOCAL_QUORUM, otherwise the servers still try to read the data from all other data centers. I can understand the latency, but I can't understand how it would save money?
The amount of data transferred from the AWS server to the client should be the same no matter where the client is connected? On Wed, Feb 12, 2014 at 10:33 AM, Andrey Ilinykh ailin...@gmail.com wrote: yes, sure. Taking data from the same zone will reduce latency and save you some money. On Wed, Feb 12, 2014 at 10:13 AM, Brian Tarbox tar...@cabotresearch.com wrote: We're running a C* cluster with 6 servers spread across the four us-east1 zones. We also spread our clients (hundreds of them) across the four zones. Currently we give our clients a connection string listing all six servers and let C* do its thing. This is all working just fine...and we're paying a fair bit in AWS transfer costs. There is a suspicion that this transfer cost is driven by us passing data around between our C* servers and clients. Would there be any value to trying to get a client to talk to one of the C* servers in its own zone? I understand (at least partially!) about coordinator nodes and replication and know that no matter which server is the coordinator for an operation, replication may cause bits to get transferred to/from servers in other zones. Having said that...is there a chance that trying to encourage a client to initially contact a server in its own zone would help? Thank you, Brian Tarbox
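The zone-aware selection idea discussed above can be sketched independently of any driver API. A toy ordering policy that prefers coordinators in the client's own availability zone (function and node names are hypothetical; a real implementation would extend the Datastax driver's LoadBalancingPolicy and express this as a per-node distance instead):

```python
def order_by_zone(client_zone, nodes):
    """Return candidate coordinator addresses, same-zone nodes first.

    `nodes` is a list of (address, zone) tuples. Preferring same-zone
    coordinators reduces latency and cross-zone transfer charges, while
    still keeping remote nodes available as fallbacks.
    """
    local = [addr for addr, zone in nodes if zone == client_zone]
    remote = [addr for addr, zone in nodes if zone != client_zone]
    return local + remote

plan = order_by_zone("us-east-1a", [
    ("10.0.0.1", "us-east-1a"),
    ("10.0.1.1", "us-east-1b"),
    ("10.0.0.2", "us-east-1a"),
])
print(plan)  # same-zone nodes come first in the query plan
```

Note this only chooses the coordinator; as Brian says, replication will still move data across zones regardless of which node coordinates the request.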
Re: Recommended OS
We are currently trialling SmartOS with Cassandra and have seen some pretty good results (and the mmap stuff appears to work). As Rob said, if this is a production cluster, run with linux… there will be far less pain. If you are super keen on running on something different from linux in production (after all the warnings), run most of your cluster on linux, then run a single node or a separate DC with SmartOS, Solaris, BeOS, OS/2, Minix, Windows 3.1 or whatever it is that you choose and let us know how it all goes! Cheers Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/02/2014, at 6:32 AM, Jeffrey Kesselman jef...@gmail.com wrote: It's quite possible it's well tricked out for Linux. My major issue with Linux has been that its TCP/IP stack is nowhere near as scalable as Solaris' for massive numbers of simultaneous connections. But that's probably less of an issue with a Cassandra node than it has been with the game servers I've built. On Wed, Feb 12, 2014 at 1:52 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Feb 12, 2014 at 8:55 AM, Jeffrey Kesselman jef...@gmail.com wrote: I haven't run Cassandra in production myself, but for other high-load Java-based servers I've had really good scaling success with OpenSolaris. In particular I've used Joyent's SmartOS, which has the additional advantage of bursting to cover brief periods of exceptional load. There are a significant number of Linux-only optimizations in Cassandra. Very few people operate production clusters on anything but Linux. The most obvious optimization that comes to mind is the use of direct I/O to avoid blowing out the page cache under various circumstances. My approach towards running Cassandra on anything but Linux would be to try to directly compare performance to the same hardware running Linux. =Rob -- It's always darkest just before you are eaten by a grue.
Re: Load balancing issue with virtual nodes
Some imbalance is expected and considered normal: See http://wiki.apache.org/cassandra/VirtualNodes/Balance As well as https://issues.apache.org/jira/browse/CASSANDRA-7032 Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 29 Apr 2014, at 7:30 am, DuyHai Doan doanduy...@gmail.com wrote: Hello all Some update about the issue. After wiping completely all sstable/commitlog/saved_caches folders and restarting the cluster from scratch, we still see odd figures. After the restart, nodetool status does not show an exact balance of 50% of data for each node:

Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
UN  host1    48.57 KB  256     51.6%             d00de0d1-836f-4658-af64-3a12c00f47d6  rack1
UN  host2    48.57 KB  256     48.4%             e9d2505b-7ba7-414c-8b17-af3bbe79ed9c  rack1

As you can see, the % is very close to 50% but not exactly 50%. What can explain that? Can it be a network connection issue during the initial token shuffle phase? P.S: both host1 and host2 are supposed to have exactly the same hardware Regards Duy Hai DOAN On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan batra...@yahoo.com wrote: I don't know about Hector but the Datastax Java driver needs just one IP from the cluster and it will discover the rest of the nodes. Then by default it will do a round robin when sending requests. So if Hector does the same, the pattern will again appear. Did you look at the size of the dirs? That documentation is for C* 0.8. It's old. But depending on your boxes you might reach a CPU bottleneck. Might want to google for the write path in Cassandra. According to that, there is not much to do when writes come in... On Friday, April 25, 2014 12:00 AM, DuyHai Doan doanduy...@gmail.com wrote: I did some experiments.
Let's say we have node1 and node2. First, I configured Hector with node1 and node2 as hosts and I saw that only node1 has high CPU load. To eliminate the client connection issue, I re-tested with only node2 provided as host for Hector. Same pattern. CPU load is above 50% on node1 and below 10% on node2. It means that node2 is playing coordinator and forwarding many write/read requests to node1. Why did I look at CPU load and not iostat at all? Because I have a very write-intensive workload with a read-only-once pattern. I've read here (http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning) that heavy writes in C* are more CPU bound, but the info may be outdated and no longer true Regards Duy Hai DOAN On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler mich...@pbandjelly.org wrote: On 04/24/2014 10:29 AM, DuyHai Doan wrote: Client used = Hector 1.1-4 Default Load Balancing connection policy Both nodes' addresses are provided to Hector so according to its connection policy, the client should switch alternately between both nodes OK, so is only one connection being established to one node for one bulk write operation? Or are multiple connections being made to both nodes and writes performed on both? -- Michael
Re: Connect Cassandra rings in datacenter and ec2
You will need to have the nodes running on AWS in a VPC. You can then configure a VPN to work with your VPC, see http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html. Also as you will have multiple VPN connections (from your private DC and the other AWS region) AWS CloudHub will be the way to go http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPN_CloudHub.html. Additionally to access your Cassandra instances from your other VPCs you can use VPC peering (within the same region). See http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 30 Apr 2014, at 11:38 am, Chris Lohfink clohf...@blackbirdit.com wrote: Cassandra will require a different address per node though or at least 1 unique internal for same DC and 1 unique external for other DCs. You could look into http://aws.amazon.com/vpc/ or some other vpn solution. --- Chris Lohfink On Apr 29, 2014, at 6:56 PM, Trung Tran tr...@brightcloud.com wrote: Hi, We're planning to deploy 3 cassandra rings, one in our datacenter (with more node/power) and two others in EC2. We don't have enough public IP to assign for each individual node in our data center, so i wonder how could we connect the cluster together? Have any one tried this before, and if this is a good way to deploy cassandra? Thanks, Trung.
Re: Can Cassandra client programs use hostnames instead of IPs?
You can set listen_address in cassandra.yaml to a hostname (http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html). Cassandra will use the IP address returned by a DNS query for that hostname. On AWS you don't have to assign an elastic IP; all instances come with a public IP that lasts their lifetime (if you use ec2-classic or your VPC is set up to assign them). Note that whatever hostname you set in a node's listen_address, it will need to resolve to the private IP, as AWS instances only have network access via their private address. Traffic to an instance's public IP is NATed and forwarded to the private address. So you may as well just use the node's IP address. If you run Hadoop on instances in the same AWS region it will be able to access your Cassandra cluster via the private IPs. If you run Hadoop externally, just use the public IPs. If you run in a VPC without public addressing and want to connect from external hosts you will want to look at a VPN (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/05/2014, at 4:31 AM, Huiliang Zhang zhl...@gmail.com wrote: Hi, Cassandra returns IPs of the nodes in the Cassandra cluster for further communication between the Hadoop program and the Cassandra cluster. Is there a way to configure the Cassandra cluster to return hostnames instead of IPs? My Cassandra cluster is on AWS and has no elastic IPs which can be accessed outside AWS. Thanks, Huiliang
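The hostname approach described above boils down to a one-line change per node; a minimal sketch (the hostname is a placeholder, and on AWS it must resolve to the node's private IP):

```yaml
# cassandra.yaml (excerpt) — Cassandra resolves this hostname via DNS
# at startup; on AWS the record must point at the node's *private* IP,
# since inter-node traffic only flows over the private address.
listen_address: node1.example.internal
```

As noted above, because the hostname has to resolve to the same private IP anyway, this buys little over just configuring the IP directly.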
Re: Storing log structured data in Cassandra without compactions for performance boost.
If you make the timestamp the partition key you won't be able to do range queries (unless you use an ordered partitioner). Assuming you are logging from multiple devices you will want your partition key to be the device id and the date, your clustering key to be the timestamp (timeuuids are good to prevent collisions), and then the log message, level etc. as the other columns. Then you can also create a new table for every week (or day/month depending on how much granularity you want) and just write to the current week's table. This step allows you to delete old data without Cassandra using tombstones (you just drop the table for the week of logs you want to delete). For a much clearer explanation see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last few slides). As for compaction, I would leave it enabled, as having lots of SSTables hanging around can make range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a little old but still relevant). Compaction also fixes up things like merging row fragments (when you write new columns to the same row). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 07/05/2014, at 10:55 AM, Kevin Burton bur...@spinn3r.com wrote: I'm looking at storing log data in Cassandra… Every record is a unique timestamp for the key, and then the log line for the value. I think it would be best to just disable compactions. - there will never be any deletes. - all the data will be accessed in time range (probably partitioned randomly) and sequentially. So every time a memtable flushes, we will just keep that SSTable forever. Compacting the data is kind of redundant in this situation. I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered.
Also, it would be IDEAL to be able to tell Cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
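The per-device, per-date partitioning with a timeuuid clustering key that Ben describes might look like this in CQL (table and column names are illustrative, not from the thread):

```
-- One table per week, e.g. logs_2014_w19; expiring old data is then
-- just DROP TABLE, with no tombstones involved.
CREATE TABLE logs_2014_w19 (
    device_id text,
    day       text,      -- date bucket, part of the partition key
    ts        timeuuid,  -- clustering key; timeuuid prevents collisions
    level     text,
    message   text,
    PRIMARY KEY ((device_id, day), ts)
);

-- Range queries then work within a partition, e.g.:
-- SELECT * FROM logs_2014_w19
--   WHERE device_id = 'dev42' AND day = '2014-05-07'
--   ORDER BY ts DESC;
```

The composite partition key (device_id, day) keeps partitions bounded in size, while the timeuuid clustering key gives time-ordered storage and efficient time-range slices within each partition.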
Re: Ec2 Network I/O
Also once you've got your phi_convict_threshold sorted, if you see these again check: http://status.aws.amazon.com/ AWS does occasionally have the odd increased latency issue / outage. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 19/05/2014, at 1:15 PM, Nate McCall n...@thelastpickle.com wrote: It's a good idea to increase phi_convict_threshold to at least 12 on EC2. Using placement groups and single-tenant systems will certainly help. Another optimization would be dedicating an Enhanced Network Interface (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) specifically for gossip traffic. On Mon, May 19, 2014 at 1:36 PM, Phil Burress philburress...@gmail.com wrote: Has anyone experienced network i/o issues with ec2? We are seeing a lot of these in our logs: HintedHandOffManager.java (line 477) Timed out replaying hints to /10.0.x.xxx; aborting (15 delivered) and these... Cannot handshake version with /10.0.x.xxx and these... java.io.IOException: Cannot proceed on repair because a neighbor (/10.0.x.xxx) is dead: session failed Occurs on all of our nodes. Even though in all cases, the host that is being reported as down or unavailable is up and readily 'pingable'. We are using shared tenancy on all our nodes (instance type m1.xlarge) with cassandra 2.0.7. Any suggestions on how to debug these errors? Is there a recommendation to move to Placement Groups for Cassandra? Thanks! Phil -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
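Nate's phi_convict_threshold suggestion is a single setting in cassandra.yaml; a sketch of the change (the value 12 is the recommendation from the thread):

```yaml
# cassandra.yaml (excerpt) — raise the accrual failure detector's
# threshold on EC2, where transient network jitter can otherwise cause
# live nodes to be wrongly marked down (default is 8).
phi_convict_threshold: 12
```

Higher values make the failure detector more tolerant of latency spikes at the cost of detecting genuinely dead nodes a little more slowly.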
Re: autoscaling cassandra cluster
The mechanics for it are simple compared to figuring out when to scale, especially when you want to be scaling before peak load on your cluster (adding and removing nodes puts additional load on your cluster). We are currently building our own in-house solution for this for our customers. If you want to have a go at it yourself, this is a good starting point: http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html Most of this is fairly specific to Netflix, but an interesting read nonetheless. Datastax OpsCenter also provides capacity planning and forecasting and can provide an easy set of metrics you can make your scaling decisions on. http://www.datastax.com/what-we-offer/products-services/datastax-opscenter Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 21/05/2014, at 7:51 AM, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra docker images let me know I'll shoot you a link. James Sent from my iPhone On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using coreos with docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy send this promising project called Mesos in this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try. Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. 
I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add a machine with the cluster/seed/datacenter conf and it should join the cluster. Scaling down has to be manual, where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the Datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
Re: Multi-DC Environment Question
Short answer: If the time elapsed exceeds max_hint_window_in_ms then hints will stop being created. You will need to rely on your read consistency level, read repair and anti-entropy repair operations to restore consistency. Long answer: http://www.slideshare.net/jasedbrown/understanding-antientropy-in-cassandra Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 30 May 2014, at 8:40 am, Tupshin Harper tups...@tupshin.com wrote: When one node or DC is down, coordinator nodes being written through will notice this fact and store hints (hinted handoff is the mechanism), and those hints are used to send the data that was not able to be replicated initially. http://www.datastax.com/dev/blog/modern-hinted-handoff -Tupshin On May 29, 2014 6:22 PM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote: Hello All, We have plans to add a second DC to our live Cassandra environment. Currently RF=3 and we read and write at QUORUM. After adding DC2 we are going to be reading and writing at LOCAL_QUORUM. If my understanding is correct, when a client sends a write request, if the consistency level is satisfied on DC1 (that is RF/2+1), success is returned to the client and DC2 will eventually get the data as well. The assumption behind this is that the client always connects to DC1 for reads and writes, and that there is a site-to-site VPN between DC1 and DC2. Therefore, DC1 will almost always return success before DC2 (actually I don't know if it is possible for DC2 to be more up-to-date than DC1 with this setup...). Now imagine DC1 loses connectivity and the client fails over to DC2. Everything should work fine after that, with the only difference that DC2 will now be handling the requests directly from the client. After some time, say longer than max_hint_window_in_ms, DC1 comes back up. My question is how do I bring DC1 up to speed with DC2, which is now more up-to-date? Will that require a nodetool repair on DC1 nodes?
Also, what is the answer when the outage is shorter than max_hint_window_in_ms instead? Thanks in advance! Vasilis -- Kind Regards, Vasileios Vlachos
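The hint window Ben refers to is a cassandra.yaml setting; a sketch of where it lives (the value shown is Cassandra's usual 3-hour default):

```yaml
# cassandra.yaml (excerpt) — coordinators store hints for a downed
# replica only for this long; an outage longer than this window must
# be healed by read repair and anti-entropy repair (nodetool repair).
max_hint_window_in_ms: 10800000  # 3 hours
```

So for an outage shorter than the window, hinted handoff replays the missed writes automatically; for anything longer, repair on the returning DC's nodes is the safe path.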
Re: Managing truststores with inter-node encryption
Java SSL sockets need to be able to build a chain of trust, so having either a node's public cert or the root cert in the truststore works (as you found out). To get Cassandra to use cipher suites stronger than 128 bit you will need to install the JCE Unlimited Strength Jurisdiction Policy Files. You will know if you aren't using them because there will be a bunch of warnings quickly filling up your logs. Note that Java's SSL implementation does not check certificate revocation lists by default, though as you are not using inter-node SSL for authentication and identification it's no big deal. Ben On 31/05/2014 1:04 AM, Jeremy Jongsma jer...@barchart.com wrote: It appears that only adding the CA certificate to the truststore is sufficient for this. On Thu, May 22, 2014 at 10:05 AM, Jeremy Jongsma jer...@barchart.com wrote: The docs say that each node needs every other node's certificate in its local truststore: http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html This seems like a bit of a headache for adding nodes to a cluster. How do others deal with this? 1) If I am self-signing the client certificates (with puppetmaster), is it enough that the truststore just contain the CA certificate used to sign them? This is the typical PKI mechanism for verifying trust, so I am hoping it works here. 2) If not, can I use the same certificate for every node? If so, what is the downside? I'm mainly concerned with encryption over public internet links, not node identity verification.
Re: VPC AWS
Have a look at http://www.tinc-vpn.org/, mesh based and handles multiple gateways for the same network in a graceful manner (so you can run two gateways per region for HA). Also supports NAT traversal if you need to do public-private clusters. We are currently evaluating it for our managed Cassandra in a VPC solution, but we haven’t ever used it in a production environment or with a heavy load, so caveat emptor. As for the snitch… the GPFS is definitely the most flexible. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 10 Jun 2014, at 1:42 am, Ackerman, Mitchell mitchell.acker...@pgi.com wrote: Peter, I too am working on setting up a multi-region VPC Cassandra cluster. Each region is connected to each other via an OpenVPN tunnel, so we can use internal IP addresses for both the seeds and broadcast address. This allows us to use the EC2Snitch (my interpretation of the caveat that this snitch won’t work in a multi-region environment is that it won’t work if you can’t use internal IP addresses, which we can via the VPN tunnels). All the C* nodes find each other, and nodetool (or OpsCenter) shows that we have established a multi-datacenter cluster. Thus far, I’m not happy with the performance of the cluster in such a configuration, but I don’t think that it is related to this configuration, though it could be. Mitchell From: Peter Sanford [mailto:psanf...@retailnext.net] Sent: Monday, June 09, 2014 7:19 AM To: user@cassandra.apache.org Subject: Re: VPC AWS Your general assessments of the limitations of the Ec2 snitches seem to match what we've found. We're currently using the GossipingPropertyFileSnitch in our VPCs. This is also the snitch to use if you ever want to have a DC in EC2 and a DC with another hosting provider. 
-Peter On Mon, Jun 9, 2014 at 5:48 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi guys, there is a lot of answer, it looks like this subject is interesting a lot of people, so I will end up letting you know how it went for us. For now, we are still doing some tests. Yet I would like to know how we are supposed to configure Cassandra in this environment : - VPC - Multiple datacenters (should be VPCs, one per region, linked through VPN ?) - Cassandra 1.2 We are currently running under EC2MultiRegionSnitch, but with no VPC. Our VPC will have no public interface, so I am not sure how to configure broadcast address or seeds that are supposed to be the public IP of the node. I could use EC2Snitch, but will cross region work properly ? Should I use an other snitch ? Is someone using a similar configuration ? Thanks for information already given guys, we will achieve this ;-). 2014-06-07 0:05 GMT+02:00 Jonathan Haddad j...@jonhaddad.com: This may not help you with the migration, but it may with maintenance management. I just put up a blog post on managing VPC security groups with a tool I open sourced at my previous company. If you're going to have different VPCs (staging / prod), it might help with managing security groups. http://rustyrazorblade.com/2014/06/an-introduction-to-roadhouse/ Semi shameless plug... but relevant. On Thu, Jun 5, 2014 at 12:01 PM, Aiman Parvaiz ai...@shift.com wrote: Cool, thanks again for this. On Thu, Jun 5, 2014 at 11:51 AM, Michael Theroux mthero...@yahoo.com wrote: You can have a ring spread across EC2 and the public subnet of a VPC. That is how we did our migration. In our case, we simply replaced the existing EC2 node with a new instance in the public VPC, restored from a backup taken right before the switch. -Mike From: Aiman Parvaiz ai...@shift.com To: Michael Theroux mthero...@yahoo.com Cc: user@cassandra.apache.org user@cassandra.apache.org Sent: Thursday, June 5, 2014 2:39 PM Subject: Re: VPC AWS Thanks for this info Michael. 
As far as restoring node in public VPC is concerned I was thinking ( and I might be wrong here) if we can have a ring spread across EC2 and public subnet of a VPC, this way I can simply decommission nodes in Ec2 as I gradually introduce new nodes in public subnet of VPC and I will end up with a ring in public subnet and then migrate them from public to private in a similar way may be. If anyone has any experience/ suggestions with this please share, would really appreciate it. Aiman On Thu, Jun 5, 2014 at 10:37 AM, Michael Theroux mthero...@yahoo.com wrote: The implementation of moving from EC2 to a VPC was a bit of a juggling act. Our motivation was two fold: 1) We were running out of static IP addresses, and it was becoming increasingly difficult in EC2 to design around limiting the number of static IP addresses to the number of public IP addresses EC2 allowed 2) VPC affords us an additional level of security that was desirable. However, we needed to consider the following
Re: Minimum Cluster size to accommodate a single node failure
Yes your thinking is correct. This article from TLP sums it all up beautifully http://thelastpickle.com/blog/2011/06/13/Down-For-Me.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 4:18 pm, Prabath Abeysekara prabathabeysek...@gmail.com wrote: Sorry, the title of this thread has to be Minimum cluster size to survive a single node failure. On Wed, Jun 18, 2014 at 11:38 AM, Prabath Abeysekara prabathabeysek...@gmail.com wrote: Hi Everyone, First of all, apologies if the $subject was discussed previously in this list before. I've already gone through quite a few email trails on this but still couldn't find a convincing answer, which really made me raise this question again here in this list. If my understanding is correct, a 3 node Cassandra cluster would survive a single node failure while the replication factor is set to 3 and consistency levels are set to QUORUM for read/write operations. For example, let's consider the following configuration.
* Number of nodes in the cluster: 3
* Replication Factor: 3
* Read/Write consistencies: QUORUM (this evaluates to 2 when RF is set to 3)
Here's how I expect it to work. Whenever a read operation takes place, the Cassandra coordinator node that receives the read request would try to read from at least two replicas before responding to the client. With read consistency being 2 (+ all rows being available in all three nodes), we should be able to survive a single node failure in this particular instance for read operations. Similarly, for write requests, even in the middle of a single node failure, the writes should be allowed as the write consistency is set to 2? Can someone please confirm whether what's mentioned above is correct?
(Please note that I'm trying to figure out the minimum node numbers and I indeed am aware of the fact that there are other factors also to be considered in order to come up with the most optimal numbers for a given cluster requirement). Cheers, Prabath -- Prabath -- Prabath
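The availability arithmetic in the thread above reduces to a couple of lines; a quick sketch:

```python
def quorum(rf):
    """Replicas required to satisfy QUORUM at replication factor rf."""
    return rf // 2 + 1

def max_replica_failures(rf):
    """Replica failures survivable while still meeting QUORUM."""
    return rf - quorum(rf)

# RF=3: QUORUM needs 2 replicas, so one of the three nodes holding a
# replica can be down and reads/writes at QUORUM still succeed.
print(quorum(3), max_replica_failures(3))
```

This also shows why even replication factors buy little: RF=2 gives a quorum of 2 and tolerates zero failures, the same as RF=1 in practice.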
Re: EBS SSD - Cassandra ?
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html From the link: EBS volumes are not recommended for Cassandra data volumes for the following reasons:
• EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link.
• EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive.
• Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing.
Still applies, especially the network contention and latency issues. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 7:18 pm, Daniel Chia danc...@coursera.org wrote: While they guarantee IOPS, they don't really make any guarantees about latency. Since EBS goes over the network, there's so many things in the path of getting at your data, I would be concerned with random latency spikes, unless proven otherwise. Thanks, Daniel On Wed, Jun 18, 2014 at 1:58 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: In this document it is said : Provisioned IOPS (SSD) - Volumes of this type are ideal for the most demanding I/O intensive, transactional workloads and large relational or NoSQL databases. This volume type provides the most consistent performance and allows you to provision the exact level of performance you need with the most predictable and consistent performance. With this type of volume you provision exactly what you need, and pay for what you provision. Once again, you can achieve up to 48,000 IOPS by connecting multiple volumes together using RAID.
2014-06-18 10:57 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com: Hi, I just saw this : http://aws.amazon.com/fr/blogs/aws/new-ssd-backed-elastic-block-storage/ Since the problem with EBS was the network, there is no chance that this hardware architecture might be useful alongside Cassandra, right ? Alain
Re: EBS SSD - Cassandra ?
Irrespective of performance and latency numbers there are fundamental flaws with using EBS/NAS and Cassandra, particularly around bandwidth contention and what happens when the shared storage medium breaks. Also obligatory reference to http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html. Regarding ENIs, AWS are pretty explicit about their impact on bandwidth: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html Attaching another network interface to an instance is not a method to increase or double the network bandwidth to or from the dual-homed instance. So Nate, you are right in that it is the logical separation that helps, for some reason. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 20 Jun 2014, at 8:17 am, Nate McCall n...@thelastpickle.com wrote: Sorry - should have been clear I was speaking in terms of route optimizing, not bandwidth. No idea as to the implementation (probably instance specific) and I doubt it actually doubles bandwidth. Specifically: having an ENI dedicated to API traffic did smooth out some recent load tests we did for a client. It could be that the overall throughput increases were more a function of cleaner traffic segmentation/smoother routing. We weren't being terribly scientific - it was more an artifact of testing network segmentation. I'm just going to say that using an ENI will make things better (since traffic segmentation is always good practice anyway :) YMMV. On Thu, Jun 19, 2014 at 3:39 PM, Russell Bradberry rbradbe...@gmail.com wrote: Does an elastic network interface really use a different physical network interface? Or is it just to give the ability for multiple IP addresses? On June 19, 2014 at 3:56:34 PM, Nate McCall (n...@thelastpickle.com) wrote: If someone really wanted to try this, I recommend adding an Elastic Network Interface or two for gossip and client/API traffic. This lets EBS and management traffic have the pre-configured network.
On Thu, Jun 19, 2014 at 6:54 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote: I would say this is worth benchmarking before jumping to conclusions. The network being a bottleneck (or latency-causing) for EBS is, to my knowledge, supposition, and instances can be started with direct connections to EBS if this is a concern. The blog post below shows that even without SSDs the EBS-optimised provisioned-IOPS instances show pretty consistent latency numbers, although those latencies are higher than you would typically expect from locally attached storage. http://blog.parse.com/2012/09/17/parse-databases-upgraded-to-amazon-provisioned-iops/ Note, I'm not endorsing the use of EBS. Cassandra is designed to scale with the number of nodes, not with the depth of nodes (as Ben mentions, saturating a single node's data capacity is pretty easy these days; CPUs rapidly become the bottleneck as you try to go deep). However the argument that EBS cannot provide consistent performance seems overly pessimistic, and should probably be empirically determined for your use case.

On Thu, Jun 19, 2014 at 9:50 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Ok, looks fair enough. Thanks guys. It would be great to be able to add disks when the amount of data rises and add nodes when throughput increases... :)

2014-06-19 5:27 GMT+02:00 Ben Bromhead b...@instaclustr.com: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html From the link: EBS volumes are not recommended for Cassandra data volumes for the following reasons:

• EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link.
• EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive.
• Adding capacity by increasing the number of EBS volumes per host does not scale.
You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing. Still applies, especially the network contention and latency issues. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 7:18 pm, Daniel Chia danc...@coursera.org wrote: While they guarantee IOPS, they don't really make any guarantees about latency. Since EBS goes over the network, there's so many things in the path of getting at your data, I would be concerned with random latency spikes, unless proven otherwise. Thanks, Daniel On Wed, Jun 18, 2014 at 1:58 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: In this document it is said : Provisioned IOPS (SSD) - Volumes of this type are ideal for the most demanding I/O
Re: possible to have TTL on individual collection values?
Create a table with a set as one of the columns using cqlsh, populate with a few records. Connect using the cassandra-cli, run list on your table/cf and you'll see how the sets work. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/07/2014, at 11:19 AM, Kevin Burton bur...@spinn3r.com wrote: On Sat, Jul 12, 2014 at 6:05 PM, Keith Wright kwri...@nanigans.com wrote: Yes each item in the set can have a different TTL so long as they are upserted with commands having differing TTLs. Ah… ok. So you can just insert them with unique UPDATE/INSERT commands with different USING TTLs and it will work. That makes sense. You should read about how collections/maps work in CQL3 in terms of their CQL2 structure. Definitely. I tried but the documentation is all over the map. This is one of the problems with Cassandra IMO. It's evolving so fast that it's difficult to find the correct documentation. -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
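The upsert-with-different-TTLs approach described above can be illustrated in CQL. This is a sketch with made-up table and column names, relying on the documented behaviour that each collection element carries the TTL of the write that created it:

```cql
-- Hypothetical table with a set column
CREATE TABLE tags_by_item (item_id int PRIMARY KEY, tags set<text>);

-- Separate upserts with different TTLs: each element expires on its own schedule
UPDATE tags_by_item USING TTL 60    SET tags = tags + {'short-lived'} WHERE item_id = 1;
UPDATE tags_by_item USING TTL 86400 SET tags = tags + {'long-lived'}  WHERE item_id = 1;
```

After 60 seconds the first element disappears while the second remains.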
Re: any plans for coprocessors?
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/trigger_r.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 26 Jul 2014, at 11:32 am, Kevin Burton bur...@spinn3r.com wrote: Are there any plans to add coprocessors to cassandra? Embedding logic directly in a cassandra daemon would be nice. -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
Re: stalled nodetool repair?
https://github.com/mstump/cassandra_range_repair Also very useful.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 22/08/2014, at 6:12 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 21, 2014 at 12:32 PM, Kevin Burton bur...@spinn3r.com wrote: How do I watch the progress of nodetool repair? This is a very longstanding operational problem in Cassandra. Repair barely works and is opaque, yet one is expected to run it once a week in the default configuration. An unreasonably-hostile-in-tone-but-otherwise-accurate description of the status quo before the rewrite of streaming in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5396 A proposal to change the default for gc_grace_seconds to 34 days, so that this fragile and heavyweight operation only has to be done once a month: https://issues.apache.org/jira/browse/CASSANDRA-5850 Granted, this is a lot of data, but it would be nice to at least see some progress. Here's the rewrite of streaming, where progress indication improves dramatically over the prior status quo: https://issues.apache.org/jira/browse/CASSANDRA-5286 And here are two open tickets on making repair less opaque (thx yukim@#cassandra): https://issues.apache.org/jira/browse/CASSANDRA-5483 https://issues.apache.org/jira/browse/CASSANDRA-5839 =Rob
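For partial visibility into an in-flight repair, a few standard commands help (the log path assumes a typical package install; adjust for your layout):

```shell
nodetool compactionstats   # validation compactions kicked off by repair
nodetool netstats          # streaming progress between replicas
grep -i 'repair' /var/log/cassandra/system.log | tail   # repair session messages
```

None of these give a single percent-complete figure; they only show which phase (validation vs streaming) is currently active.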
Re: stalled nodetool repair?
Ah sorry, that is the original repo; see https://github.com/BrianGallew/cassandra_range_repair for the updated version of the script with vnode support.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 22 Aug 2014, at 2:19 pm, DuyHai Doan doanduy...@gmail.com wrote: Thanks Ben for the link. Still, this script does not work with vnodes, which excludes a wide range of C* configs.

On Thu, Aug 21, 2014 at 5:51 PM, Ben Bromhead b...@instaclustr.com wrote: https://github.com/mstump/cassandra_range_repair Also very useful.
Re: Can't Add AWS Node due to /mnt/cassandra/data directory
Make sure you have also set up the ephemeral drives as a RAID device (use mdadm) and mounted it under /mnt/cassandra, otherwise your data dir is on the OS partition, which is usually very small.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 27 Aug 2014, at 8:21 pm, Stephen Portanova sport...@gmail.com wrote: Worked great! Thanks Mark!

On Wed, Aug 27, 2014 at 2:00 AM, Mark Reddy mark.l.re...@gmail.com wrote: Hi Stephen, I have never added a node via OpsCenter, so this may be a shortcoming of that process. However in non-OpsCenter installs you would have to create the data directories first:

sudo mkdir -p /mnt/cassandra/commitlog
sudo mkdir -p /mnt/cassandra/data
sudo mkdir -p /mnt/cassandra/saved_caches

And then give the cassandra user ownership of those directories:

sudo chown -R cassandra:cassandra /mnt/cassandra

Once this is done Cassandra will have the correct directories and permissions to start up. Mark

On 27 August 2014 09:50, Stephen Portanova sport...@gmail.com wrote: I already have a 3-node m3.large DSE cluster, but I can't seem to add another m3.large node. I'm using the ubuntu-trusty-14.04-amd64-server-20140607.1 (ami-a7fdfee2) AMI (instance-store backed, PV) on AWS; I install Java 7 and the JNA, then I go into OpsCenter to add a node. Things look good for 3 or 4 green circles, until I either get this error: Start Errored: Timed out waiting for Cassandra to start. or this error: Agent Connection Errored: Timed out waiting for agent to connect. I check the system.log and output.log, and they both say: INFO [main] 2014-08-27 08:17:24,642 CLibrary.java (line 121) JNA mlockall successful ERROR [main] 2014-08-27 08:17:24,644 CassandraDaemon.java (line 235) Directory /mnt/cassandra/data doesn't exist ERROR [main] 2014-08-27 08:17:24,645 CassandraDaemon.java (line 239) Has no permission to create /mnt/cassandra/data directory INFO [Thread-1] 2014-08-27 08:17:24,646 DseDaemon.java (line 477) DSE shutting down...
ERROR [Thread-1] 2014-08-27 08:17:24,725 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-1,5,main] java.lang.AssertionError at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1263) at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:171) at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:478) at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:384) My agent.log file says: Node is still provisioning, not attempting to determine ip. INFO [Initialization] 2014-08-27 08:40:57,848 Sleeping for 20s before trying to determine IP over JMX again INFO [Initialization] 2014-08-27 08:41:17,849 Node is still provisioning, not attempting to determine ip. INFO [Initialization] 2014-08-27 08:41:17,849 Sleeping for 20s before trying to determine IP over JMX again INFO [Initialization] 2014-08-27 08:41:37,849 Node is still provisioning, not attempting to determine ip. INFO [Initialization] 2014-08-27 08:41:37,850 Sleeping for 20s before trying to determine IP over JMX again INFO [Initialization] 2014-08-27 08:41:57,850 Node is still provisioning, not attempting to determine ip. I feel like I'm missing something easy with the mount, so if you could point me in the right direction, I would really appreciate it! -- Stephen Portanova (480) 495-2634 -- Stephen Portanova (480) 495-2634
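The RAID setup Ben mentions might look like the following on a typical EC2 instance-store box. Device names here are assumptions (check yours with lsblk), and this is destructive to anything already on those disks:

```shell
# RAID0 the ephemeral disks, make a filesystem, and mount it where
# Cassandra expects its data directories to live:
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
mkfs.ext4 /dev/md0
mkdir -p /mnt/cassandra
mount /dev/md0 /mnt/cassandra
```

After this, create the commitlog/data/saved_caches directories and chown them to the cassandra user as described in the thread.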
Re: Heterogenous cluster and vnodes
Hey, I have a few VM host (bare metal) machines with varying amounts of free hard drive space on them. For simplicity let's say I have three machines like so:

* Machine 1: - Harddrive 1: 150 GB available.
* Machine 2: - Harddrive 1: 150 GB available. - Harddrive 2: 150 GB available.
* Machine 3: - Harddrive 1: 150 GB available.

I am setting up a Cassandra cluster between them and as I see it I have two options:

1. I set up one Cassandra node/VM per bare metal machine. I assign all free hard drive space to each Cassandra node and I balance the cluster using vnodes proportionally to the amount of free hard drive space (CPU/RAM is not going to be a bottleneck here).
2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup will potentially create a situation where, if Machine 2 goes down, you may lose two replicas, as the two VMs on Machine 2 might be replicas for the same key.

General question: Is either of these preferable to the other? I understand 1) yields lower high availability (since nodes are on the same hardware).

Other way around (2 would be the potentially lower-availability option)... Cassandra thinks two of the VMs are separate when they in fact rely on the same underlying machine.

Question about alternative 1: With varying vnodes, can I always be sure that replicas are never put on the same virtual machine?

Yes... mostly: https://issues.apache.org/jira/browse/CASSANDRA-4123

Or is varying vnodes really only useful/recommended when migrating from machines with varying hardware (like mentioned in [1])?

Changing the number of vnodes changes the portion of the ring a node is responsible for. You can use it to account for different types of hardware; you can also use it to create awesome situations like hotspots if you aren't careful... YMMV. At the end of the day I would throw out the extra hard drive / not use it / put more hard drives in the other machines. Why?
Hard drives are cheap and your time as an admin for the cluster isn't. If you do add more hard drives you can also split out the commit log etc. onto different disks. I would take fewer problems over trying to wring every last scrap of performance out of the available hardware any day of the year.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
Re: Question about EC2 and SSDs
On 5 Sep 2014, at 10:05 am, Steve Robenalt sroben...@highwire.org wrote: We are migrating a small cluster on AWS from instances based on spinning disks (using instance store) to SSD-backed instances and we're trying to pick the proper instance type. Some of the recommendations for spinning disks say to use different drives for log vs data partitions to avoid issues with seek delays and contention for the disk heads. Since SSDs don't have the same seek delays, is it still recommended to use 2 SSD drives? Or is one sufficient?

As a side note, splitting the commit log and data dirs onto different volumes doesn't do a whole lot of good on AWS, irrespective of whether you are on spinning disks or SSDs, simply because the volumes presented to the VM may be backed by the same physical disk. Just RAID the available volumes and be done with it.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
Re: Moving Cassandra from EC2 Classic into VPC
On 8 Sep 2014, at 12:34 pm, Oleg Dulin oleg.du...@gmail.com wrote: Another idea I had was taking the ec2-snitch configuration and converting it into a property file snitch. But I still don't understand how to perform this move, since I need my newly created VPC instances to have public IPs -- something I would like to avoid.

Off the top of my head, something like this might work if you want a no-downtime approach:

1. Use the gossiping property file snitch in the VPC data centre.
2. Use a public elastic IP for each node.
3. Have the instances in the VPC join your existing cluster.
4. Decommission the old cluster.
5. Change the advertised endpoint addresses afterwards to the private addresses for nodes in the VPC using the following: https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/
6. Once that is done, remove the elastic IPs from the instances.
Re: no change observed in read latency after switching from EBS to SSD storage
EBS vs local SSD: for latency you are talking about milliseconds per I/O, but your query runs for 10 seconds. You will not notice anything; what is a few ms saved per read over the life of a 10-second query? To reiterate what Rob said: the query is probably slow because of your use case / data model, not the underlying disk.

On 17 September 2014 14:21, Tony Anecito adanec...@yahoo.com wrote: If you cached your tables or the database you may not see any difference at all. Regards, -Tony

On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi - We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances were using EBS for storage (I know it is not recommended). We replaced the EBS storage with SSDs. However, we didn't see any change in read latency. A query that took 10 seconds when data was stored on EBS still takes 10 seconds even after we moved the data directory to SSD. It is a large query returning 200,000 CQL rows from a single partition. We are reading 3 columns from each row and the combined data in these three columns for each row is around 100 bytes. In other words, the raw data returned by the query is approximately 20MB. I was expecting at least a 5-10x reduction in read latency going from EBS to SSD, so I am puzzled why we are not seeing any change in performance. Does anyone have insight as to why we don't see any performance impact on the reads going from EBS to SSD? Thanks, Mohammed

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: Repair taking long time
use https://github.com/BrianGallew/cassandra_range_repair

On 30 September 2014 05:24, Ken Hancock ken.hanc...@schange.com wrote: On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote: As an aside, you just lose with vnodes at clusters of this size. I presume you plan to grow over approx. 9 nodes per DC, in which case you probably do want vnodes enabled.

I typically only see discussion of vnodes vs. non-vnodes, but it seems to me that it might be more important to discuss the number of vnodes per node. A small cluster having 256 vnodes/node is unwise given some of the operations that are still done sequentially. Even if operations were done in parallel, having a 256x increase in parallelization seems an equally bad choice. I've never seen any discussion on how many vnodes per node might be an appropriate answer based on a planned cluster size -- does such a thing exist? Ken

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: best practice for waiting for schema changes to propagate
The system.peers table is a copy of some of the gossip info the node has stored, including the schema version. You should query this and wait until all schema versions have converged. http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_sys_tab_cluster_t.html http://www.datastax.com/dev/blog/the-data-dictionary-in-cassandra-1-2

As for ensuring that the driver keeps talking to the node you made the schema change on, I would ask on the driver-specific mailing list / IRC: - MAILING LIST: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user - IRC: #datastax-drivers on irc.freenode.net

On 30 September 2014 10:16, Clint Kelly clint.ke...@gmail.com wrote: Hi all, I often have problems with code that I write that uses the DataStax Java driver to create / modify a keyspace or table and then soon after reads the metadata for the keyspace to verify that whatever changes I made to the keyspace or table are complete. As an example, I may create a table called `myTableName` and then very soon after do something like:

assert(session .getCluster() .getMetaData() .getKeyspace(myKeyspaceName) .getTable(myTableName) != null)

I assume this fails sometimes because the default round-robin load balancing policy for the Java driver will send my create-table request to one node and the metadata read to another, and because it takes some time for the table creation to propagate across all of the nodes in my cluster. What is the best way to deal with this problem? Is there a standard way to wait for schema changes to propagate? Best regards, Clint

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
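The wait-for-convergence check described above can be sketched in a few lines. Only the comparison logic is shown; the driver session, the queries against system.local/system.peers, and the polling loop with a timeout around it are assumed:

```python
def schema_converged(versions):
    """Return True when every reachable node reports the same schema version.

    `versions` is an iterable of schema_version values, e.g. collected via
      SELECT schema_version FROM system.local   -- this node
      SELECT schema_version FROM system.peers   -- everyone else
    Entries for unreachable peers may be None and are ignored here.
    """
    return len({v for v in versions if v is not None}) <= 1
```

A caller would poll this (re-querying the system tables each time) until it returns True or a deadline passes, before trusting cluster-wide metadata.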
Re: Experience with multihoming cassandra?
I'm guessing you're talking about multi-homing because you want to have multiple tenants (different apps/teams, etc.) to make better use of resources? As Jared mentioned, running multiple Cassandra processes on the same hardware that participate in the same cluster doesn't make much sense from a failure domain point of view (it could mess up how C* replicates, with replicas for a key potentially ending up on the same physical server).

As for splitting up a server for multi-tenancy purposes, this then becomes a question of virtualisation, as while there is some multi-tenant support in C* (auth, throttling per keyspace), it is fairly limited at best. There are a whole range of options out there, ranging from Xen, VMware etc. through to lightweight virtualisation like Linux namespaces with cgroups. I think Spotify run C* in production using namespaces with cgroups, IIRC, and you could use something like Docker to help manage this for you. Docker will also help with managing network addressing etc. (the multi-homed aspect). We've also had a lot of success running C* with Docker (and previously SmartOS and Solaris zones). Though you will be treading new / undocumented ground and thus should expect to have to solve a few issues along the way.

On 26 September 2014 04:32, Jared Biel jared.b...@bolderthinking.com wrote: Doing this seems counter-productive to Cassandra's design/use-cases. It's best at home running on a large number of smaller servers rather than a small number of large servers. Also, as you said, you won't get any of the high availability benefits that it offers if you run multiple copies of Cassandra on the same box.

On 25 September 2014 16:58, Donald Smith donald.sm...@audiencescience.com wrote: We have large boxes with 256G of RAM and SSDs. From iostat, top, and sar we think the system has excess capacity.
Anyone have recommendations about multihoming http://en.wikipedia.org/wiki/Multihoming cassandra on such a node (connecting it to multiple IPs and running multiple cassandras simultaneously)? I'm skeptical, since Cassandra already has built-in multi-threading, and since if the node went down multiple Cassandra nodes would disappear. We're using C* version 2.0.9. A google/bing search for multihoming cassandra doesn't turn much up.

*Donald A. Smith* | Senior Software Engineer P: 425.201.3900 x 3866 C: (206) 819-5965 F: (646) 443-2333 dona...@audiencescience.com

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: DSE install interfering with apache Cassandra 2.1.0
check your cqlshrc file (sometimes in ~/.cassandra)? I've been caught out before when playing with an RC of 2.1

On 30 September 2014 01:25, Andrew Cobley a.e.cob...@dundee.ac.uk wrote: Without the apache cassandra running I ran jps -l on this machine; the only result was 338 sun.tools.jps.Jps The Mac didn't like the netstat command so I ran netstat -atp tcp | grep 9160 no result Also for the native port: netstat -atp tcp | grep 9042 gave no result (command may be wrong) So I ran a port scan using the network utility (between 0 and 1). Results as shown:

Port Scan has started…
Port Scanning host: 127.0.0.1
Open TCP Port: 631 ipp
Port Scan has completed…

Hope this helps. Andy

On 29 Sep 2014, at 15:09, Sumod Pawgi spa...@gmail.com wrote: Please run jps to check which Java services are still running and to make sure if C* is running. Then please check if port 9160 is in use: netstat -nltp | grep 9160 This will confirm what is happening in your case. Sent from my iPhone

On 29-Sep-2014, at 7:15 pm, Andrew Cobley a.e.cob...@dundee.ac.uk wrote: Hi All, Just come across this one; I'm at a bit of a loss on how to fix it. A user here did the following steps on a Mac:

1. Install DataStax Enterprise (DSE) using the dmg file
2. Test he can connect using the DSE cqlsh window
3. Uninstall DSE (full uninstall which stops the services)
4. Download apache cassandra 2.1.0 and unzip
5. Change to the new directory and run sudo ./cassandra

Now when he tries to connect using cqlsh from the apache cassandra 2.1.0 bin he gets: Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection AsyncoreConnection(4528514448) 127.0.0.1:9160 (closed) is already closed',)}) This is probably related to http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E but I can't see why the uninstall of DSE is leaving the apache cassandra release cqlsh unable to attach to the apache cassandra runtime.
Ta Andy The University of Dundee is a registered Scottish Charity, No: SC015096

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: DSE install interfering with apache Cassandra 2.1.0
Only recently! Moving off list (c* users bcc'd).

On 30 September 2014 19:20, Andrew Cobley a.e.cob...@dundee.ac.uk wrote: Hi Ben, yeah, that was it. Recovered from the Cassandra summit? Andy

On 30 Sep 2014, at 08:19, Ben Bromhead b...@instaclustr.com wrote: check your cqlshrc file (sometimes in ~/.cassandra)? I've been caught out before when playing with an RC of 2.1

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: How to enable client-to-node encrypt communication with Astyanax cassandra client
Haven't personally followed this but give it a go: http://lyubent.github.io/security/planetcassandra/2013/05/31/ssl-for-astyanax.html

On 8 October 2014 20:46, Lu, Boying boying...@emc.com wrote: Hi, All, I'm trying to enable client-to-node encrypted communication in Cassandra (2.0.7) with the Astyanax client library (version 1.56.48). I found this link about how to enable the feature: http://www.datastax.com/documentation/cassandra/2.0/cassandra/security/secureSSLClientToNode_t.html But it only says how to set things up on the server side, not the client side. Here is my configuration on the server side (in the yaml):

client_encryption_options:
    enabled: true
    keystore: full-path-to-keystore-file        # same file used by Cassandra server
    keystore_password: some-password
    truststore: fullpath-to-truststore-file     # same file used by Cassandra server
    truststore_password: some-password
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA]
    require_client_auth: true

http://www.datastax.com/dev/blog/accessing-secure-dse-clusters-with-cql-native-protocol This link says something about the client side, but not how to do it with the Astyanax client library.
Searching the Astyanax source code, I found the class SSLConnectionContext, which may be useful. Here is my code snippet:

AstyanaxContext<Cluster> clusterContext = new AstyanaxContext.Builder()
    .forCluster(clusterName)
    .forKeyspace(keyspaceName)
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setRetryPolicy(new QueryRetryPolicy(10, 1000)))
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl(_clusterName)
        .setMaxConnsPerHost(1)
        .setAuthenticationCredentials(credentials)
        .setSSLConnectionContext(sslContext)
        .setSeeds(String.format("%1$s:%2$d", uri.getHost(), uri.getPort()))
    )
    .buildCluster(ThriftFamilyFactory.getInstance());

But when I tried to connect to the Cassandra server, I got the following error:

Caused by: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.cassandra.thrift.Cassandra$Client.send_login(Cassandra.java:567)
    at org.apache.cassandra.thrift.Cassandra$Client.login(Cassandra.java:559)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.open(ThriftSyncConnectionFactoryImpl.java:203)
    ... 6 more

It looks like my SSL settings are incorrect. Does anyone know how to resolve this issue? Thanks, Boying

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
Re: Error: No module named cql
It looks like easy_install is using python2.6 and installing cql in the 2.6 packages directory: /usr/lib/python2.6/site-packages/ cqlsh is using the python executable for you environment (which looks like 2.7) and thus is looking for cql in the site packages dir (amongst others). To quickly install the cql module explicitly for python 2.7 run: python2.7 -m easy_install cql Though you might also want to sort out your easy_install so it matches the version of python that is used by default. On 15 October 2014 11:48, Tim Dunphy bluethu...@gmail.com wrote: Hey all, I'm using cassandra 2.1.0 on CentOS 6.5 And when I try to run cqlsh on the command line I get this error: root@beta-new:~] #cqlsh Python CQL driver not installed, or not on PYTHONPATH. You might try easy_install cql. Python: /usr/local/bin/python Module load path: ['/usr/local/apache-cassandra-2.1.0/bin', '/usr/local/lib/python27.zip', '/usr/local/lib/python2.7', '/usr/local/lib/python2.7/plat-linux2', '/usr/local/lib/python2.7/lib-tk', '/usr/local/lib/python2.7/lib-old', '/usr/local/lib/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages'] Error: No module named cql I tried following the advice from the error and ran that command: [root@beta-new:~] #easy_install cql Searching for cql Best match: cql 1.4.0 Processing cql-1.4.0-py2.6.egg cql 1.4.0 is already the active version in easy-install.pth Using /usr/lib/python2.6/site-packages/cql-1.4.0-py2.6.egg Processing dependencies for cql Finished processing dependencies for cql And that seems to go ok! However when I try to run it again: [root@beta-new:~] #cqlsh Python CQL driver not installed, or not on PYTHONPATH. You might try easy_install cql. 
Python: /usr/local/bin/python Module load path: ['/usr/local/apache-cassandra-2.1.0/bin', '/usr/local/lib/python27.zip', '/usr/local/lib/python2.7', '/usr/local/lib/python2.7/plat-linux2', '/usr/local/lib/python2.7/lib-tk', '/usr/local/lib/python2.7/lib-old', '/usr/local/lib/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages'] Error: No module named cql I get the same exact error. How on earth do I break out of this feedback loop? Thanks! Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
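The root cause above is an interpreter mismatch: easy_install ran under Python 2.6 while cqlsh uses /usr/local/bin/python (2.7). A quick, generic way to confirm which interpreter and search path are in play — a minimal sketch, nothing Cassandra-specific:

```python
# Print the interpreter and module search path, to confirm which
# site-packages directory a package manager must install into.
import sys

print(sys.executable)   # the python binary cqlsh would use
for p in sys.path:      # directories searched for the cql module
    print(p)
```

Run this with each python on the box (e.g. `python2.6 check.py` vs `python2.7 check.py`) and compare the paths against where easy_install put the egg.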
Re: Dependency Hell: STORM 0.9.2 and Cassandra 2.0
I haven't had to deal with this problem specifically and don't know if there is a Storm-specific solution, but the general Java way of dealing with projects that have conflicting dependencies would be either to exclude one of the conflicting dependencies using maven and see if it works, or to rename the conflicting dependency using http://maven.apache.org/plugins/maven-dependency-plugin/usage.html so both projects can use their own versions of guava without the package names conflicting (and the jvm will load the correct classes for each dep). On 25 October 2014 06:13, Gary Zhao garyz...@gmail.com wrote: Hello Anyone encountered the following issue and any workaround? Our Storm topology was written in Clojure. Our team is upgrading one of our Storm topologies from using cassandra 1.2 to cassandra 2.0, and we have found one problem that is difficult to tackle. The Cassandra 2.0 Java driver requires Google Guava 16. Unfortunately, storm 0.9.2 provides a lower version. Because of that, a topology will not be able to contact Cassandra databases. Thanks Gary -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
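The renaming approach Ben describes is commonly done with the maven-shade-plugin's relocation feature rather than the dependency plugin — a hedged sketch of the pom.xml fragment (version omitted; adjust to your build):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move our copy of Guava to a private package so Storm's
               older Guava on the classpath can't shadow it. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this, the topology jar carries its own Guava under `shaded.com.google.common` and the JVM loads the correct classes for each dependency.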
Re: Repair/Compaction Completion Confirmation
https://github.com/BrianGallew/cassandra_range_repair This breaks down the repair operation into very small portions of the ring as a way to try and work around the current fragile nature of repair. Leveraging range repair should go some way towards automating repair (this is how the automatic repair service in DataStax opscenter works, this is how we perform repairs). We have had a lot of success running repairs in a similar manner against vnode enabled clusters. Not 100% bullet proof, but way better than nodetool repair On 28 October 2014 08:32, Tim Heckman t...@pagerduty.com wrote: On Mon, Oct 27, 2014 at 1:44 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Oct 27, 2014 at 1:33 PM, Tim Heckman t...@pagerduty.com wrote: I know that when issuing some operations via nodetool, the command blocks until the operation is finished. However, is there a way to reliably determine whether or not the operation has finished without monitoring that invocation of nodetool? In other words, when I run 'nodetool repair' what is the best way to reliably determine that the repair is finished without running something equivalent to a 'pgrep' against the command I invoked? I am curious about trying to do the same for major compactions too. This is beyond a FAQ at this point, unfortunately; non-incremental repair is awkward to deal with and probably impossible to automate. In The Future [1] the correct solution will be to use incremental repair, which mitigates but does not solve this challenge entirely. As brief meta commentary, it would have been nice if the project had spent more time optimizing the operability of the critically important thing you must do once a week [2]. https://issues.apache.org/jira/browse/CASSANDRA-5483 =Rob [1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 [2] Or, more sensibly, once a month with gc_grace_seconds set to 34 days. Thank you for getting back to me so quickly. 
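The subrange idea behind cassandra_range_repair can be sketched in a few lines: carve the full Murmur3 token ring into small contiguous pieces and repair each with `nodetool repair -st <start> -et <end>`. An illustration only, not the tool's actual code:

```python
# Sketch: split the Murmur3 token ring into contiguous subranges so
# each repair touches only a small portion of the data at a time.
MIN_TOKEN = -2**63          # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1       # Murmur3Partitioner maximum token

def split_ring(steps):
    """Yield `steps` contiguous (start, end] token ranges covering the ring."""
    width = (MAX_TOKEN - MIN_TOKEN) // steps
    start = MIN_TOKEN
    for i in range(steps):
        end = MAX_TOKEN if i == steps - 1 else start + width
        yield (start, end)
        start = end

for st, et in split_ring(4):
    print(f"nodetool repair -st {st} -et {et}")
```

In practice the tool uses many more, much smaller steps per node, so a failed or hung repair only has to redo one small range.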
Not the answer that I was secretly hoping for, but it is nice to have confirmation. :) Cheers! -Tim -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: bootstrapping manually when auto_bootstrap=false ?
- In cassandra.yaml set auto_bootstrap: false - Boot the node - Run nodetool rebuild Very similar to http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html On 18 December 2014 at 14:04, Kevin Burton bur...@spinn3r.com wrote: I’m trying to figure out the best way to bootstrap our nodes. I *think* I want our nodes to be manually bootstrapped. This way an admin has to explicitly bring up the node in the cluster and I don’t have to worry about a script accidentally provisioning new nodes. The problem is HOW do you do it? I couldn’t find any reference anywhere in the documentation. I *think* I run nodetool repair? but it’s unclear.. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
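Spelled out as config, the first step above is a one-line cassandra.yaml change — a sketch:

```yaml
# cassandra.yaml: join the ring without streaming data automatically,
# so an admin must explicitly trigger the data load.
auto_bootstrap: false
```

Once the node is up and in the ring, `nodetool rebuild <source-dc>` streams the data it should own from the named existing datacenter.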
Re: simple data movement ?
Just copy the data directory from each prod node to your test node (and relevant configuration files etc). If your IP addresses are different between test and prod, follow https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/ On 18 December 2014 at 09:10, Langston, Jim jim.langs...@dynatrace.com wrote: Hi all, I have set up a test environment with C* 2.1.2, wanting to test our applications against it. I currently have C* 1.2.9 in production and want to use that data for testing. What would be a good approach for simply taking a copy of the production data and moving it into the test env and having the test env C* use that data ? The test env is identical in size, with the difference being the versions of C*. Thanks, Jim -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: Deleted snapshot files filling up /var/lib/cassandra
If you are running a sequential repair (or have previously run a sequential repair that is still running) Cassandra will still have the file descriptors open for files in the snapshot it is using for the repair operation. From http://www.datastax.com/dev/blog/repair-in-cassandra: *Cassandra 1.2 introduced a new option to repair to help manage the problems caused by the nodes all repairing with each other at the same time; it is called a snapshot repair, or sequential repair. As of Cassandra 2.1, sequential repair is the default, and the old parallel repair is an option. Sequential repair has all of the nodes involved take a snapshot, the snapshot lives until the repair finishes, and then is removed. By taking a snapshot, repair can proceed in a serial fashion, such that only two nodes are ever comparing with each other at a time. This makes the overall repair process slower, but decreases the burden placed on the nodes, and means you have less impact on reads/writes to the system.* On 16 March 2015 at 16:33, David Wahler dwah...@indeed.com wrote: On Mon, Mar 16, 2015 at 6:12 PM, Ben Bromhead b...@instaclustr.com wrote: Cassandra will by default snapshot your data directory on the following events: TRUNCATE and DROP schema events when you run nodetool repair when you run nodetool snapshot Snapshots are just hardlinks to existing SSTables so the only disk space they take up is for files that have since been compacted away. Disk space for snapshots will be freed when the last link to the files are removed. You can remove all snapshots in a cluster using nodetool clearsnapshot Snapshots will fail if you are out of disk space (this is counterintuitive to the above, but it is true), if you have not increased the number of available file descriptors or if there are permissions issues. Out of curiosity, how often are you running repair? Thanks for the information. We're running repair once per week, as recommended by the Datastax documentation. 
The repair is staggered to run on one machine at a time with the --partitioner-range option in order to spread out the load. Running nodetool clearsnapshot doesn't free up any space. I'm guessing that because the snapshot files have been deleted from the filesystem, Cassandra thinks the snapshots are already gone. But because it still has the file descriptors open, the disk space hasn't actually been reclaimed. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Deleted snapshot files filling up /var/lib/cassandra
Cassandra will by default snapshot your data directory on the following events: - TRUNCATE and DROP schema events - when you run nodetool repair - when you run nodetool snapshot Snapshots are just hardlinks to existing SSTables so the only disk space they take up is for files that have since been compacted away. Disk space for snapshots will be freed when the last link to the files are removed. You can remove all snapshots in a cluster using nodetool clearsnapshot Snapshots will fail if you are out of disk space (this is counterintuitive to the above, but it is true), if you have not increased the number of available file descriptors or if there are permissions issues. Out of curiosity, how often are you running repair? On 16 March 2015 at 15:52, David Wahler dwah...@indeed.com wrote: On Mon, Mar 16, 2015 at 5:28 PM, Jan cne...@yahoo.com wrote: David; all the packaged installations use the /var/lib/cassandra directory. Could you check your yaml config files and see if you are using this default directory for backups May want to change it to a location with more disk space. We're using the default /var/lib/cassandra as our data directory, mounted as its own LVM volume. I don't see anything in cassandra.yaml about a backup directory. There is an incremental_backups option which is set to false. Increasing the available disk space doesn't really seem like a solution. We have only about 450MB of live data on the most heavily-loaded server, and the space taken up by these deleted files is growing by several GB per day. For now we can work around the problem by periodically restarting servers to close the file handles, but that hurts our availability and seems like a hack. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
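The hardlink behaviour Ben describes can be demonstrated directly at the filesystem level — a small sketch, not Cassandra code, assuming a filesystem that supports hardlinks:

```python
# Demonstration: a snapshot hardlink costs no space of its own, and the
# data survives until the LAST link (or open file descriptor) is gone --
# which is why "deleted" SSTables can keep consuming disk.
import os
import tempfile

d = tempfile.mkdtemp()
sstable = os.path.join(d, "ks-table-ka-1-Data.db")  # stand-in for an SSTable
snapshot = os.path.join(d, "snapshot-Data.db")      # stand-in for a snapshot link

with open(sstable, "w") as f:
    f.write("x" * 1024)

os.link(sstable, snapshot)        # this is essentially what a snapshot does
print(os.stat(sstable).st_nlink)  # 2: two names, one copy of the data on disk

os.remove(sstable)                # "compaction" deletes the original...
print(os.path.exists(snapshot))   # True: the snapshot still holds the data
```

Only when the snapshot link is also removed (nodetool clearsnapshot) does the link count hit zero and the space get reclaimed.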
Re: Deleted snapshot files filling up /var/lib/cassandra
Sorry for the late reply. To immediately solve the problem you can restart Cassandra and all the open file descriptors to the deleted snapshots should disappear. As for why it happened I would first address the disk space issue and see if the snapshot errors + open file descriptors issue still occurs (I am unclear as to whether you got the snapshot exception after the disk filled up or before), if you still have issues with repair not letting go of snapshotted files even with free disk space I would look to raise a ticket in Jira. On 17 March 2015 at 12:46, David Wahler dwah...@indeed.com wrote: On Mon, Mar 16, 2015 at 6:51 PM, Ben Bromhead b...@instaclustr.com wrote: If you are running a sequential repair (or have previously run a sequential repair that is still running) Cassandra will still have the file descriptors open for files in the snapshot it is using for the repair operation. Yeah, that aligns with my understanding of how the repair process works. But the cluster has no repair sessions active (I think; when I run nodetool tpstats, the AntiEntropyStage and AntiEntropySessions values are zero on all nodes) and the space still hasn't been freed. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: how to clear data from disk
To clarify on why this behaviour occurs, by default Cassandra will snapshot a table when you perform any destructive action (TRUNCATE, DROP etc) see http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/truncate_r.html To free disk space after such an operation you will always need to clear the snapshots (using either of above suggested methods). Unfortunately this can be a bit painful if you are rotating your tables, say by month, and want to remove the oldest one from disk as your client will need to speak JMX as well. You can disable this behaviour through the use of auto_snapshot in cassandra.yaml. Though I would strongly recommend leaving this feature enabled in any sane production environment and cleaning up snapshots as an independent task!! On 10 March 2015 at 20:43, Patrick McFadin pmcfa...@gmail.com wrote: Or just manually delete the files. The directories are broken down by keyspace and table. Patrick On Mon, Mar 9, 2015 at 7:50 PM, 曹志富 cao.zh...@gmail.com wrote: nodetool clearsnapshot -- Ranger Tsao 2015-03-10 10:47 GMT+08:00 鄢来琼 laiqiong@gtafe.com: Hi ALL, After drop table, I found the data is not removed from disk, I should reduce the gc_grace_seconds before the drop operation. I have to wait for 10 days, but there is not enough disk. Could you tell me there is method to clear the data from disk quickly? Thank you very much! Peter -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: High latencies for simple queries
cqlsh runs on the internal cassandra python drivers: cassandra-pylib and cqlshlib. I would not recommend using them at all (nothing wrong with them, they are just not built with external users in mind). I have never used python-driver in anger so I can't comment on whether it is genuinely slower than the internal C* python driver, but this might be a question for python-driver folk. On 28 March 2015 at 00:34, Artur Siekielski a...@vhex.net wrote: On 03/28/2015 12:13 AM, Ben Bromhead wrote: One other thing to keep in mind / check is that doing these tests locally the cassandra driver will connect using the network stack, whereas postgres supports local connections over a unix domain socket (this is also enabled by default). Unix domain sockets are significantly faster than tcp as you don't have a network stack to traverse. I think any driver using libpq will attempt to use the domain socket when connecting locally. Good catch. I assured that psycopg2 connects through a TCP socket and the numbers increased by about 20%, but it still is an order of magnitude faster than Cassandra. But I'm going to hazard a guess something else is going on with the Cassandra connection as I'm able to get 0.5ms queries locally and that's even with trace turned on. Using python-driver? -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: run cassandra on a small instance
-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: High latencies for simple queries
Latency can be so variable even when testing things locally. I quickly fired up postgres and did the following with psql:

ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
CREATE TABLE
ben=# \timing
Timing is on.
ben=# INSERT INTO foo VALUES(2, 'yay');
INSERT 0 1
Time: 1.162 ms
ben=# INSERT INTO foo VALUES(3, 'yay');
INSERT 0 1
Time: 1.108 ms

I then fired up a local copy of Cassandra (2.0.12):

cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE foo;
cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
cqlsh:foo> TRACING ON;
Now tracing requests.
cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');

Tracing session: 7a7dced0-d4b2-11e4-b950-85c3c9bd91a0

activity | timestamp | source | source_elapsed
execute_cql3_query | 11:52:55,229 | 127.0.0.1 | 0
Parsing INSERT INTO foo (i, j) VALUES (1, 'yay'); | 11:52:55,229 | 127.0.0.1 | 43
Preparing statement | 11:52:55,229 | 127.0.0.1 | 141
Determining replicas for mutation | 11:52:55,229 | 127.0.0.1 | 291
Acquiring switchLock read lock | 11:52:55,229 | 127.0.0.1 | 403
Appending to commitlog | 11:52:55,229 | 127.0.0.1 | 413
Adding to foo memtable | 11:52:55,229 | 127.0.0.1 | 432
Request complete | 11:52:55,229 | 127.0.0.1 | 541

All this on a MacBook Pro with 16 GB of memory and an SSD. So ymmv? On 27 March 2015 at 08:28, Tyler Hobbs ty...@datastax.com wrote: Just to check, are you concerned about minimizing that latency or maximizing throughput? I'll assume that latency is what you're actually concerned about. A fair amount of that latency is probably happening in the python driver. Although it can easily execute ~8k operations per second (using cpython), in some scenarios it can be difficult to guarantee sub-ms latency for an individual query due to how some of the internals work. In particular, it uses python's Conditions for cross-thread signalling (from the event loop thread to the application thread). 
Unfortunately, python's Condition implementation includes a loop with a minimum sleep of 1ms if the Condition isn't already set when you start the wait() call. This is why, with a single application thread, you will typically see a minimum of 1ms latency. Another source of similar latencies for the python driver is the Asyncore event loop, which is used when libev isn't available. I would make sure that you can use the LibevConnection class with the driver to avoid this. On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski a...@vhex.net wrote: I'm running Cassandra locally and I see that the execution time for the simplest queries is 1-2 milliseconds. By a simple query I mean either INSERT or SELECT from a small table with short keys. While this number is not high, it's about 10-20 times slower than Postgresql (even if INSERTs are wrapped in transactions). I know that the nature of Cassandra compared to Postgresql is different, but for some scenarios this difference can matter. The question is: is it normal for Cassandra to have a minimum latency of 1 millisecond? I'm using Cassandra 2.1.2, python-driver. -- Tyler Hobbs DataStax http://datastax.com/ -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
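The cross-thread handoff Tyler describes looks roughly like this — a toy sketch of the pattern, not the driver's actual code: the event-loop thread delivers a result and notifies, and the application thread waits on a Condition, so any floor under Condition.wait() becomes a floor under per-query latency.

```python
# Sketch: event-loop thread hands a result to the application thread
# via a Condition. On old CPython versions Condition.wait() polled with
# a minimum ~1ms sleep, putting a floor under every query's latency.
import threading
import time

cond = threading.Condition()
result = []

def event_loop_thread():
    # pretend a response just arrived off the wire
    with cond:
        result.append("row")
        cond.notify()

with cond:
    threading.Thread(target=event_loop_thread).start()
    t0 = time.perf_counter()
    while not result:
        cond.wait()            # application thread blocks here
    elapsed = time.perf_counter() - t0

print(f"handoff latency: {elapsed * 1000:.3f} ms")
```

On a modern Python 3 the measured handoff is far below 1 ms, which is consistent with the problem being specific to the older Condition implementation.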
Re: Arbitrary nested tree hierarchy data model
+1 would love to see how you do it On 27 March 2015 at 07:18, Jonathan Haddad j...@jonhaddad.com wrote: I'd be interested to see that data model. I think the entire list would benefit! On Thu, Mar 26, 2015 at 8:16 PM Robert Wille rwi...@fold3.com wrote: I have a cluster which stores tree structures. I keep several hundred unrelated trees. The largest has about 180 million nodes, and the smallest has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in practice is probably less than 10. I am able to page through children and siblings. It works really well. Doesn’t sound like its exactly like what you’re looking for, but if you want any pointers on how I went about implementing mine, I’d be happy to share. On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote: Not sure if this is the right place to ask, but we are trying to model a user-generated tree hierarchy in which they create child objects of a root node, and can create an arbitrary number of children (and children of children, and on and on). So far we have looked at storing each tree structure as a single document in JSON format and reading/writing it out in its entirety, doing materialized paths where we store the root id with every child and the tree structure above the child as a map, and some form of an adjacency list (which does not appear to be very viable as looking up the entire tree would be ridiculous). The hope is to end up with a data model that allows us to display the entire tree quickly, as well as see the entire path to a leaf when selecting that leaf. If anyone has some suggestions/experience on how to model such a tree hierarchy we would greatly appreciate your input. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
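For anyone curious, the materialized-path option mentioned in the question can be prototyped in a few lines — a toy in-memory sketch (the real thing would store the path in a Cassandra column, with `children` as a prefix/range query on a clustering key):

```python
# Sketch: materialized-path tree. Each node stores its full ancestor
# path, so the path to a leaf is a single lookup and listing children
# is a prefix match with exactly one more path segment.
nodes = {}  # node_id -> materialized path, e.g. "root/a/b"

def add_node(node_id, parent_id=None):
    path = node_id if parent_id is None else nodes[parent_id] + "/" + node_id
    nodes[node_id] = path

def children(parent_id):
    prefix = nodes[parent_id] + "/"
    return [n for n, p in nodes.items()
            if p.startswith(prefix) and "/" not in p[len(prefix):]]

add_node("root")
add_node("a", "root")
add_node("b", "a")
add_node("c", "root")

print(nodes["b"])        # root/a/b -- full path to the leaf, one lookup
print(children("root"))  # ['a', 'c']
```

Fanout and depth are bounded only by partition sizing, which matches the arbitrary-depth requirement in the question.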
Re: Really high read latency
wrote: Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: Hi! So I've got a table like this:

CREATE TABLE default.metrics (
  row_time int,
  attrs varchar,
  offset int,
  value double,
  PRIMARY KEY (row_time, attrs, offset)
) WITH COMPACT STORAGE
  AND bloom_filter_fp_chance=0.01
  AND caching='KEYS_ONLY'
  AND comment=''
  AND dclocal_read_repair_chance=0
  AND gc_grace_seconds=864000
  AND index_interval=128
  AND read_repair_chance=1
  AND replicate_on_write='true'
  AND populate_io_cache_on_flush='false'
  AND default_time_to_live=0
  AND speculative_retry='NONE'
  AND memtable_flush_period_in_ms=0
  AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
  AND compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. So it's timeseries data that I'm doing so I increment row_time each day, attrs is additional identifying information about each series, and offset is the number of milliseconds into the day for each data point. So for the past 5 days, I've been inserting 3k points/second distributed across 100k distinct attrses. And now when I try to run queries on this data that look like SELECT * FROM default.metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam' it takes an absurdly long time and sometimes just times out. I did nodetool cfstats default and here's what I get:

Keyspace: default
Read Count: 59
Read Latency: 397.12523728813557 ms.
Write Count: 155128
Write Latency: 0.3675690719921613 ms.
Pending Flushes: 0
Table: metrics
SSTable count: 26
Space used (live): 35146349027
Space used (total): 35146349027
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.10386468749216264
Memtable cell count: 141800
Memtable data size: 31071290
Memtable switch count: 41
Local read count: 59
Local read latency: 397.126 ms
Local write count: 155128
Local write latency: 0.368 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 2856
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 36904729268
Compacted partition mean bytes: 986530969
Average live cells per slice (last five minutes): 501.66101694915255
Maximum live cells per slice (last five minutes): 502.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks! -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
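One likely culprit is visible in the schema itself: the partition key is row_time alone, so an entire day of writes lands in a single partition — consistent with the ~36 GB "Compacted partition maximum bytes" reported above. A back-of-the-envelope check using the numbers from the post:

```python
# Rough check (assumes writes are spread evenly over the day; figures
# taken from the post above). With PRIMARY KEY (row_time, attrs, offset)
# the partition key is row_time alone, so one day's writes share one partition.
writes_per_second = 3_000
seconds_per_day = 86_400
cells_per_partition_per_day = writes_per_second * seconds_per_day
print(cells_per_partition_per_day)  # 259200000 cells in a single partition

# Moving attrs into the partition key, e.g.
# PRIMARY KEY ((row_time, attrs), offset), spreads the same load across
# the ~100k distinct series instead.
distinct_attrs = 100_000
print(cells_per_partition_per_day // distinct_attrs)  # 2592 cells per partition
```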
Re: Java 8
DSE 4.6.5 supports Java 8 ( http://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/RNdse46.html?scroll=RNdse46__rel465) and DSE 4.6.5 is Cassandra 2.0.14 under the hood. I would go with 8 On 7 May 2015 at 04:51, Paulo Motta pauloricard...@gmail.com wrote: First link was broken (sorry), here is the correct link: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREJNAabout_c.html 2015-05-07 8:49 GMT-03:00 Paulo Motta pauloricard...@gmail.com: The official recommendation is to run with Java7 ( http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html), mostly to play it safe I guess, however you can probably already run C* with Java8, since it has been stable for a while. We've been running with Java8 for several months now without any noticeable problem. Regarding source compatibility, the official plan is compile with Java8 starting from version 3.0. You may find more information on this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8168 https://issues.apache.org/jira/browse/CASSANDRA-8168 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de : Hi Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL? Currently Java 7 is also recommended for 2.1. Are there any reasons not to recommend Java 8 for 2.1? Thanks, Stefan -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Data migration
Use sstableloader. It comes with Cassandra, is designed for moving data between clusters, and is far simpler than sqoop. It should even work with a schema change like you described (changing columns). It would probably/definitely break if you were dropping tables. Mind you, I've never tried sstableloader while schema changes were occurring, so happy to be wrong. On 14 April 2015 at 05:40, Prem Yadav ipremya...@gmail.com wrote: Look into sqoop. I believe using sqoop you can transfer data between C* clusters. I haven't tested it though. The other option is to write a program to read from one cluster and write the required data to another. On Tue, Apr 14, 2015 at 12:27 PM, skrynnikov_m skrinniko...@epsysoft.com.ua wrote: Hello!!! Need to migrate data from one C* cluster to another periodically. During migration schema can change (add or remove one, two fields). Could you please suggest some tool? -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Spark SQL JDBC Server + DSE
*From: *Mohammed Guller moham...@glassbeam.com *Reply-To: *user@cassandra.apache.org *Date: *Thursday, May 28, 2015 at 8:26 PM *To: *user@cassandra.apache.org user@cassandra.apache.org *Subject: *RE: Spark SQL JDBC Server + DSE Anybody out there using DSE + Spark SQL JDBC server? Mohammed *From:* Mohammed Guller [mailto:moham...@glassbeam.com moham...@glassbeam.com] *Sent:* Tuesday, May 26, 2015 6:17 PM *To:* user@cassandra.apache.org *Subject:* Spark SQL JDBC Server + DSE Hi – As I understand, the Spark SQL Thrift/JDBC server cannot be used with the open source C*. Only DSE supports the Spark SQL JDBC server. We would like to find out how many organizations are using this combination. If you do use DSE + Spark SQL JDBC server, it would be great if you could share your experience. For example, what kind of issues you have run into? How is the performance? What reporting tools you are using? Thank you! Mohammed -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Deploying OpsCenter behind a HTTP(S) proxy
OpsCenter is a little bit tricky to simply just rewrite URLs for; the XHR requests and REST endpoints it hits are all specified a little differently in the javascript app it loads. We ended up monkey patching a buttload of the js files to get all the requests working properly with our proxy. Every time a new release of OpsCenter comes out we have to rework it. If you are a DSE customer I would raise it as a support issue :) On 18 June 2015 at 02:29, Spencer Brown lilspe...@gmail.com wrote: First, your firewall should really be your frontend. Their operational frontend is Apache, which is common. You want every url with opscenter in it handled elsewhere. You could also set up proxies for /cluster-configs, etc... Then there is mod_rewrite, which provides a lot more granularity about when you want what gets handled where. I set up the architectural infrastructure for Orbitz and some major banks, and I'd be happy to help you out on this. I charge $30/hr., but what you need isn't very complex so we're really just talking $100. On Thu, Jun 18, 2015 at 5:13 AM, Jonathan Ballet jbal...@gfproducts.ch wrote: Hi, I'm looking for information on how to correctly deploy an OpsCenter instance behind a HTTP(S) proxy. I have a running instance of OpsCenter 5.1 reachable at http://opscenter:/opscenter/ but I would like to be able to serve this kind of tool under a single hostname on HTTPS along with other tools of this kind, for easier convenience. I'm currently using Apache as my HTTP front-end and I tried this naive configuration: <VirtualHost *:80> ServerName tools ... 
ProxyPreserveHost On
# Proxy to OpsCenter
ProxyPass        /opscenter/ http://opscenter:/opscenter/
ProxyPassReverse /opscenter/ http://opscenter:/opscenter/
</VirtualHost>

This doesn't quite work, as OpsCenter seems to also serve specific data from / directly, such as: /cluster-configs /TestCluster /meta /rc /tcp Is there something I can configure in OpsCenter so that it serves these URLs from somewhere else, or a list of known URLs that I can remap on the proxy, or better yet, a known proxy configuration to put in front of OpsCenter? Regards, Jonathan -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
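If monkey patching isn't an option, a partial workaround is to proxy the extra top-level paths explicitly. A hedged sketch of the Apache directives (untested; `<PORT>` stands in for the OpsCenter port, which is elided above, and the path list is just the endpoints named in this thread):

```apache
# Sketch: forward the top-level endpoints OpsCenter serves from /
# in addition to /opscenter/. Replace <PORT> with your OpsCenter port.
ProxyPass        /cluster-configs/ http://opscenter:<PORT>/cluster-configs/
ProxyPassReverse /cluster-configs/ http://opscenter:<PORT>/cluster-configs/
ProxyPass        /meta/            http://opscenter:<PORT>/meta/
ProxyPassReverse /meta/            http://opscenter:<PORT>/meta/
ProxyPass        /rc/              http://opscenter:<PORT>/rc/
ProxyPassReverse /rc/              http://opscenter:<PORT>/rc/
ProxyPass        /tcp/             http://opscenter:<PORT>/tcp/
ProxyPassReverse /tcp/             http://opscenter:<PORT>/tcp/
```

This only covers the endpoints listed here; as Ben notes, URLs baked into the javascript app itself may still bypass the prefix.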
Re: Lucene index plugin for Apache Cassandra
Looks awesome, do you have any examples/benchmarks of using these indexes for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+? On 10 June 2015 at 09:08, Andres de la Peña adelap...@stratio.com wrote: Hi all, With the release of Cassandra 2.1.6, Stratio is glad to present its open source Lucene-based implementation of C* secondary indexes https://github.com/Stratio/cassandra-lucene-index as a plugin that can be attached to Apache Cassandra. Before the above changes, Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties implied. As of now, the fork is discontinued and new users should use the recently created plugin, which maintains all the features of Stratio Cassandra https://github.com/Stratio/stratio-cassandra. Stratio's Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities such as with ElasticSearch or Solr, including full text search capabilities, free multivariable search, relevance queries and field-based sorting. Each node indexes its own data, so high availability and scalability is guaranteed. We hope this will be useful to the Apache Cassandra community. Regards, -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD* -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: check active queries on cluster
A warning on enabling debug and trace logging on the write path. You will be writing information about every query to disk. If you have any significant volume of requests going through the nodes things will get slow pretty quickly. At least with C* 2.1 and using the default logging config. On 1 June 2015 at 07:34, Sebastian Martinka sebastian.marti...@mercateo.com wrote: You could enable DEBUG logging for org.apache.cassandra.transport.Message and TRACE logging for org.apache.cassandra.cql3.QueryProcessor in the log4j-server.properties file: log4j.logger.org.apache.cassandra.transport.Message=DEBUG log4j.logger.org.apache.cassandra.cql3.QueryProcessor=TRACE Afterwards you get the following output from all PreparedStatements in the system.log file: DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 Message.java (line 302) Received: PREPARE INSERT INTO dba_test.cust_view (leid, vid, geoarea, ver) VALUES (?, ?, ?, ?);, v=2 TRACE [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 QueryProcessor.java (line 283) Stored prepared statement 61956319a6d7c84c25414c96edf6e38c with 4 bind markers DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Tracing.java (line 159) request complete DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Message.java (line 309) Responding: RESULT PREPARED 61956319a6d7c84c25414c96edf6e38c [leid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][vid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][geoarea(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][ver(dba_test, cust_view), org.apache.cassandra.db.marshal.LongType] (resultMetadata=[0 columns]), v=2 *Von:* Robert Coli [mailto:rc...@eventbrite.com] *Gesendet:* Freitag, 17. April 2015 19:23 *An:* user@cassandra.apache.org *Betreff:* Re: check active queries on cluster On Thu, Apr 16, 2015 at 11:10 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: We want to track active queries on cassandra cluster. 
Is there any tool or way to find all active queries on Cassandra? You can get a count of them with: https://issues.apache.org/jira/browse/CASSANDRA-5084 =Rob -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
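If you do enable that logging, the Message lines shown above are easy to post-process into a list of incoming queries. A minimal sketch in Python (the parsing helper is mine, not part of Cassandra; it assumes the C* 2.1 log line format quoted above):

```python
import re

# Matches the "Received: <query>, v=<protocol version>" tail of the DEBUG
# lines emitted by org.apache.cassandra.transport.Message (format as quoted
# above from C* 2.1 -- adjust the pattern for other versions).
LINE_RE = re.compile(r"Message\.java \(line \d+\) Received: (?P<query>.+), v=\d+$")

def extract_queries(log_lines):
    """Return the query text from each Message DEBUG line."""
    return [m.group("query") for m in map(LINE_RE.search, log_lines) if m]

sample = [
    "DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 "
    "Message.java (line 302) Received: PREPARE INSERT INTO dba_test.cust_view "
    "(leid, vid, geoarea, ver) VALUES (?, ?, ?, ?);, v=2",
    "DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 "
    "Tracing.java (line 159) request complete",
]
print(extract_queries(sample))
```

Tailing system.log through something like this gives a rough live view of statements hitting a node; for a simple count, the CASSANDRA-5084 metric Rob linked is much cheaper.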
Re: Multiple cassandra instances per physical node
@Sean - You can manually change the ports used by the DataStax agent using the address.yaml file in the agent install directory. +1 on using racks to separate it out... but it will increase operational complexity somewhat. On 26 May 2015 at 08:11, Nate McCall n...@thelastpickle.com wrote: If you're running multiple nodes on a single server, vnodes give you no control over which instance has which key (whereas you can assign initial tokens). Therefore you could have two of your three replicas on the same physical server which, if it goes down, you can't read or write at quorum. Yep. You *will* have overlapping ranges on each physical server so long as the number of vnodes is greater than the number of nodes in the cluster. However, can't you use the topology snitch to put both nodes in the same rack? Won't that prevent the issue and still allow you to maintain quorum if a single server goes down? If I have a 20-node cluster with 2 nodes on each physical server, can I use 10 racks to properly segment my partitions? That's a good point, yes. I'd still personally prefer the operational simplicity of simply spacing out token assignments though, but YMMV. -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Back to the futex()? :(
>> are my JVM >> args. I realized I neglected to adjust memtable_flush_writers as I was >> writing this--so I'll get on that. Aside from that, I'm not sure what to >> do. (Thanks, again, for reading.) >> >> * They were batched for consistency--I'm hoping to return to using them >> when I'm back at normal load, which is tiny compared to backloading, but >> the impact on performance was eye-opening. >> ___ >> Will Hayworth >> Developer, Engagement Engine >> Atlassian >> >> My pronoun is "they". <http://pronoun.is/they> >> >> >> > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster
Also you may want to run multiple data centres in a single AWS region (load segmentation, Spark etc). +1 GPFS for everything On Wed, 3 Feb 2016 at 07:42 sai krishnam raju potturi <pskraj...@gmail.com> wrote: > thanks a lot Robert. Greatly appreciate it. > > thanks > Sai > > On Tue, Feb 2, 2016 at 6:19 PM, Robert Coli <rc...@eventbrite.com> wrote: > >> On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi < >> pskraj...@gmail.com> wrote: >> >>> What is the possibility of using GossipingPropertyFileSnitch on >>> datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS? >>> >> >> You should just use GPFS everywhere. >> >> This is also the reason why you should not use EC2MRS if you might ever >> have a DC that is outside of AWS. Just use GPFS. >> >> =Rob >> PS - To answer your actual question... one "can" use different snitches >> on a per node basis, but ONE REALLY REALLY SHOULDN'T CONSIDER THIS A VALID >> APPROACH AND IF ONE TRIES AND FAILS I WILL POINT AND LAUGH AND NOT HELP >> THEM :D >> > > -- Ben Bromhead CTO | Instaclustr +1 650 284 9692
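For concreteness, GPFS reads each node's position from cassandra-rackdc.properties; a minimal sketch (the dc/rack names below are placeholders -- the point is to use one consistent naming convention across your private cloud and AWS data centres):

```properties
# cassandra-rackdc.properties (GossipingPropertyFileSnitch)
dc=us-east-1
rack=rack1
# Optional: use private IPs for traffic within the same data centre when
# broadcasting public addresses.
# prefer_local=true
```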
Re: Any tips on how to track down why Cassandra won't cluster?
Check network connectivity. If you are using public addresses as the broadcast address, make sure you can telnet from one node to the other node's public address on the internode port. Last time I looked into something like this, for some reason if you only add a security group id to the allowed traffic in a security group, you still need to add the public IP addresses of each node to the security group's allowed inbound traffic as well. On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III <mrbur...@gmail.com> wrote: > I'm deploying 2 nodes at the moment using cassandra-dse on Amazon. I > configured it to use EC2Snitch and configured rackdc to use us-east with > rack "1". > > The second node points to the first node as the seed e.g., "seeds": > ["54.*.*.*"] and all of the ports are open. > > Any suggestions on how to track down what might trigger this problem? I'm > not receiving any exceptions. > > > -- > -Richard L. Burton III > @rburton -- Ben Bromhead CTO | Instaclustr +1 650 284 9692
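The telnet check above can be scripted. A rough sketch using plain Python sockets (nothing Cassandra-specific; the demo listens locally rather than hitting a real node):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Scripted version of `telnet <host> <port>`: True if a TCP
    connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a local listener rather than a real node's broadcast address.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(can_connect("127.0.0.1", port))  # something is listening
listener.close()
print(can_connect("127.0.0.1", port))  # connection refused
```

Against a real cluster you would run this from every node against every other node's broadcast address on the internode port (7000 by default, 7001 for SSL internode), plus the client port (9042).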
Re: EC2 storage options for C*
>>>>>>> Thank you all for the suggestions. I'm torn between GP2 vs >>>>>>> Ephemeral. GP2 after testing is a viable contender for our workload. The >>>>>>> only worry I have is EBS outages, which have happened. >>>>>>> >>>>>>> On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Also in that video - it's long but worth watching >>>>>>>> >>>>>>>> We tested up to 1M reads/second as well, blowing out page cache to >>>>>>>> ensure we weren't "just" reading from memory >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jeff Jirsa >>>>>>>> >>>>>>>> >>>>>>>> On Jan 31, 2016, at 9:52 AM, Jack Krupansky < >>>>>>>> jack.krupan...@gmail.com> wrote: >>>>>>>> >>>>>>>> How about reads? Any differences between read-intensive and >>>>>>>> write-intensive workloads? >>>>>>>> >>>>>>>> -- Jack Krupansky >>>>>>>> >>>>>>>> On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa < >>>>>>>> jeff.ji...@crowdstrike.com> wrote: >>>>>>>> >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> We run using 4T GP2 volumes, which guarantee 10k iops. Even at 1M >>>>>>>>> writes per second on 60 nodes, we didn’t come close to hitting even >>>>>>>>> 50% >>>>>>>>> utilization (10k is more than enough for most workloads). PIOPS is not >>>>>>>>> necessary. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> From: John Wong >>>>>>>>> Reply-To: "user@cassandra.apache.org" >>>>>>>>> Date: Saturday, January 30, 2016 at 3:07 PM >>>>>>>>> To: "user@cassandra.apache.org" >>>>>>>>> Subject: Re: EC2 storage options for C* >>>>>>>>> >>>>>>>>> For production I'd stick with ephemeral disks (aka instance >>>>>>>>> storage) if you are running a lot of transactions. >>>>>>>>> However, for a regular small testing/qa cluster, or something you >>>>>>>>> know you want to reload often, EBS is definitely good enough and we >>>>>>>>> haven't >>>>>>>>> had issues 99%. The 1% is kind of anomaly where we have flush blocked. >>>>>>>>> >>>>>>>>> But Jeff, kudos that you are able to use EBS.
I didn't go through >>>>>>>>> the video, do you actually use PIOPS or just standard GP2 in your >>>>>>>>> production cluster? >>>>>>>>> >>>>>>>>> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng < >>>>>>>>> br...@blockcypher.com> wrote: >>>>>>>>> >>>>>>>>>> Yep, that motivated my question "Do you have any idea what kind >>>>>>>>>> of disk performance you need?". If you need the performance, its >>>>>>>>>> hard to >>>>>>>>>> beat ephemeral SSD in RAID 0 on EC2, and its a solid, battle tested >>>>>>>>>> configuration. If you don't, though, EBS GP2 will save a _lot_ of >>>>>>>>>> headache. >>>>>>>>>> >>>>>>>>>> Personally, on small clusters like ours (12 nodes), we've found >>>>>>>>>> our choice of instance dictated much more by the balance of price, >>>>>>>>>> CPU, and >>>>>>>>>> memory. We're using GP2 SSD and we find that for our patterns the >>>>>>>>>> disk is >>>>>>>>>> rarely the bottleneck. YMMV, of course. >>>>>>>>>> >>>>>>>>>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa < >>>>>>>>>> jeff.ji...@crowdstrike.com> wrote: >>>>>>>>>> >>>>>>>>>>> If you have to ask that question, I strongly recommend m4 or c4 >>>>>>>>>>> instances with GP2 EBS. When you don’t care about replacing a node >>>>>>>>>>> because >>>>>>>>>>> of an instance failure, go with i2+ephemerals. Until then, GP2 EBS >>>>>>>>>>> is >>>>>>>>>>> capable of amazing things, and greatly simplifies life. >>>>>>>>>>> >>>>>>>>>>> We gave a talk on this topic at both Cassandra Summit and AWS >>>>>>>>>>> re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s >>>>>>>>>>> very much a viable option, despite any old documents online that say >>>>>>>>>>> otherwise. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: Eric Plowe >>>>>>>>>>> Reply-To: "user@cassandra.apache.org" >>>>>>>>>>> Date: Friday, January 29, 2016 at 4:33 PM >>>>>>>>>>> To: "user@cassandra.apache.org" >>>>>>>>>>> Subject: EC2 storage options for C* >>>>>>>>>>> >>>>>>>>>>> My company is planning on rolling out a C* cluster in EC2. 
We >>>>>>>>>>> are thinking about going with ephemeral SSDs. The question is this: >>>>>>>>>>> Should >>>>>>>>>>> we put two in RAID 0 or just go with one? We currently run a >>>>>>>>>>> cluster in our >>>>>>>>>>> data center with 2 250gig Samsung 850 EVO's in RAID 0 and we are >>>>>>>>>>> happy with >>>>>>>>>>> the performance we are seeing thus far. >>>>>>>>>>> >>>>>>>>>>> Thanks! >>>>>>>>>>> >>>>>>>>>>> Eric >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> >>>> >>> >>> >>> -- >>> Steve Robenalt >>> Software Architect >>> sroben...@highwire.org <bza...@highwire.org> >>> (office/cell): 916-505-1785 >>> >>> HighWire Press, Inc. >>> 425 Broadway St, Redwood City, CA 94063 >>> www.highwire.org >>> >>> Technology for Scholarly Communication >>> >> >> > -- Ben Bromhead CTO | Instaclustr +1 650 284 9692
Re: „Using Timestamp“ Feature
When using client supplied timestamps you need to ensure the clock on the client is in sync with the nodes in the cluster; otherwise behaviour will be unpredictable. On Thu, 18 Feb 2016 at 08:50 Tyler Hobbs <ty...@datastax.com> wrote: > 2016-02-18 2:00 GMT-06:00 Matthias Niehoff < > matthias.nieh...@codecentric.de>: > >> >> * is the 'using timestamp' feature (and providing statement timestamps) >> sufficiently robust and mature to build an application on? >> > > Yes. It's been there since the start of CQL3. > > >> * In a BatchedStatement, can different statements have different >> (explicitly provided) timestamps, or is the BatchedStatement's timestamp >> used for them all? Is this specified / stable behaviour? >> > > Yes, you can separate timestamps per statement. And, in fact, if you > potentially mix inserts and deletes on the same rows, you *should *use > explicit timestamps with different values. See the timestamp notes here: > http://cassandra.apache.org/doc/cql3/CQL.html#batchStmt > > >> * cqlsh reports a syntax error when I use 'using timestamp' with an >> update statement (works with 'insert'). Is there a good reason for this, or >> is it a bug? >> > > The "USING TIMESTAMP" goes in a different place in update statements. It > should be something like: > > UPDATE mytable USING TIMESTAMP ? SET col = ? WHERE key = ? > > > -- > Tyler Hobbs > DataStax <http://datastax.com/> > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
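To see why the clock sync matters: Cassandra reconciles conflicting writes to a cell purely by write timestamp (microseconds since the epoch), so a write from a client whose clock runs behind can be silently shadowed by an "earlier" one. A toy model of that resolution (my simplification -- real Cassandra also breaks exact timestamp ties by comparing values):

```python
import time

def client_timestamp():
    # Microseconds since the epoch -- the unit USING TIMESTAMP expects.
    return int(time.time() * 1_000_000)

def reconcile(cell_a, cell_b):
    """Toy last-write-wins on (value, timestamp_micros) cells."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

on_time = ("v1", 1_500_000_000_000_000)
# Written later in wall-clock time, but from a client whose clock is 10s slow:
skewed = ("v2", 1_500_000_000_000_000 - 10_000_000)
print(reconcile(on_time, skewed))  # the "newer" write from the slow clock loses
```

This is exactly the unpredictability above: whichever client has the fastest clock wins, regardless of real write order.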
Re: Sudden disk usage
+1 to checking for snapshots. Cassandra by default will automatically snapshot tables before destructive actions like drop or truncate. Some general advice regarding cleanup: cleanup will result in a temporary increase in both disk I/O load and disk space usage (especially with STCS). It should only be used as part of a planned increase in capacity when you still have plenty of disk space left on your existing nodes. If you are running Cassandra in the cloud (AWS, Azure etc) you can add an EBS volume, copy your sstables to it, then bind mount it to the troubled CF directory. This will give you some emergency disk space to let compaction and cleanup do their thing safely. On Tue, 16 Feb 2016 at 10:57 Robert Coli <rc...@eventbrite.com> wrote: > On Sat, Feb 13, 2016 at 4:30 PM, Branton Davis <branton.da...@spanning.com > > wrote: > >> We use SizeTieredCompaction. The nodes were about 67% full and we were >> planning on adding new nodes (doubling the cluster to 6) soon. >> > > Be sure to add those new nodes one at a time. > > Have you checked for, and cleared, old snapshots? Snapshots are > automatically taken at various times and have the unusual property of > growing larger over time. This is because they are hard links of data files > and do not take up disk space of their own until the files they link to are > compacted into new files. > > =Rob > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
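Rob's point about snapshots being hard links is easy to demonstrate with plain filesystem calls (this just illustrates the mechanism, it is not Cassandra code):

```python
import os
import tempfile

# A snapshot entry is a hard link: a second name for the same inode, so it
# costs no space at creation time. Once compaction removes the original
# file, the snapshot alone keeps the bytes on disk -- which is why old
# snapshots grow in effective size over time.
with tempfile.TemporaryDirectory() as d:
    sstable = os.path.join(d, "data.db")
    snapshot = os.path.join(d, "snapshot-data.db")

    with open(sstable, "wb") as f:
        f.write(b"x" * 4096)

    os.link(sstable, snapshot)             # what taking a snapshot does
    assert os.stat(sstable).st_nlink == 2  # one inode, two names, no extra space
    assert os.path.samefile(sstable, snapshot)

    os.remove(sstable)                     # "compaction" discards the original
    assert os.stat(snapshot).st_size == 4096
    print("links remaining:", os.stat(snapshot).st_nlink)  # prints: links remaining: 1
```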
Re: Cassandra nodes reduce disks per node
you can do this in a "rolling" fashion (one node at a time). On Wed, 17 Feb 2016 at 14:03 Branton Davis <branton.da...@spanning.com> wrote: > We're about to do the same thing. It shouldn't be necessary to shut down > the entire cluster, right? > > On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com> > wrote: > >> >> >> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com> >> wrote: >>> >>> To accomplish this can I just copy the data from disk1 to disk2 with in >>> the relevant cassandra home location folders, change the cassanda.yaml >>> configuration and restart the node. before starting i will shutdown the >>> cluster. >>> >> >> Yes. >> >> =Rob >> >> > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: How can I make Cassandra stable in a 2GB RAM node environment ?
+1 for http://opensourceconnections.com/blog/2013/08/31/building-the-perfect-cassandra-test-environment/ We also run Cassandra on t2.mediums for our Developer clusters. You can force Cassandra to do most "memory" things by hitting the disk instead (on disk compaction passes, flush immediately to disk) and by throttling client connections. In fact on the t2 series memory is not the biggest concern, but rather the CPU credit issue. On Mon, 7 Mar 2016 at 11:53 Robert Coli <rc...@eventbrite.com> wrote: > On Fri, Mar 4, 2016 at 8:27 PM, Jack Krupansky <jack.krupan...@gmail.com> > wrote: > >> Please review the minimum hardware requirements as clearly documented: >> >> http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningHardware.html >> > > That is a document for Datastax Cassandra, not Apache Cassandra. It's > wonderful that Datastax provides docs, but Datastax Cassandra is a superset > of Apache Cassandra. Presuming that the requirements of one are exactly > equivalent to the requirements of the other is not necessarily reasonable. > > Please adjust your hardware usage to at least meet the clearly documented >> minimum requirements. If you continue to encounter problems once you have >> corrected your configuration error, please resubmit the details with >> updated hardware configuration details. >> > > Disagree. OP specifically stated that they knew this was not a recommended > practice. It does not seem unlikely that they are constrained to use this > hardware for reasons outside of their control. > > >> Just to be clear, development on less than 4 GB is not supported and >> production on less than 8 GB is not supported. Those are not suggestions or >> guidelines or recommendations, they are absolute requirements. >> > > What does "supported" mean here? That Datastax will not provide support if > you do not follow the above recommendations?
Because it certainly is > "supported" in the sense of "it can be made to work" ... ? > > The premise of a minimum RAM level seems meaningless without context. How > much data are you serving from your 2GB RAM node? What is the rate of > client requests? > > To be clear, I don't recommend trying to run production Cassandra with > under 8GB of RAM on your node, but "absolute requirement" is a serious > overstatement. > > > http://opensourceconnections.com/blog/2013/08/31/building-the-perfect-cassandra-test-environment/ > > Has some good discussion of how to run Cassandra in a low memory > environment. Maybe someone should tell John that his 64MB of JVM heap for a > test node is 62x too small to be "supported"? :D > > =Rob > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
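The blog post linked above goes into detail; the usual first step is pinning the heap in cassandra-env.sh instead of letting it auto-size from system RAM. The values here are illustrative for a 2GB box, not a recommendation -- tune and test for your own workload:

```sh
# cassandra-env.sh -- if you set one of these you must set both
MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="256M"
```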
Re: Optional TLS CQL Encryption
Hi Jason If you enable encryption it will always be on. Optional encryption is generally a bad idea (tm). Creating a new session for every query is also a bad idea (tm), even without the minimal overhead of encryption. If you are really hell bent on doing this you could have a node that is part of the cluster but has -Dcassandra.join_ring=false set in the JVM options in cassandra-env.sh so it does not get any data, and configure that node to have no encryption enabled. This is known as a fat client. Then connect to that specific node whenever you want to do terrible non encrypted things. Having said all that, please don't do this. Cheers On Tue, 19 Apr 2016 at 15:32 Jason J. W. Williams <jasonjwwilli...@gmail.com> wrote: > Hey Guys, > > Is there a way to make TLS encryption optional for the CQL listener? We'd > like to be able to use it for remote management connections but not for same > datacenter usage (since the build up/tear down cost is too high for things > that don't use pools). > > Right now it appears if we enable encryption it requires it for all > connections, which definitely is not what we want. > > -J > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
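For completeness, the fat-client setting mentioned above goes into cassandra-env.sh on that one node only:

```sh
# cassandra-env.sh on the coordinator-only ("fat client") node:
# it joins gossip but takes no token ranges, so it stores no data.
JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
```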
Re: Changing Racks of Nodes
If the rack as defined in Cassandra stays the same (e.g. cassandra-rackdc.properties), things will keep working as expected... except when the actual rack (or fault domain) goes down and you are likely to lose more nodes than expected. If you change the rack as defined in Cassandra, the node will start handling queries it does not have data for. The best way to move a node between racks is to decommission the node, then bootstrap it with the new rack settings. On Wed, 20 Apr 2016 at 15:49 Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Hello, > > > > If a running node moves around and changes its rack in the process, when > its back in the cluster (through ignore-rack property), is it a correct > statement that queries will not see some data residing on this node until a > repair is run ? > > > > Or, is it more like the node may get requests for the data it does not own > (meaning data will never “disappear”) ? > > > > I’d appreciate some details on this topic from experts ! > > > > Thanks ! > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: SS Tables Files Streaming
Yup, with repair and particularly bootstrap there is a decent amount of "over streaming" of data, due to the fact it's just sending an sstable. On Fri, 6 May 2016 at 14:49 Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Does repair really send SS Table files as is ? Wouldn’t data for tokens be > distributed across SS Tables ? > > > > *From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] > *Sent:* Friday, May 6, 2016 2:12 PM > > > *To:* user@cassandra.apache.org > *Subject:* Re: SS Tables Files Streaming > > > > Also probably sstableloader / bulk loading interface > > > > > > > > > > (I don’t think any of these necessarily stream “as-is”, but that’s a > different conversation I suspect) > > > > > > *From: *Jonathan Haddad > *Reply-To: *"user@cassandra.apache.org" > *Date: *Friday, May 6, 2016 at 1:52 PM > *To: *"user@cassandra.apache.org" > *Subject: *Re: SS Tables Files Streaming > > > > Repairs, bootstrap, decommission. > > > > On Fri, May 6, 2016 at 1:16 PM Anubhav Kale <anubhav.k...@microsoft.com> > wrote: > > Hello, > > > > In what scenarios can SS Table files on disk from Node 1 go to Node 2 as > is ? I’m aware this happens in *nodetool rebuild* and I am assuming this > does *not* happen in repairs. Can someone confirm ? > > > > The reason I ask is I am working on a solution for backup / restore and I > need to be sure if I boot a node, start copying over backed up files then > those files won’t get overwritten by something coming from other nodes. > > > > Thanks ! > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: SS Tables Files Streaming
Note that incremental repair strategies (2.1+) run anti-compaction against sstables in the range being repaired, so this will prevent overstreaming based on the ranges in the repair session. On Mon, 9 May 2016 at 10:31 Ben Bromhead <b...@instaclustr.com> wrote: > Yup, with repair and particularly bootstrap is there is a decent amount of > "over streaming" of data due to the fact it's just sending an sstable. > > On Fri, 6 May 2016 at 14:49 Anubhav Kale <anubhav.k...@microsoft.com> > wrote: > >> Does repair really send SS Table files as is ? Wouldn’t data for tokens >> be distributed across SS Tables ? >> >> >> >> *From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] >> *Sent:* Friday, May 6, 2016 2:12 PM >> >> >> *To:* user@cassandra.apache.org >> *Subject:* Re: SS Tables Files Streaming >> >> >> >> Also probably sstableloader / bulk loading interface >> >> >> >> >> >> >> >> >> >> (I don’t think any of these necessarily stream “as-is”, but that’s a >> different conversation I suspect) >> >> >> >> >> >> *From: *Jonathan Haddad >> *Reply-To: *"user@cassandra.apache.org" >> *Date: *Friday, May 6, 2016 at 1:52 PM >> *To: *"user@cassandra.apache.org" >> *Subject: *Re: SS Tables Files Streaming >> >> >> >> Repairs, bootstamp, decommission. >> >> >> >> On Fri, May 6, 2016 at 1:16 PM Anubhav Kale <anubhav.k...@microsoft.com> >> wrote: >> >> Hello, >> >> >> >> In what scenarios can SS Table files on disk from Node 1 go to Node 2 as >> is ? I’m aware this happens in *nodetool rebuild* and I am assuming >> this does *not* happen in repairs. Can someone confirm ? >> >> >> >> The reason I ask is I am working on a solution for backup / restore and I >> need to be sure if I boot a node, start copying over backed up files then >> those files won’t get overwritten by something coming from other nodes. >> >> >> >> Thanks ! 
>> >> -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Authentication with Java driver
On Tue, 7 Feb 2017 at 17:52 Yuji Ito <y...@imagine-orb.com> wrote: Thanks Andrew, Ben, My application creates a lot of instances connecting to Cassandra with basically the same set of credentials. Do you mean lots of instances of the process or lots of instances of the cluster/session object? After an instance connects to Cassandra with the credentials, can any instance connect to Cassandra without credentials? As long as you don't share the session or cluster objects. Each new cluster/session will need to reauthenticate. == example == A first = new A("database", "user", "password"); // proper credentials r = first.get(); ... A other = new A("database", "user", "pass"); // wrong password r = other.get(); == example == I want to refuse the `other` instance with improper credentials. This looks like you are creating new cluster/session objects (filling in the blanks for your pseudocode here). So "other" will not authenticate to Cassandra. This brings up a wider point: why are you doing this? Generally most applications will create a single long-lived session object that lasts the life of the application process. I would not rely on Cassandra auth to authenticate downstream actors, not because it's bad, just that it's generally inefficient to create lots of session objects. The session object maintains a connection pool, pipelines requests, is thread safe and generally pretty solid. Yuji On Wed, Feb 8, 2017 at 4:11 AM, Ben Bromhead <b...@instaclustr.com> wrote: What are you specifically trying to achieve? Are you trying to authenticate multiple Cassandra users from a single application instance? Or will you have lots of application instances connecting to Cassandra using the same set of credentials? Or a combination of both? Multiple application instances with different credentials?
On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert <andrew.tolb...@datastax.com> wrote: Hello, The API seems kind of not correct because credentials should be usually set with a session but actually they are set with a cluster. With the datastax driver, Session is what manages connection pools to each node. Cluster manages configuration and a separate connection ('control connection') to subscribe to state changes (schema changes, node topology changes, node up/down events). So, if there are 1000 clients, then with this API it has to create 1000 cluster instances ? I'm unsure how common it is for per-user authentication to be done when connecting to the database. I think an application would normally authenticate with one set of credentials instead of multiple. The protocol Cassandra uses does authentication at the connection level instead of at the request level, so that is currently a limitation to support something like reusing Sessions for authenticating multiple users. Thanks, Andy On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada <mogwa...@gmail.com> wrote: Hi, The API seems kind of not correct because credentials should be usually set with a session but actually they are set with a cluster. So, if there are 1000 clients, then with this API it has to create 1000 cluster instances ? 1000 clients seems usual if there are many nodes (say 20) and each node has some concurrency (say 50), but 1000 cluster instances seems too many. Is this an expected way to do this ? or Is there any way to authenticate per session ? Thanks, Hiro On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito <y...@imagine-orb.com> wrote: > Hi all, > > I want to know how to authenticate Cassandra users for multiple instances > with Java driver. > For instance, each thread creates a instance to access Cassandra with > authentication. > > As the implementation example, only the first constructor builds a cluster > and a session. > Other constructors use them. 
> This example is implemented according to the datastax document: "Basically > you will want to share the same cluster and session instances across your > application". > http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra > > However, other constructors don't authenticate the user and the password. > That's because they don't need to build a cluster and a session. > > So, should I create a cluster and a session per instance for the > authentication? > If yes, can I create a lot of instances(clusters and sessions) to access C* > concurrently? > > == example == > public class A { > private static Cluster cluster = null; > private static Map<String, Session> sessions = null; > private Session session; > > public A (String keyspace, String user, String password) { > if (cluster == null) { > builder = Cluster.builder(); > ... > builder = builder.withCredentials(user, password);
Instaclustr Masters scholarship
As part of our commitment to contributing back to the Apache Cassandra open source project and the wider community, we are always looking for ways we can foster knowledge sharing and improve the usability of Cassandra itself. One of the ways we have done so previously was to open up our internal builds and versions of Cassandra (https://github.com/instaclustr/cassandra). We have also been looking at a few novel or outside-the-box ways we can further contribute back to the community. As such, we are sponsoring a masters project in conjunction with the Australia-based University of Canberra. Instaclustr’s staff will be available to provide advice and feedback to the successful candidate. *Scope* Distributed database systems are relatively new technology compared to traditional relational databases. Distributed architectures provide significant advantages in terms of reliability and scalability, but often at the cost of increased complexity. This complexity presents challenges for testing of these systems to prove correct operation across all possible system states. The scope of this masters scholarship is to use the Apache Cassandra repair process as an example to consider and improve available approaches to distributed database systems testing. The repair process in Cassandra is a scheduled process that runs to ensure the multiple copies of each piece of data that is maintained by Cassandra are kept synchronised. Correct operation of repairs has been an ongoing challenge for the Cassandra project, partly due to the difficulty in designing and developing comprehensive automated tests for this functionality.
The expected scope of this project is to: - survey and understand the existing testing framework available as part of the Cassandra project, particularly as it pertains to testing repairs - consider, research and develop enhanced approaches to testing of repairs - submit any successful approaches to the Apache Cassandra project for feedback and inclusion in the project code base Australia is a pretty great place to advance your education and is welcoming of foreign students. We are also open to sponsoring a PhD project with a more in depth focus for the right candidate. For more details please don't hesitate to get in touch with myself or reach out to i...@instaclustr.com. Cheers Ben -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Authentication with Java driver
What are you specifically trying to achieve? Are you trying to authenticate multiple Cassandra users from a single application instance? Or will you have lots of application instances connecting to Cassandra using the same set of credentials? Or a combination of both? Multiple application instances with different credentials? On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert <andrew.tolb...@datastax.com> wrote: > Hello, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. > > > With the datastax driver, Session is what manages connection pools to > each node. Cluster manages configuration and a separate connection > ('control connection') to subscribe to state changes (schema changes, node > topology changes, node up/down events). > > > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > > > I'm unsure how common it is for per-user authentication to be done when > connecting to the database. I think an application would normally > authenticate with one set of credentials instead of multiple. The protocol > Cassandra uses does authentication at the connection level instead of at > the request level, so that is currently a limitation to support something > like reusing Sessions for authenticating multiple users. > > Thanks, > Andy > > > On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada <mogwa...@gmail.com> wrote: > > Hi, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. > > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > 1000 clients seems usual if there are many nodes (say 20) and each > node has some concurrency (say 50), > but 1000 cluster instances seems too many. > > Is this an expected way to do this ? or > Is there any way to authenticate per session ?
> > Thanks, > Hiro > > On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito <y...@imagine-orb.com> wrote: > > Hi all, > > > > I want to know how to authenticate Cassandra users for multiple instances > > with Java driver. > > For instance, each thread creates a instance to access Cassandra with > > authentication. > > > > As the implementation example, only the first constructor builds a > cluster > > and a session. > > Other constructors use them. > > This example is implemented according to the datastax document: > "Basically > > you will want to share the same cluster and session instances across your > > application". > > > http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra > > > > However, other constructors don't authenticate the user and the password. > > That's because they don't need to build a cluster and a session. > > > > So, should I create a cluster and a session per instance for the > > authentication? > > If yes, can I create a lot of instances(clusters and sessions) to access > C* > > concurrently? > > > > == example == > > public class A { > > private static Cluster cluster = null; > > private static Map<String, Session> sessions = null; > > private Session session; > > > > public A (String keyspace, String user, String password) { > > if (cluster == null) { > > builder = Cluster.builder(); > > ... > > builder = builder.withCredentials(user, password); > > cluster = builder.build(); > > } > > session = sessions.get(keyspace); > > if (session == null) { > > session = cluster.connection(keyspace); > > sessions.put(keyspace, session) > > } > > ... > > } > > ... > > public ResultSet update(...) { > > ... > > public ResultSet get(...) { > > ... > > } > > == example == > > > > Thanks, > > Yuji > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
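The Java example above boils down to "build the Cluster once, cache one Session per keyspace". The same pattern in miniature, with a stub standing in for the driver objects (illustration only -- a real application would build sessions from the DataStax driver's Cluster):

```python
import threading

class StubSession:
    """Stand-in for the driver's Session object, which holds the
    connection pool and carries the authenticated credentials."""
    def __init__(self, keyspace):
        self.keyspace = keyspace

_lock = threading.Lock()
_sessions = {}

def get_session(keyspace):
    """One long-lived session per keyspace, shared by the whole process,
    mirroring the Java example's static cluster/sessions fields."""
    with _lock:
        if keyspace not in _sessions:
            _sessions[keyspace] = StubSession(keyspace)
        return _sessions[keyspace]

a = get_session("dba_test")
b = get_session("dba_test")
print(a is b)  # the session is reused, not rebuilt per request
```

The lock matters because, unlike the Java example's unsynchronized null check, two threads racing on first use would otherwise each build (and authenticate) their own session.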
Re: compaction falling behind
You can do so in two ways: 1) Direct observation: you can keep an eye on the number of pending compactions. This will fluctuate with load, compaction strategy, ongoing repairs and nodes bootstrapping, but generally it should trend towards 0. There have been a number of bugs in past versions of Cassandra whereby the number of pending compactions is not reported correctly, so depending on what version of Cassandra you run this could impact you. 2) Indirect observation: you can keep an eye on metrics that healthy compaction will directly contribute to. These include the number of sstables per read histogram, estimated droppable tombstones, tombstones per read, etc. You should keep an eye on these things anyway as they can often show you areas where you can fine tune compaction or your data model. Everything exposed by nodetool is consumable via JMX, which is great to plug into your metrics/monitoring/observability system :) On Mon, 13 Feb 2017 at 13:23 John Sanda <john.sa...@gmail.com> wrote: > What is a good way to determine whether or not compaction is falling > behind? I read a couple things earlier that suggest nodetool > compactionstats might not be the most reliable thing to use. > > > > - John > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
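To make the "direct observation" approach concrete, here is a toy sketch (Python, with function name, window and threshold of my own choosing — not part of any tool) of the kind of trend check you might run against a sampled series of pending-compaction counts. In practice the numbers would come from `nodetool compactionstats` or the JMX metric `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks`:

```python
# Toy heuristic for the "direct observation" approach: given a series of
# pending-compaction counts sampled over time, flag a node whose backlog is
# growing rather than trending back towards 0. Names and thresholds are
# illustrative only.

def compaction_falling_behind(samples, window=5, threshold=1.5):
    """Return True if the recent average of pending compactions is at least
    `threshold` times the older average (i.e. the backlog is growing)."""
    if len(samples) < 2 * window:
        return False  # not enough data to judge a trend
    older = sum(samples[-2 * window:-window]) / window
    recent = sum(samples[-window:]) / window
    if older == 0:
        return recent >= threshold
    return recent / older >= threshold

# A backlog that spikes under load but drains back towards 0 is healthy;
# one that keeps climbing suggests compaction can't keep up:
healthy = [40, 35, 20, 12, 8, 5, 3, 2, 1, 0]
growing = [2, 3, 5, 8, 12, 20, 30, 45, 60, 90]
```

The exact window and threshold would need tuning to your load pattern; the point is simply that a healthy backlog spikes and drains, while a node falling behind grows without recovering.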
Re: Authentication with Java driver
If the processes are launched separately or you fork before setting up the cluster object, it won't share credentials. On Wed, Feb 8, 2017, 02:33 Yuji Ito <y...@imagine-orb.com> wrote: > Thanks Ben, > > Do you mean lots of instances of the process or lots of instances of the > cluster/session object? > > > Lots of instances of the process are generated. > I wanted to confirm that `other` doesn't authenticate. > > If I want to avoid that, my application has to create new cluster/session > objects per instance. > But it is inefficient and uncommon. > So, we aren't sure that the application works when a lot of > cluster/session objects are created. > Is it correct? > > Thank you, > Yuji > > > > On Wed, Feb 8, 2017 at 12:01 PM, Ben Bromhead <b...@instaclustr.com> wrote: > > On Tue, 7 Feb 2017 at 17:52 Yuji Ito <y...@imagine-orb.com> wrote: > > Thanks Andrew, Ben, > > My application creates a lot of instances connecting to Cassandra with > basically the same set of credentials. > > Do you mean lots of instances of the process or lots of instances of the > cluster/session object? > > > After an instance connects to Cassandra with the credentials, can any > instance connect to Cassandra without credentials? > > As long as you don't share the session or cluster objects. Each new > cluster/session will need to reauthenticate. > > > == example == > A first = new A("database", "user", "password"); // proper credentials > r = first.get(); > ... > A other = new A("database", "user", "pass"); // wrong password > r = other.get(); > == example == > > I want to refuse the `other` instance with improper credentials. > > > This looks like you are creating new cluster/session objects (filling in > the blanks for your pseudocode here). So "other" will not authenticate to > Cassandra. > > This brings up a wider point: why are you doing this? Generally most > applications will create a single long-lived session object that lasts > the life of the application process. 
> > I would not rely on Cassandra auth to authenticate downstream actors, not > because it's bad, but because it's generally inefficient to create lots of session > objects. The session object maintains a connection pool, pipelines > requests, is thread safe and generally pretty solid. > > > > > Yuji > > > On Wed, Feb 8, 2017 at 4:11 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > What are you specifically trying to achieve? Are you trying to > authenticate multiple Cassandra users from a single application instance? > Or will you have lots of application instances connecting to Cassandra > using the same set of credentials? Or a combination of both? Multiple > application instances with different credentials? > > On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert <andrew.tolb...@datastax.com> > wrote: > > Hello, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. > > > With the datastax driver, Session is what manages connection pools to > each node. Cluster manages configuration and a separate connection > ('control connection') to subscribe to state changes (schema changes, node > topology changes, node up/down events). > > > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > > > I'm unsure how common it is for per-user authentication to be done when > connecting to the database. I think an application would normally > authenticate with one set of credentials instead of multiple. The protocol > Cassandra uses does authentication at the connection level instead of at > the request level, so that is currently a limitation to support something > like reusing Sessions for authenticating multiple users. > > Thanks, > Andy > > > On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada <mogwa...@gmail.com> wrote: > > Hi, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. 
> > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > 1000 clients seems usual if there are many nodes (say 20) and each > node has some concurrency (say 50), > but 1000 cluster instances seems too many. > > Is this an expected way to do this ? or > Is there any way to authenticate per session ? > > Thanks, > Hiro > > On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito <y...@imagine-orb.com> wrote: > > Hi all, > > > > I want to know how to authenticate
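Since (as Andrew notes above) the protocol authenticates at the connection level, supporting several distinct users really does mean one Cluster/Session per credential set. The toy Python model below — deliberately *not* the DataStax driver API, just stand-in classes of my own — sketches the mitigation implicit in the thread: cache those objects keyed by credentials and keyspace so each credential set pays the connection cost only once:

```python
# Toy model (not the DataStax driver API) of the constraint discussed here:
# credentials live on the Cluster, so N distinct users means N Cluster
# objects. Caching by credential set ensures each set is built only once.

class FakeCluster:
    """Stand-in for a driver Cluster: one per credential set."""
    def __init__(self, user, password):
        self.user, self.password = user, password

    def connect(self, keyspace):
        return (self.user, keyspace)  # stand-in for a Session


class SessionCache:
    def __init__(self):
        self._clusters = {}   # (user, password) -> FakeCluster
        self._sessions = {}   # (user, keyspace) -> session

    def get_session(self, user, password, keyspace):
        cluster = self._clusters.setdefault(
            (user, password), FakeCluster(user, password))
        key = (user, keyspace)
        if key not in self._sessions:
            self._sessions[key] = cluster.connect(keyspace)
        return self._sessions[key]


cache = SessionCache()
s1 = cache.get_session("alice", "pw", "ks1")
s2 = cache.get_session("alice", "pw", "ks1")  # reused, no new "connection"
```

Even with caching, 1000 distinct credential sets still means 1000 cluster objects and their connection pools, which is the cost Hiro is pointing at.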
Re: Cassandra Authentication
We have a process that syncs and manages RF==N and we also control and manage users, however that entails its own set of challenges and maintenance. For most users I would suggest 3 < RF <= 5 is sufficient. Also make sure you don't use the user "Cassandra" in production as authentication queries are done at QUORUM. On Wed, 18 Jan 2017 at 13:41 Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Hello, > > When enabling Authentication on cassandra, is it required to set the RF > same as the no.of nodes( > https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html)? > or can I live with RF of 3 in each DC (other KS are using 3) > > If it has to be equal to the number of nodes then, every time adding or > removing a node requires update of RF. > > Thanks in advance. > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Cassandra Authentication
the volume of data is pretty low + you still want to be able to authenticate even if you have more nodes down than the RF for other keyspaces. Essentially you don't want auth to be the thing that stops you serving requests. On Wed, 18 Jan 2017 at 14:57 Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Thanks Ben, > > RF 3 isn't sufficient for system_auth? as we are using 3 RF for other > production KS, do you see any challenges? > > On Wed, Jan 18, 2017 at 2:39 PM, Ben Bromhead <b...@instaclustr.com> wrote: > > We have a process that syncs and manages RF==N and we also control and > manage users, however that entails it's own set of challenges and > maintenance. > > For most users I would suggest 3 < RF <=5 is sufficient. Also make sure > you don't use the user "Cassandra" in production as authentication queries > are done at QUORUM. > > On Wed, 18 Jan 2017 at 13:41 Jai Bheemsen Rao Dhanwada < > jaibheem...@gmail.com> wrote: > > Hello, > > When enabling Authentication on cassandra, is it required to set the RF > same as the no.of nodes( > https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html)? > or can I live with RF of 3 in each DC (other KS are using 3) > > If it has to be equal to the number of nodes then, every time adding or > removing a node requires update of RF. > > Thanks in advance. > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 <+1%20650-284-9692> > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: How Fast Does Information Spread With Gossip?
Gossip propagation is generally best modelled by epidemic algorithms. Luckily for us, Cassandra's gossip protocol is fairly simple. Cassandra will perform one Gossip Task every second. Within each gossip task it will randomly gossip with another available node in the cluster; it will also possibly attempt to gossip with a down node (based on a random chance that increases as the number of down nodes increases), and if it hasn't gossiped with a seed that round it may also attempt to gossip with a defined seed. So Cassandra can do up to 3 rounds per second, however these extra rounds are supposed to be optimizations for improving average case convergence and recovering from split brain scenarios quicker than would normally occur. Assuming just one gossip round per second, for a new piece of information to spread to all members of the cluster via gossip, you would see a worst case performance of O(n) gossip rounds where n is the number of nodes in the cluster. This is because each Cassandra node can gossip to any other node irrespective of topology (a fully connected mesh). There is some ongoing discussion about expanding gossip to utilise partial views of the cluster and exchanging those, or using spanning/broadcast trees to speed up convergence and reduce workload in large clusters (1000+ nodes), see https://issues.apache.org/jira/browse/CASSANDRA-12345 for details. On Fri, 16 Sep 2016 at 01:01 Jens Rantil <jens.ran...@tink.se> wrote: > > Is a minute a reasonable upper bound for most clusters? > > I have no numbers and I'm sure this differs depending on how large your > cluster is. We have a small cluster of around 12 nodes and statuses > generally propagate in under 5 seconds for sure. So, it will definitely be > less than 1 minute. 
> > Cheers, > Jens > > On Wed, Sep 14, 2016 at 8:49 PM jerome <jeromefroel...@hotmail.com> wrote: > >> Hi, >> >> >> I was curious if anyone had any kind of statistics or ballpark figures on >> how long it takes information to propagate through a cluster with Gossip? >> I'm particularly interested in how fast information about the liveness of a >> node spreads. For example, in an n-node cluster the median amount of time >> it takes for all nodes to learn that a node went down is f(n) seconds. Is a >> minute a reasonable upper bound for most clusters? Too high, too low? >> >> >> Thanks, >> >> Jerome >> > -- > > Jens Rantil > Backend Developer @ Tink > > Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden > For urgent matters you can reach me at +46-708-84 18 32. > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
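The epidemic behaviour described above is easy to simulate. The sketch below (Python; a deliberately simplified model — one push per informed node per round, uniformly random peer, none of Cassandra's seed or down-node extras) shows that although the worst case is O(n) rounds, typical convergence for a 100-node cluster is on the order of ten rounds, which lines up with Jens's "under 5 seconds" observation for a 12-node cluster:

```python
import random

# Minimal epidemic-style simulation of the basic gossip round described
# above: once per "second", every node that already knows a new piece of
# state pushes it to one uniformly random peer. This ignores Cassandra's
# extra seed/down-node rounds and pull exchanges, so it is a rough model,
# not the real protocol.

def rounds_to_converge(n, rng):
    informed = {0}  # node 0 learns something new
    rounds = 0
    while len(informed) < n:
        rounds += 1
        for node in list(informed):  # snapshot: new learners wait a round
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1  # pick any node except itself
            informed.add(peer)
    return rounds

rng = random.Random(42)
trials = [rounds_to_converge(100, rng) for _ in range(20)]
avg = sum(trials) / len(trials)
```

Since the informed set can at most double each round, no trial can finish in fewer than ceil(log2(100)) = 7 rounds; in practice the simulation converges well under the O(n) = 100-round worst case.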
Re: Handle Leap Seconds with Cassandra
If you need guaranteed strict ordering in a distributed system, I would not use Cassandra; Cassandra does not provide this out of the box. I would look to a system that uses Lamport or vector clocks. Based on your description of how your system runs at the moment (and how close your updates are together), you have either already experienced out of order updates or there is a real possibility you will in the future. Sorry to be so dire, but if you do require causal consistency / strict ordering, you are not getting it at the moment. Distributed systems theory is really tricky, even for people that are "experts" on distributed systems over unreliable networks (I would certainly not put myself in that category). People have made a very good name for themselves by showing that the vast majority of distributed databases have had bugs when it comes to their various consistency models and the claims these databases make. So make sure you really do need guaranteed causal consistency/strict ordering, or see if you can design around it (e.g. using conflict free replicated data types) or choose a system that is designed to provide it. Having said that... here are some hacky things you could do in Cassandra to try and get this behaviour, which I in no way endorse doing :) - Cassandra counters do leverage a logical clock per shard and you could hack something together with counters and lightweight transactions, but you would want to do your homework on counter accuracy before diving into it, as I don't know if the implementation is safe in the context of your question. Also this would probably require a significant rework of your application plus a significant performance hit. I would invite a counter guru to jump in here... - You can leverage the fact that timestamps are monotonic if you isolate writes to a single node for a single shard... but you then lose Cassandra's availability guarantees, e.g. 
a keyspace with an RF of 1 and a CL of ONE will get monotonic timestamps (if generated on the server side). - Continuing down the path of isolating writes to a single node for a given shard, you could also isolate writes to the primary replica using your client driver during the leap second (make it a minute either side of the leap), but again you lose out on availability and you are probably already experiencing out of order writes given how close your writes and updates are. A note on NTP: NTP is generally fine if you use it to keep the clocks synced between the Cassandra nodes. If you are interested in how we have implemented NTP at Instaclustr, see our blogpost on it https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/ . Ben On Thu, 27 Oct 2016 at 10:18 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Hi Ben, > > Thanks for your reply. We don't use timestamps in primary key. We rely on > server side timestamps generated by coordinator. So, no functions at > client side would help. > > Yes, drifts can create problems too. But even if you ensure that nodes are > perfectly synced with NTP, you will surely mess up the order of updates > during the leap second (interleaving). Some applications update same column > of same row quickly (within a second) and reversing the order would > corrupt the data. > > I am interested in learning how people relying on strict order of updates > handle leap second scenario when clock goes back one second (same second is > repeated). What kind of tricks people use to ensure that server side > timestamps are monotonic ? > > As per my understanding NTP slew mode may not be suitable for Cassandra as > it may cause unpredictable drift amongst the Cassandra nodes. Ideas ?? 
> > > Thanks > Anuj > > > > Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > > On Thu, 20 Oct, 2016 at 11:25 PM, Ben Bromhead > > <b...@instaclustr.com> wrote: > http://www.datastax.com/dev/blog/preparing-for-the-leap-second gives a > pretty good overview > > If you are using a timestamp as part of your primary key, this is the > situation where you could end up overwriting data. I would suggest using > timeuuid instead which will ensure that you get different primary keys even > for data inserted at the exact same timestamp. > > The blog post also suggests using certain monotonic timestamp classes in > Java however these will not help you if you have multiple clients that may > overwrite data. > > As for the interleaving or out of order problem, this is hard to address > in Cassandra without resorting to external coordination or LWTs. If you are > relying on a wall clock to guarantee order in a distributed system you will > get yourself into trouble even without leap seconds (clock drift, NTP > inaccur
Re: Are Cassandra writes are faster than reads?
Awesome! For a full explanation of what you are seeing (we call it micro batching) check out Adam Zegelin's talk on it https://www.youtube.com/watch?v=wF3Ec1rdWgc On Tue, 8 Nov 2016 at 02:21 Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > > Hi, > > Just found that reducing the batch size below 20 also increases the > writing speed and reduction in memory usage (especially for Python driver). > > Kind regards, > Rajesh R > > ------ > *From:* Ben Bromhead [b...@instaclustr.com] > *Sent:* 07 November 2016 05:44 > *To:* user@cassandra.apache.org > *Subject:* Re: Are Cassandra writes are faster than reads? > > They can be and it depends on your compaction strategy :) > > On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <ali.rac...@gmail.com> wrote: > > tl;dr? I just want to know if updates are bad for performance, and if so, > for how long. > > On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <b...@instaclustr.com> > wrote: > > Check out https://wiki.apache.org/cassandra/WritePathForUsers for > the full gory details. > > On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote: > > How long does it take for updates to get merged / compacted into the main > data file? > > On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> > wrote: > > To add some flavor as to how the commitlog implementation is so quick. 
> > It only flushes to disk every 10s by default. So writes are effectively > done to memory and then to disk asynchronously later on. This is generally > accepted to be OK, as the write is also going to other nodes. > > You can of course change this behavior to flush on each write or to skip > the commitlog altogether (danger!). This however will change how "safe" > things are from a durability perspective. > > On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. > > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. 
> > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? &
Re: Are Cassandra writes are faster than reads?
Check out https://wiki.apache.org/cassandra/WritePathForUsers for the full gory details. On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote: > How long does it take for updates to get merged / compacted into the main > data file? > > On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > To add some flavor as to how the commitlog implementation is so quick. > > It only flushes to disk every 10s by default. So writes are effectively > done to memory and then to disk asynchronously later on. This is generally > accepted to be OK, as the write is also going to other nodes. > > You can of course change this behavior to flush on each write or to skip > the commitlog altogether (danger!). This however will change how "safe" > things are from a durability perspective. > > On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. > > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. 
> > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? > > > > Hi all, > > > > Are Cassandra writes are faster than reads ?? If yes, why is this so? I am > using consistency 1 and data is in memory. > > > > Vikas > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Are Cassandra writes are faster than reads?
They can be and it depends on your compaction strategy :) On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <ali.rac...@gmail.com> wrote: > tl;dr? I just want to know if updates are bad for performance, and if so, > for how long. > > On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > Check out https://wiki.apache.org/cassandra/WritePathForUsers for the > full gory details. > > On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote: > > How long does it take for updates to get merged / compacted into the main > data file? > > On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > To add some flavor as to how the commitlog implementation is so quick. > > It only flushes to disk every 10s by default. So writes are effectively > done to memory and then to disk asynchronously later on. This is generally > accepted to be OK, as the write is also going to other nodes. > > You can of course change this behavior to flush on each write or to skip > the commitlog altogether (danger!). This however will change how "safe" > things are from a durability perspective. > > On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. 
> > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. > > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? > > > > Hi all, > > > > Are Cassandra writes are faster than reads ?? If yes, why is this so? I am > using consistency 1 and data is in memory. > > > > Vikas > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Are Cassandra writes are faster than reads?
To add some flavor as to how the commitlog implementation is so quick. It only flushes to disk every 10s by default. So writes are effectively done to memory and then to disk asynchronously later on. This is generally accepted to be OK, as the write is also going to other nodes. You can of course change this behavior to flush on each write or to skip the commitlog altogether (danger!). This however will change how "safe" things are from a durability perspective. On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. > > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. > > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? > > > > Hi all, > > > > Are Cassandra writes are faster than reads ?? If yes, why is this so? I am > using consistency 1 and data is in memory. 
> > > > Vikas > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
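Jeff's two points — appends are cheap, reads pay the merge cost — can be illustrated with a toy log-structured store. This is purely illustrative (Python, names of my own), nothing like Cassandra's actual storage engine:

```python
# Toy log-structured store illustrating the points above: writes are cheap
# appends into an in-memory table; flushes produce immutable "sstables"; a
# read may have to consult every sstable plus the memtable and keep the
# newest value, which is why overwrite-heavy workloads make reads do more
# work. Purely illustrative -- not Cassandra's real storage engine.

class ToyLSM:
    def __init__(self, flush_at=3):
        self.memtable, self.sstables, self.flush_at = {}, [], flush_at

    def write(self, key, value):
        self.memtable[key] = value  # O(1) append-style write
        if len(self.memtable) >= self.flush_at:
            self.sstables.append(dict(self.memtable))  # immutable flush
            self.memtable = {}

    def read(self, key):
        tables_checked = 0
        result = None
        for table in self.sstables + [self.memtable]:  # oldest -> newest
            tables_checked += 1
            if key in table:
                result = table[key]  # newer tables win (last write wins)
        return result, tables_checked


db = ToyLSM()
for i, v in enumerate("abcdef"):
    db.write(f"k{i}", v)          # two flushes of three keys each
db.write("k0", "z")               # overwrite, merged at read time
value, checked = db.read("k0")    # consults both sstables + the memtable
```

The write path never touches old data; the read for `k0` has to look at every table and take the newest value — compaction exists precisely to shrink that merge cost.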
Re: Handle Leap Seconds with Cassandra
http://www.datastax.com/dev/blog/preparing-for-the-leap-second gives a pretty good overview If you are using a timestamp as part of your primary key, this is the situation where you could end up overwriting data. I would suggest using timeuuid instead which will ensure that you get different primary keys even for data inserted at the exact same timestamp. The blog post also suggests using certain monotonic timestamp classes in Java however these will not help you if you have multiple clients that may overwrite data. As for the interleaving or out of order problem, this is hard to address in Cassandra without resorting to external coordination or LWTs. If you are relying on a wall clock to guarantee order in a distributed system you will get yourself into trouble even without leap seconds (clock drift, NTP inaccuracy etc). On Thu, 20 Oct 2016 at 10:30 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Hi, > > I would like to know how you guys handle leap seconds with Cassandra. > > I am not bothered about the livelock issue as we are using appropriate > versions of Linux and Java. I am more interested in finding an optimum > answer for the following question: > > How do you handle wrong ordering of multiple writes (on same row and > column) during the leap second? You may overwrite the new value with old > one (disaster). > > And Downtime is no option :) > > I can see that CASSANDRA-9131 is still open.. > > FYI..we are on 2.0.14 .. > > > Thanks > Anuj > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
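The "monotonic timestamp classes" the blog post mentions are easy to sketch. Below is a minimal Python version of the idea (the Java driver ships classes in this spirit; the class and names here are mine): never hand out a timestamp lower than or equal to the previous one, even if the wall clock steps backwards over a leap second. As the post notes, this only protects ordering within a single client — it does nothing across clients:

```python
import threading
import time

class MonotonicTimestamps:
    """Hand out strictly increasing microsecond timestamps, even if the
    underlying clock stalls or steps backwards (e.g. over a leap second).
    Illustrative sketch only; single-process scope."""

    def __init__(self, clock_us=lambda: int(time.time() * 1_000_000)):
        self._clock_us = clock_us
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now = self._clock_us()
            # Clock went backwards or stood still: keep counting up instead.
            self._last = now if now > self._last else self._last + 1
            return self._last

# Simulate a clock that repeats a second, leap-second style:
clock_values = iter([1_000_000, 2_000_000, 1_000_000, 1_000_001, 3_000_000])
gen = MonotonicTimestamps(clock_us=lambda: next(clock_values))
ts = [gen.next() for _ in range(5)]  # strictly increasing despite the step back
```

During the repeated second the generator drifts ahead of the wall clock by a few microseconds and resynchronises once the clock passes it again.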
Re: Introducing Cassandra 3.7 LTS
Thanks Sankalp, we are also reviewing our internal 2.1 list against what you published (though we are trying to upgrade everyone to later versions e.g. 2.2). It's great to compare notes. On Thu, 20 Oct 2016 at 16:19 sankalp kohli <kohlisank...@gmail.com> wrote: > This is awesome. I have sent out the patches which we back ported into 2.1 > on the dev list. > > On Wed, Oct 19, 2016 at 4:33 PM, kurt Greaves <k...@instaclustr.com> > wrote: > > > On 19 October 2016 at 21:07, sfesc...@gmail.com <sfesc...@gmail.com> > wrote: > > Wow, thank you for doing this. This sentiment regarding stability seems to > be widespread. Is the team reconsidering the whole tick-tock cadence? If > not, I would add my voice to those asking that it is revisited. > > > There has certainly been discussion regarding the tick-tock cadence, and > it seems safe to say it will change. There hasn't been any official > announcement yet, however. > > Kurt Greaves > k...@instaclustr.com > www.instaclustr.com > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Introducing Cassandra 3.7 LTS
Hi All I am proud to announce we are making available our production build of Cassandra 3.7 that we run at Instaclustr (both for ourselves and our customers). Our release of Cassandra 3.7 includes a number of backported patches from later versions of Cassandra (e.g. 3.8 and 3.9) but doesn't include the new features of these releases. You can find our release of Cassandra 3.7 LTS on GitHub here ( https://github.com/instaclustr/cassandra). You can read more of our thinking and how this applies to our managed service here ( https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/). We also have an expanded FAQ about why and how we are approaching 3.x in this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts), however I've included the top few questions and answers below: *Is this a fork?* No, this is just Cassandra with a different release cadence for those who want 3.x features but are slightly more risk averse than the current schedule allows. *Why not just use the official release?* With the 3.x tick-tock branch we have encountered more instability than with the previous release cadence. We feel that releasing new features every other release makes it very hard for operators to stabilize their production environment without bringing in brand new features that are not battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously, the bug fix branch included new and real-world untested features, specifically CDC. We have decided to stick with Cassandra 3.7 and instead backport critical issues and maintain it ourselves rather than trying to stick with the current Apache Cassandra release cadence. *Why backport?* At Instaclustr we support and run a number of different versions of Apache Cassandra on behalf of our customers. Over the course of managing Cassandra for our customers we often encounter bugs. There are existing patches for some of them, others we patch ourselves. 
Generally, if we can, we try to wait for the next official Apache Cassandra release; however, when we need to ensure our customers remain stable and running we will sometimes backport fixes and write our own hotfixes (which are also submitted back to the community). *Why release it?* A number of our customers and people in the community have asked if we would make this available, which we are more than happy to do. This repository represents what Instaclustr runs in production for Cassandra 3.7 and this is our way of helping the community get a similar level of stability to what you would get from our managed service. Cheers Ben -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: High system CPU during high write workload
Hi Abhishek The article with the futex bug description lists the solution, which is to upgrade to a version of RHEL or CentOS that has the specified patch. What help do you specifically need? If you need help upgrading the OS I would look at the documentation for RHEL or CentOS. Ben On Mon, 14 Nov 2016 at 22:48 Abhishek Gupta <gupta.abhis...@snapdeal.com> wrote: Hi, We are seeing an issue where the system CPU is shooting up to a figure of >90% when the cluster is subjected to a relatively high write workload, i.e. 4k wreq/sec.

2016-11-14T13:27:47.900+0530 Process summary
  process cpu=695.61%
  application cpu=676.11% (user=200.63% sys=475.49%)  <== Very High System CPU
  other: cpu=19.49%
  heap allocation rate 403mb/s
  [000533] user= 1.43% sys= 6.91% alloc= 2216kb/s - SharedPool-Worker-129
  [000274] user= 0.38% sys= 7.78% alloc= 2415kb/s - SharedPool-Worker-34
  [000292] user= 1.24% sys= 6.77% alloc= 2196kb/s - SharedPool-Worker-56
  [000487] user= 1.24% sys= 6.69% alloc= 2260kb/s - SharedPool-Worker-79
  [000488] user= 1.24% sys= 6.56% alloc= 2064kb/s - SharedPool-Worker-78
  [000258] user= 1.05% sys= 6.66% alloc= 2250kb/s - SharedPool-Worker-41

On doing strace it was found that the following system call is consuming all the system CPU:

timeout 10s strace -f -p 5954 -c -q

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.33 1712.798399       16674    102723     22191 futex
  3.98   77.098730        4356     17700           read
  3.27   63.474795      394253       161        29 restart_syscall
  3.23   62.601530       29768      2103           epoll_wait

On searching we found that the following bug in the RHEL 6.6 / CentOS 6.6 kernel seems to be a probable cause of the issue: https://docs.datastax.com/en/landing_page/doc/landing_page/troubleshooting/cassandra/fetuxWaitBug.html The patch fix mentioned in the doc is also not present in our kernel. 
sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
- [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347] {CVE-2010-0623}
Can someone who has faced and resolved this issue help us here? Thanks, Abhishek -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
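For completeness, the check that the `rpm --changelog` grep performs can also be approximated by comparing kernel build numbers. A minimal sketch, assuming the first patched build is 2.6.32-504.16.2.el6 (verify this against the DataStax page linked above for your exact distro):

```python
import re

def kernel_tuple(release: str):
    """Turn a kernel release string such as '2.6.32-504.16.2.el6.x86_64'
    into a tuple of its numeric components for comparison."""
    return tuple(int(n) for n in re.findall(r"\d+", release.split(".el")[0]))

# Assumed first patched build, taken from the linked DataStax page;
# double-check this against your distro's errata before relying on it.
FIXED = kernel_tuple("2.6.32-504.16.2.el6")

def has_futex_fix(release: str) -> bool:
    return kernel_tuple(release) >= FIXED

print(has_futex_fix("2.6.32-431.29.2.el6.x86_64"))  # False: predates the fix
print(has_futex_fix("2.6.32-573.el6.x86_64"))       # True: later build
```

Feed it the output of `uname -r`; if it reports the build as unpatched, plan the OS upgrade.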
Re: Is it *safe* to issue multiple replace-node at the same time?
Same rack and no range movements, my first instinct is to say yes it is safe (I like to treat racks as one giant meta node). However I would want to have a read through the replace code. On Mon, Nov 21, 2016, 07:22 Dikang Gu <dikan...@gmail.com> wrote: > Hi guys, > > Sometimes we need to replace multiple hosts in the same rack, is it safe > to replace them in parallel, using the replace-node command? > > Will it cause any data inconsistency if we do so? > > Thanks > Dikang. > > -- > Dikang > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Clarify Support for 2.2 on Download Page
Hi Derek You should subscribe and post this question to the Dev list, they will be able to get you sorted quickly! Normally you can edit documentation directly via github (e.g. https://github.com/apache/cassandra/tree/trunk/doc/source), however the download source appears to be outside the Cassandra repo. Ben On Wed, 16 Nov 2016 at 13:08 Derek Burdick <derek.burd...@gmail.com> wrote: > Hi, is it possible to update the language on the Apache Cassandra Download > page to reflect that version 2.2 will enter Critical Fix Only support after > November 21st? > > The current language creates quite a bit of confusion in the community > with how long 2.2 and 2.1 will receive fixes from the community. > > http://cassandra.apache.org/download/ > > Specifically these three lines: > >- Apache Cassandra 3.0 is supported until May 2017. The latest release >is 3.0.9 > > <http://www.apache.org/dyn/closer.lua/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz> > (pgp > > <http://www.apache.org/dist/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz.asc> >, md5 > > <http://www.apache.org/dist/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz.md5> > and sha1 > > <http://www.apache.org/dist/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz.sha1>), >released on 2016-09-20. >- Apache Cassandra 2.2 is supported until November 2016. The latest >release is 2.2.8 > > <http://www.apache.org/dyn/closer.lua/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz> > (pgp > > <http://www.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz.asc> >, md5 > > <http://www.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz.md5> > and sha1 > > <http://www.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz.sha1>), >released on 2016-09-28. >- Apache Cassandra 2.1 is supported until November 2016 with critical >fixes only. 
The latest release is 2.1.16 > > <http://www.apache.org/dyn/closer.lua/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz> > (pgp > > <http://www.apache.org/dist/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz.asc> >, md5 > > <http://www.apache.org/dist/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz.md5> > and sha1 > > <http://www.apache.org/dist/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz.sha1>), >released on 2016-10-10. > > > What would be the best approach to help get this changed? > > -Derek > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Any Bulk Load on Large Data Set Advice?
+1 on parquet and S3. Combined with spark running on spot instances your grant money will go much further! On Thu, 17 Nov 2016 at 07:21 Jonathan Haddad <j...@jonhaddad.com> wrote: > If you're only doing this for spark, you'll be much better off using > parquet and HDFS or S3. While you *can* do analytics with cassandra, it's > not all that great at it. > On Thu, Nov 17, 2016 at 6:05 AM Joe Olson <technol...@nododos.com> wrote: > > I received a grant to do some analysis on netflow data (Local IP address, > Local Port, Remote IP address, Remote Port, time, # of packets, etc) using > Cassandra and Spark. The de-normalized data set is about 13TB out the door. > I plan on using 9 Cassandra nodes (replication factor=3) to store the data, > with Spark doing the aggregation. > > Data set will be immutable once loaded, and am using the replication > factor = 3 to somewhat simulate the real world. Most of the analysis will > be of the sort "Give me all the remote ip addresses for source IP 'X' > between time t1 and t2" > > I built and tested a bulk loader following this example in GitHub: > https://github.com/yukim/cassandra-bulkload-example to generate the > SSTables, but I have not executed it on the entire data set yet. > > Any advice on how to execute the bulk load under this configuration? Can > I generate the SSTables in parallel? Once generated, can I write the > SSTables to all nodes simultaneously? Should I be doing any kind of sorting > by the partition key? > > This is a lot of data, so I figured I'd ask before I pulled the trigger. > Thanks in advance! > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
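On the "can I generate the SSTables in parallel?" question above: each writer instance builds a fully independent set of SSTables, so you can shard the input files across processes and point each one at its own output directory. A minimal sketch of the sharding; `build_sstables` is a hypothetical placeholder for shelling out to a CQLSSTableWriter-based loader like the cassandra-bulkload-example one:

```python
from concurrent.futures import ProcessPoolExecutor

def chunk(files, n):
    """Split the input file list into n roughly equal groups; each group
    is handed to one writer process."""
    return [files[i::n] for i in range(n)]

def build_sstables(file_group):
    # Hypothetical placeholder: shell out to your CQLSSTableWriter-based
    # bulk loader here, writing into a directory unique to this group.
    return len(file_group)

if __name__ == "__main__":
    inputs = [f"netflow-{i:04d}.csv" for i in range(100)]
    with ProcessPoolExecutor(max_workers=8) as pool:
        counts = list(pool.map(build_sstables, chunk(inputs, 8)))
    print(sum(counts))  # 100: every input file is assigned to exactly one writer
```

The resulting directories can then be streamed in with sstableloader, one directory at a time.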
Re: Priority for cassandra nodes in cluster
+1 w/ Benjamin. However if you wish to make use of spare hardware capacity, look to something like mesos DC/OS or kubernetes. You can run multiple services across a fleet of hardware, but provision equal resources to Cassandra and have somewhat reliable hardware sharing mechanisms. On Sat, 12 Nov 2016 at 14:12 Jon Haddad <jonathan.had...@gmail.com> wrote: > Agreed w/ Benjamin. Trying to diagnose issues in prod will be a > nightmare. Keep your DB servers homogeneous. > > On Nov 12, 2016, at 1:52 PM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > > 1. From a 15 year experience of running distributed Services: dont Mix > Services on machines if you don't have to. Dedicate each server to a single > task if you can afford it. It is easier to manage and reduces risks in case > of overload or failure > 2. You can assign a different number of tokens for each node by setting > this in Cassandra.yaml before you bootstrap that node > > Am 12.11.2016 22:48 schrieb "sat" <sathish.al...@gmail.com>: > > Hi, > > We are planning to install 3 node cluster in production environment. Is it > possible to provide weightage or priority to the nodes in cluster. > > Eg., We want more more records to be written to first 2 nodes and less to > the 3rd node. We are thinking of this approach because we want to install > other IO intensive messaging server in the 3rd node, in order to reduce the > load we are requesting for this approach. > > > Thanks and Regards > A.SathishKumar > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Introducing Cassandra 3.7 LTS
We are not publishing the build artefacts for our LTS at the moment as we don't test them on the different distros (debian/ubuntu, centos etc). If anyone wishes to do so feel free to create a PR and submit them! On Wed, 2 Nov 2016 at 11:37 Jesse Hodges <hodges.je...@gmail.com> wrote: > awesome, thanks for the tip! > > -Jesse > > On Wed, Nov 2, 2016 at 12:39 PM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > > You can build one on your own very easily. Just check out the desired git > repo and do this: > > > http://stackoverflow.com/questions/8989192/how-to-package-the-cassandra-source-code-into-debian-package > > 2016-11-02 17:35 GMT+01:00 Jesse Hodges <hodges.je...@gmail.com>: > > Just curious, has anybody created a debian package for this? > > Thanks, Jesse > > On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang <dep...@gmail.com> wrote: > > This is awesome! Stability is the king. > > Thank you so much! > > On Oct 19, 2016 2:56 PM, "Ben Bromhead" <b...@instaclustr.com> wrote: > > Hi All > > I am proud to announce we are making available our production build of > Cassandra 3.7 that we run at Instaclustr (both for ourselves and our > customers). Our release of Cassandra 3.7 includes a number of backported > patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't > include the new features of these releases. > > You can find our release of Cassandra 3.7 LTS on github here ( > https://github.com/instaclustr/cassandra). You can read more of our > thinking and how this applies to our managed service here ( > https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/). 
> > We also have an expanded FAQ about why and how we are approaching 3.x in > this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts), > however I've included the top few question and answers below: > > *Is this a fork?* > No, This is just Cassandra with a different release cadence for those who > want 3.x features but are slightly more risk averse than the current > schedule allows. > > *Why not just use the official release?* > With the 3.x tick-tock branch we have encountered more instability than > with the previous release cadence. We feel that releasing new features > every other release makes it very hard for operators to stabilize their > production environment without bringing in brand new features that are not > battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the > bug fix branch included new and real-world untested features, specifically > CDC. We have decided to stick with Cassandra 3.7 and instead backport > critical issues and maintain it ourselves rather than trying to stick with > the current Apache Cassandra release cadence. > > *Why backport?* > At Instaclustr we support and run a number of different versions of Apache > Cassandra on behalf of our customers. Over the course of managing Cassandra > for our customers we often encounter bugs. There are existing patches for > some of them, others we patch ourselves. Generally, if we can, we try to > wait for the next official Apache Cassandra release, however in the need to > ensure our customers remain stable and running we will sometimes backport > bugs and write our own hotfixes (which are also submitted back to the > community). > > *Why release it?* > A number of our customers and people in the community have asked if we > would make this available, which we are more than happy to do so. 
This > repository represents what Instaclustr runs in production for Cassandra 3.7 > and this is our way of helping the community get a similar level of > stability as what you would get from our managed service. > > Cheers > > Ben > > > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH · www.jaumo.com > Wehrstraße 46 · 73035 Göppingen · Germany > Phone +49 7161 304880-6 · Fax +49 7161 304880-1 > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Handle Leap Seconds with Cassandra
Based on what I've said previously, pretty much any way of avoiding your leap-second ordering issue is going to be a "hack", and there will be some amount of hope involved. If the updates occur more than 300ms apart and you are confident your nodes have clocks that are within 150ms of each other, then I'd close my eyes and hope they all apply the leap second at the same time within that 150ms. If they are less than 300ms apart (I'm guessing you meant less than 300ms), then I would look to figure out what the smallest gap is between those two updates and make sure your nodes' clocks are close enough in that gap that the leap second will occur on all nodes within that gap. If that's not good enough, you could just halt those scenarios for 2 seconds over the leap second and then resume them once you've confirmed all clocks have skipped. On Wed, 2 Nov 2016 at 18:13 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Thanks Ben for taking out time for the detailed reply !! > > We don't need strict ordering for all operations but we are looking at > scenarios where 2 quick updates to same column of same row are possible. By > quick updates, I mean >300 ms. Configuring NTP properly (as mentioned in > some blogs in your link) should give fair relative accuracy between the > Cassandra nodes. But a leap second takes the clock back for an ENTIRE one > sec (huge) and the probability of an old write overwriting the new one > increases drastically. So, we want to be proactive with things. > > I agree that you should avoid such scenarios with design (if possible). > > Good to know that you guys have setup your own NTP servers as per the > recommendation. Curious..Do you also do some monitoring around NTP? > > > > Thanks > Anuj > > On Fri, 28 Oct, 2016 at 12:25 AM, Ben Bromhead > > <b...@instaclustr.com> wrote: > If you need guaranteed strict ordering in a distributed system, I would > not use Cassandra, Cassandra does not provide this out of the box. 
I would > look to a system that uses Lamport or vector clocks. Based on your > description of how your system runs at the moment (and how close your > updates are together), you have either already experienced out of order > updates or there is a real possibility you will in the future. > > Sorry to be so dire, but if you do require causal consistency / strict > ordering, you are not getting it at the moment. Distributed systems theory > is really tricky, even for people that are "experts" on distributed systems > over unreliable networks (I would certainly not put myself in that > category). People have made a very good name for themselves by showing that > the vast majority of distributed databases have had bugs when it comes to > their various consistency models and the claims these databases make. > > So make sure you really do need guaranteed causal consistency/strict > ordering, or if you can design around it (e.g. using conflict-free > replicated data types), or choose a system that is designed to provide it. > > Having said that... here are some hacky things you could do in Cassandra > to try and get this behaviour, which I in no way endorse doing :) > >- Cassandra counters do leverage a logical clock per shard and you >could hack something together with counters and lightweight transactions, >but you would want to do your homework on counter accuracy before >diving into it... as I don't know if the implementation is safe in the >context of your question. Also this would probably require a significant >rework of your application plus a significant performance hit. I would >invite a counter guru to jump in here... > > >- You can leverage the fact that timestamps are monotonic if you >isolate writes to a single node for a single shard... but you then lose >Cassandra's availability guarantees, e.g. a keyspace with an RF of 1 and a >CL of ONE will get monotonic timestamps (if generated on the server >side). 
> > >- Continuing down the path of isolating writes to a single node for a >given shard, you could also isolate writes to the primary replica using your >client driver during the leap second (make it a minute either side of the >leap), but again you lose out on availability and you are probably already >experiencing out of order writes given how close your writes and updates >are. > > > A note on NTP: NTP is generally fine if you use it to keep the clocks > synced between the Cassandra nodes. If you are interested in how we have > implemented NTP at Instaclustr, see our blogpost on it > https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/ > . > > > > Ben > > > On Thu, 27 Oct 2016 at 10:18 Anuj W
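To make the failure mode discussed in this thread concrete: Cassandra resolves two writes to the same cell by taking the one with the higher write timestamp (last-write-wins), so a one-second backward clock step can silently make an older value win. A toy simulation:

```python
# Last-write-wins resolution as Cassandra applies it: the cell with the
# higher write timestamp wins, regardless of actual arrival order.
def resolve(cell_a, cell_b):
    return max(cell_a, cell_b, key=lambda c: c["ts"])

MICROS = 1_000_000
t = 1_483_228_800 * MICROS  # 2017-01-01 00:00:00 UTC, in microseconds

# Two updates 300 ms apart in real time, but the node handling the second
# one has already stepped its clock back a full second for the leap second.
first = {"ts": t, "value": "old"}
second = {"ts": t + 300_000 - MICROS, "value": "new"}

print(resolve(first, second)["value"])  # "old": the later write silently loses
```

This is why the advice above boils down to keeping node clocks tightly synced, or pausing the sensitive operations while the leap second is applied.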
Re: Is there any way to throttle the memtable flushing throughput?
A few thoughts on the larger problem at hand. The AWS instance type you are using is not appropriate for a production workload. Also, with memtable flushes that cause spiky write throughput, it sounds like your commitlog is on the same disk as your data directory; combined with the use of non-SSD EBS, I'm not surprised this is happening. The small amount of memory on the node could also mean your flush writers are getting backed up (blocked), possibly causing JVM heap pressure and other fun things (you can check this with nodetool tpstats). Before you get into tuning memtable flushing I would do the following:
- Reset your commitlog_sync settings back to default
- Use an EC2 instance type that has at least 15GB of memory and 4 cores and is EBS optimized (dedicated EBS bandwidth)
- Use gp2 or io2 EBS volumes
- Put your commitlog on a separate EBS volume
- Make sure your memtable_flush_writers are not being blocked; if so, increase the number of flush writers (no more than # of cores)
- Optimize your read_ahead_kb size and compression_chunk_length to keep those EBS reads as small as possible
Once you have fixed the above, memtable flushing should not be an issue. Even if you can't/don't want to upgrade the instance type, the other steps will help things. Ben On Tue, 11 Oct 2016 at 10:23 Satoshi Hikida <sahik...@gmail.com> wrote: > Hi, > > I'm investigating the read/write performance of the C* (Ver. 2.2.8). > However, I have an issue about memtable flushing which forces the spiky > write throughput. And then it affects the latency of the client's requests. > > So I want to know the answers for the following questions. > > 1. Is there any way of throttling the write throughput of the memtable > flushing? If it exists, how can I do that? > 2. Is there any way to reduce the spike of the write bandwidth during the > memtable flushing? 
>(I'm in trouble because the delay of the request increases when the > spike of the write bandwidth occurred) > > I'm using one C* node for this investigation. And C* runs on an EC2 > instance (2vCPU, 4GB memory), In addition, I attach two magnetic disks to > the instance, one stores system data(root file system.(/)), the other > stores C* data (data files and commit logs). > > I also changed a few configurations. > - commitlog_sync: batch > - commitlog_sync_batch_window_in_ms: 2 > (Using default value for the other configurations) > > > Regards, > Satoshi > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
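One footnote on the read_ahead_kb suggestion above: `blockdev --setra` takes a count of 512-byte sectors, while `/sys/block/*/queue/read_ahead_kb` is in KB, so it is easy to set the wrong value by a factor of two. A small conversion sketch (`/dev/xvdb` below is an illustrative device name):

```python
SECTOR_BYTES = 512

def setra_sectors(read_ahead_kb: int) -> int:
    """Convert a desired read_ahead_kb value into the 512-byte sector
    count that `blockdev --setra` expects, e.g.:
        blockdev --setra 32 /dev/xvdb   # 16 KB read-ahead
    """
    return read_ahead_kb * 1024 // SECTOR_BYTES

print(setra_sectors(16))   # 32 sectors for a 16 KB read-ahead
print(setra_sectors(128))  # 256 sectors, a common distro default
```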
Re: Does increment/decrement by 0 generate any commits ?
According to https://issues.apache.org/jira/browse/CASSANDRA-7304 unset values in a prepared statement for a counter does not change the value of the counter. This applies for versions of Cassandra 2.2 and above. I would also look to verify the claimed behaviour myself. On Tue, 11 Oct 2016 at 09:49 Dorian Hoxha <dorian.ho...@gmail.com> wrote: > I just have a bunch of counters in 1 row, and I want to selectively update > them. And I want to keep prepared queries. But I don't want to keep 30 > prepared queries (1 for each counter column, but keep only 1). So in most > cases, I will increment 1 column by positive integer and the others by 0. > > Makes sense ? > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Adding disk capacity to a running node
ne.com> > wrote: > > > > Yes, Cassandra should keep percent of disk usage equal for all disk. > Compaction process and SSTable flushes will use new disk to distribute both > new and existing data. > > > > Best regards, Vladimir Yudovin, > > > *Winguzone > <https://urldefense.proofpoint.com/v2/url?u=https-3A__winguzone.com-3Ffrom-3Dlist=DQMFaQ=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow=ixOxpX-xpw1dJZNpaMT3mepToWX8gzmsVaXFizQLzoU=4q7P9fddEYpXwPR-h9yA_tk5JwR8l6c7cKJ-LQTVcGM=> > - Hosted Cloud Cassandra on Azure and SoftLayer.Launch your cluster in > minutes.* > > > > > > On Mon, 17 Oct 2016 11:43:27 -0400*Seth Edwards <s...@pubnub.com > <s...@pubnub.com>>* wrote > > > > We have a few nodes that are running out of disk capacity at the moment > and instead of adding more nodes to the cluster, we would like to add > another disk to the server and add it to the list of data directories. My > question, is, will Cassandra use the new disk for compactions on sstables > that already exist in the primary directory? > > > > > > > > Thanks! > > > > > > > > CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and > may be legally privileged. If you are not the intended recipient, do not > disclose, copy, distribute, or use this email or any attachments. If you > have received this in error please let the sender know and then delete the > email and all attachments. > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Adding disk capacity to a running node
yup you would need to copy the files across to the new volume from the dir you wanted to give additional space to. Rough steps would look like:
1. Create EBS volume (make it big... like 3TB)
2. Attach to instance
3. Mount/format EBS volume
4. Stop C*
5. Copy full/troublesome directory to the EBS volume
6. Remove copied files (using rsync for the copy / remove step can be a good idea)
7. bind mount EBS volume with the same path as the troublesome directory
8. Start C* back up
9. Let it finish compacting / streaming etc
10. Stop C*
11. remove bind mount
12. copy files back to ephemeral
13. start C* back up
14. repeat on other nodes
15. run repair
You can use this process if you somehow end up in a full disk situation. If you end up in a low disk situation you'll have other issues (like corrupt / half written SSTable components), but it's better than nothing. Also to maintain your read throughput during this whole thing, double check the EBS volumes read_ahead_kb setting on the block volume and reduce it to something sane like 0 or 16. On Mon, 17 Oct 2016 at 13:42 Seth Edwards <s...@pubnub.com> wrote: > @Ben > > Interesting idea, is this also an option for situations where the disk is > completely full and Cassandra has stopped? (Not that I want to go there). > > If this was the route taken, and we did > > mount --bind /mnt/path/to/large/sstable /mnt/newebs > > We would still need to do some manual copying of files? such as > > mv /mnt/path/to/large/sstable.sd /mnt/newebs ? > > Thanks! > > On Mon, Oct 17, 2016 at 12:59 PM, Ben Bromhead <b...@instaclustr.com> > wrote: > > Yup as everyone has mentioned ephemeral are fine if you run in multiple > AZs... which is pretty much mandatory for any production deployment in AWS > (and other cloud providers) . i2.2xls are generally your best bet for high > read throughput applications on AWS. > > Also on AWS ephemeral storage will generally survive a user initiated > restart. 
For the times that AWS retires an instance, you get plenty of > notice and it's generally pretty rare. We run over 1000 instances on AWS > and see one forced retirement a month if that. We've never had an instance > pulled from under our feet without warning. > > To add another option for the original question, one thing you can do is > to attach a large EBS drive to the instance and bind mount it to the > directory for the table that has the very large SSTables. You will need to > copy data across to the EBS volume. Let everything compact and then copy > everything back and detach EBS. Latency may be higher than normal on the > node you are doing this on (especially if you are used to i2.2xl > performance). > > This is something we often have to do, when we encounter pathological > compaction situations associated with bootstrapping, adding new DCs or STCS > with a dominant table or people ignore high disk usage warnings :) > > On Mon, 17 Oct 2016 at 12:43 Jeff Jirsa <jeff.ji...@crowdstrike.com> > wrote: > > Ephemeral is fine, you just need to have enough replicas (in enough AZs > and enough regions) to tolerate instances being terminated. > > > > > > > > *From: *Vladimir Yudovin <vla...@winguzone.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Monday, October 17, 2016 at 11:48 AM > *To: *user <user@cassandra.apache.org> > > > *Subject: *Re: Adding disk capacity to a running node > > > > It's extremely unreliable to use ephemeral (local) disks. Even if you > don't stop instance by yourself, it can be restarted on different server in > case of some hardware failure or AWS initiated update. So all node data > will be lost. 
> > > > Best regards, Vladimir Yudovin, > > > *Winguzone > <https://urldefense.proofpoint.com/v2/url?u=https-3A__winguzone.com-3Ffrom-3Dlist=DQMFaQ=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow=ixOxpX-xpw1dJZNpaMT3mepToWX8gzmsVaXFizQLzoU=4q7P9fddEYpXwPR-h9yA_tk5JwR8l6c7cKJ-LQTVcGM=> > - Hosted Cloud Cassandra on Azure and SoftLayer.Launch your cluster in > minutes.* > > > > > > On Mon, 17 Oct 2016 14:45:00 -0400*Seth Edwards <s...@pubnub.com > <s...@pubnub.com>>* wrote > > > > These are i2.2xlarge instances so the disks currently configured as > ephemeral dedicated disks. > > > > On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael < > michael.la...@nytimes.com> wrote: > > > > You could just expand the size of your ebs volume and extend the file > system. No data is lost - assuming you are running Linux. > > > > > > On Monday, October 17, 2016, Seth Edwards
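The copy-and-verify part of the procedure above (steps 5-6) is where half-written SSTable components would bite you, so it is worth verifying the copy before removing the originals. A minimal sketch with illustrative paths; the bind mount itself still happens outside this script, with Cassandra stopped:

```python
import filecmp
import shutil
from pathlib import Path

def migrate(table_dir: str, ebs_dir: str) -> None:
    """Copy one table directory onto the (already formatted and mounted)
    EBS volume, then verify the copy. Only after this succeeds would you
    remove the originals and bind mount over the old path, e.g.:
        mount --bind /mnt/newebs/ks/table /var/lib/cassandra/data/ks/table
    """
    src, dst = Path(table_dir), Path(ebs_dir)
    shutil.copytree(src, dst, dirs_exist_ok=True)  # copy2 preserves mtimes
    diff = filecmp.dircmp(src, dst)
    problems = diff.left_only + diff.diff_files
    if problems:
        raise RuntimeError(f"copy incomplete, do not remove originals: {problems}")
```

rsync with `--remove-source-files`, as suggested in the steps, achieves the same copy/remove pairing in one pass.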
Re: Why does `now()` produce different times within the same query?
> > > > I will note that Ben seems to suggest keeping the return of now() unique > across > call while keeping the time component equals, thus varying the rest of the > uuid > bytes. However: > - I'm starting to wonder what this would buy us. Why would someone be > super >confused by the time changing across calls (in a single > statement/batch), but >be totally not confused by the actual full return to not be equal? > Given that a common way of interacting with timeuuids is with toTimestamp I can see the confusion and assumptions around behaviour. And how is >that actually useful: you're having different result anyway and you're >letting the server pick the timestamp in the first place, so you're > probably >not caring about milliseconds precision of that timestamp in the first > place. > If you want consistency of timestamps within your query as OP did I can see how this is useful. Postgres claims this is a "feature". - This would basically be a violation of the timeuuid spec > Not quite... Type 1 uuids let you swap out the low 47 bits of the node component with other randomly generated bits ( https://www.ietf.org/rfc/rfc4122.txt) - This would be a big pain in the code and make of now() a special case > among functions. I'm unconvinced special cases are making things easier > in general. > On reflection, I have to agree here, now() has been around for ever and this is the first anecdote I've seen of someone getting caught out. However with my user advocate hat on I think it would be worth investigating further beyond a documentation update if others found it a sticking point in Cassandra adoption. > So I'm all for improving the documentation if this confuses users due to > expectations (mistakenly) carried from prior experiences, and please > feel free to open a JIRA for that. I'm a lot less in agreement that there > is > something wrong with the way the function behave in principle. 
> > > I can see why this issue has been largely ignored and hasn't had a > chance for > > the behaviour to be formally defined > > Don't make too much assumptions. The behavior is perfectly well defined: > now() > is a "normal" function and is evaluated whenever it's called according to > the > timeuuid spec (or as close to it as we can make it). > Maybe formally defined is the wrong term... Formally documented? > > On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > > Great comment. +1 > > Am 01.12.2016 06:29 schrieb "Ben Bromhead" <b...@instaclustr.com>: > > tl;dr +1 yup raise a jira to discuss how now() should behave in a single > statement (and possible extend to batch statements). > > The values of now should be the same if you assume that now() works like > it does in relational databases such as postgres or mysql, however at the > moment it instead works like sysdate() in mysql. Given that CQL is supposed > to be SQL like, I think the assumption around the behaviour of now() was a > fair one to make. > > I definitely agree that raising a jira ticket would be a great place to > discuss what the behaviour of now() should be for Cassandra. Personally I > would be in favour of seeing the deterministic component (the actual time > part) being the same across multiple calls in the one statement or multiple > statements in a batch. > > Cassandra documentation does not make any claims as to how now() works > within a single statement and reading the code it shows the intent is to > work like sysdate() from MySQL rather than now(). One of the identified > dangers of making cql similar to sql is that, while yes it aids adoption, > users will find that SQL like things don't behave as expected. Of course as > a user, one shouldn't have to read the source code to determine correct > behaviour. 
> > Given that a timeuuid is made up of deterministic and (pseudo) > non-deterministic components I can see why this issue has been largely > ignored and hasn't had a chance for the behaviour to be formally defined > (you would expect now to return the same time in the one statement despite > multiple calls, but you wouldn't expect the same behaviour for say a call > to rand()). > > > > > > > > On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote: > > This is not a bug, and in fact changing it would be a serious bug. > > False. Absolutely no consumer would be broken by a change to guarantee an > identical time component that isn't broken already, for the simple reason > your code already has to handle that case, as it is in fact the majority > case RIGHT NOW. Users can hit this bug, in production, because unit tests > might
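For reference on the time component being debated in this thread: the 60-bit timestamp inside a version 1 uuid can be pulled out and converted to a Unix timestamp, which is roughly what `toTimestamp(timeuuid)` exposes. A quick sketch using Python's uuid module and the RFC 4122 epoch offset:

```python
import uuid

# Offset, in 100 ns ticks, between the UUID epoch (1582-10-15) and the
# Unix epoch (1970-01-01), per RFC 4122.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_unix(u: uuid.UUID) -> float:
    """Pull the 60-bit timestamp out of a version 1 uuid and convert it
    to Unix seconds (roughly what CQL's toTimestamp(timeuuid) returns)."""
    assert u.version == 1, "only version 1 uuids carry a timestamp"
    return (u.time - UUID_EPOCH_OFFSET) / 1e7

a, b = uuid.uuid1(), uuid.uuid1()
print(timeuuid_to_unix(a) <= timeuuid_to_unix(b))  # True: time components increase
print(a == b)                                      # False: the full uuids differ
```

This makes the distinction in the thread visible: the time components of two now() results are close but not equal, and the full uuids are never equal.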