Re: trouble setting up initial cluster: Host ID collision between active endpoint
Hi Tim,

If you want to check out Cassandra on AWS you should also have a look at www.instaclustr.com. We are still very much in beta (so if you come across anything, please let us know), but if you have a few minutes and want to deploy a cluster in just a few clicks I highly recommend trying Instaclustr out.

Cheers
Ben Bromhead
*Instaclustr*

On Fri, Jan 25, 2013 at 12:35 AM, Tim Dunphy bluethu...@gmail.com wrote:

Cool! Thanks for the advice, Aaron. I actually did get this working before I read your reply. The trick for me was apparently to use the IP of the first node in the seeds setting of each successive node. But I like the idea of using larges for an hour or so of basic experimentation and terminating them afterwards. Also, thanks for pointing me to the DataStax AMIs, I'll be sure to check them out.

Tim

On Thu, Jan 24, 2013 at 3:45 AM, aaron morton aa...@thelastpickle.com wrote:

They both have 0 for their token, and this is stored in their System keyspace. Scrub them and start again.

> But I found that the tokens that were being generated would require way too much memory

Token assignments have nothing to do with memory usage.

> m1.micro instances

You are better off using your laptop than micro instances. For playing around try m1.large and terminate the instances when not in use. To make life easier, use this to create the cluster for you: http://www.datastax.com/docs/1.2/install/install_ami

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

Hello list,

I really do appreciate the advice I've gotten here as I start building familiarity with Cassandra. Aside from the single-node instance I set up for a developer friend, I've just been playing with a single node in a VM on my laptop, experimenting with cassandra-cli and PHP. Well, I've decided to set up my first cluster on my Amazon EC2 account and I'm running into an issue getting the nodes to gossip.
I've set the IPs of the 'node01' and 'node02' EC2 instances in their respective listen_address and rpc_address settings, and made sure that the 'cluster_name' on both was in agreement. I believe the problem may be in one of two places: either the seeds or the initial_token setting.

For the seeds I have it set up as follows. I put the IPs of both machines in the 'seeds' setting on each, thinking this is how the nodes would discover each other:

- seeds: 10.xxx.xxx.248,10.xxx.xxx.123

Initially I tried the tokengen script that I found in the documentation. But I found that the tokens being generated would require way too much memory for the m1.micro instances I'm experimenting with on the Amazon free tier. And according to the comments in the config it is in some cases OK to leave that field blank, so that's what I did on both instances.

Not sure how much (or if) this matters, but I am using the setting:

- endpoint_snitch: Ec2Snitch

Finally, when I start up the first node all goes well. But when I start up the second node I see this exception on both hosts:

node01:

INFO 11:02:32,231 Listening for thrift clients...
INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
java.lang.RuntimeException: Host ID collision between active endpoint /10.xxx.xxx.248 and /10.xxx.xxx.123 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
	at org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
	at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
	at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
	at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

And on node02 I see:

INFO 11:02:58,817 Starting Messaging Service on port 7000
INFO 11:02:58,835 Using saved token [0]
INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84 serialized/live bytes, 4 ops)
INFO 11:02:58,838 Writing Memtable-local@672636645(84/84 serialized/live bytes, 4 ops)
INFO 11:02:58,912 Completed flushing /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes) for commitlog position ReplayPosition(segmentId
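For what it's worth, the collision above pairs with the `Using saved token [0]` line: both nodes came up holding token 0. If you do want explicit, balanced tokens instead of leaving initial_token blank, the arithmetic the documentation's token generator performs is tiny. The following is a sketch of that calculation for the RandomPartitioner (token space 0 to 2^127), not the exact script from the docs:

```python
# Evenly spaced initial_token values for the RandomPartitioner,
# whose token space is [0, 2**127). Assign tokens[i] to node i+1.
def generate_tokens(node_count):
    return [i * (2 ** 127) // node_count for i in range(node_count)]

for i, token in enumerate(generate_tokens(2)):
    print("node%02d initial_token: %d" % (i + 1, token))
```

For two nodes this yields 0 and 85070591730234615865843651857942052864. The tokens themselves cost nothing memory-wise, which is Aaron's point above.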
Re: no other nodes seen on priam cluster
Hi Marcelo,

A few questions: have you added the Priam java agent to Cassandra's JVM arguments (e.g. -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar), and does the web container running Priam have permission to write to the Cassandra config directory? Also, what do the Priam logs say?

If you want to get up and running quickly with Cassandra, AWS and Priam, check out www.instaclustr.com. We deploy Cassandra under your AWS account and you have full root access to the nodes if you want to explore and play around, plus there is a free tier which is great for experimenting and trying Cassandra out.

Cheers
Ben

On Wed, Feb 27, 2013 at 6:09 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

Hello,

I am using Cassandra 1.2.1 and I am trying to set up a Priam cluster on AWS with two nodes. However, I can't get both nodes up and running because of a weird error (at least to me). When I start both nodes, they are both able to connect to each other and do some communication. However, after some seconds, I just see "java.lang.RuntimeException: No other nodes seen!", so they disconnect and die. I tried to test all ports (7000, 9160 and 7199) between both nodes and there is no firewall. On the second node, before the above exception, I get a broken pipe, as shown below. Any hint?

DEBUG 18:54:31,776 attempting to connect to /10.224.238.170
DEBUG 18:54:32,402 Reseting version for /10.224.238.170
DEBUG 18:54:32,778 Connection version 6 from /10.224.238.170
DEBUG 18:54:32,779 Upgrading incoming connection to be compressed
DEBUG 18:54:32,779 Max version for /10.224.238.170 is 6
DEBUG 18:54:32,779 Setting version 6 for /10.224.238.170
DEBUG 18:54:32,780 set version for /10.224.238.170 to 6
DEBUG 18:54:33,455 Disseminating load info ...
DEBUG 18:54:59,082 Reseting version for /10.224.238.170
DEBUG 18:55:00,405 error writing to /10.224.238.170
java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcher.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
	at sun.nio.ch.IOUtil.write(IOUtil.java:43)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
	at java.nio.channels.Channels.writeFullyImpl(Channels.java:59)
	at java.nio.channels.Channels.writeFully(Channels.java:81)
	at java.nio.channels.Channels.access$000(Channels.java:47)
	at java.nio.channels.Channels$1.write(Channels.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:272)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:189)
	at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:143)
DEBUG 18:55:01,405 attempting to connect to /10.224.238.170
DEBUG 18:55:01,461 Started replayAllFailedBatches
DEBUG 18:55:01,462 forceFlush requested but everything is clean in batchlog
DEBUG 18:55:01,463 Finished replayAllFailedBatches
INFO 18:55:01,472 JOINING: schema complete, ready to bootstrap
DEBUG 18:55:01,473 ... got ring + schema info
INFO 18:55:01,473 JOINING: getting bootstrap token
ERROR 18:55:01,475 Exception encountered during startup
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.
and on the first node:

DEBUG 18:54:30,833 Disseminating load info ...
DEBUG 18:54:31,532 Connection version 6 from /10.242.139.159
DEBUG 18:54:31,533 Upgrading incoming connection to be compressed
DEBUG 18:54:31,534 Max version for /10.242.139.159 is 6
DEBUG 18:54:31,534 Setting version 6 for /10.242.139.159
DEBUG 18:54:31,534 set version for /10.242.139.159 to 6
DEBUG 18:54:31,542 Reseting version for /10.242.139.159
DEBUG 18:54:31,791 Connection version 6 from /10.242.139.159
DEBUG 18:54:31,792 Upgrading incoming connection to be compressed
DEBUG 18:54:31,792 Max version for /10.242.139.159 is 6
DEBUG 18:54:31,792 Setting version 6 for /10.242.139.159
DEBUG 18:54:31,793 set version for /10.242.139.159 to 6
INFO 18:54:32,414 Node /10.242.139.159 is now part of the cluster
DEBUG 18:54:32,415 Resetting pool for /10.242.139.159
DEBUG 18:54:32,415 removing expire time for endpoint : /10.242.139.159
INFO 18:54:32,415 InetAddress /10.242.139.159 is now UP
DEBUG 18:54:32,789 attempting to connect to ec2-75-101-233-115.compute-1.amazonaws.com/10.242.139.159
DEBUG 18:54:58,840 Started replayAllFailedBatches
DEBUG
Re: no other nodes seen on priam cluster
Off the top of my head I would check to make sure the Autoscaling Group you created is restricted to a single Availability Zone. Also, Priam sets the number of EC2 instances it expects based on the maximum instance count you set on your scaling group (it did this last time I checked a few months ago; its behaviour may have changed).

So I would make sure the desired, min and max instance counts for your scaling group are all the same, make sure your ASG is restricted to a single availability zone (e.g. us-east-1b), and then (if you are able to and there is no data in your cluster) delete all the SimpleDB entries Priam has created, and possibly also clear out the Cassandra data directory.

Other than that, I see you've raised it as an issue on the Priam project page, so see what they say ;)

Cheers
Ben

On Thu, Feb 28, 2013 at 3:40 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

One additional important piece of info: I checked here and the seeds seem really different on each node. The command

echo `curl http://127.0.0.1:8080/Priam/REST/v1/cassconfig/get_seeds`

returns ip2 on the first node and ip1,ip1 on the second node. Any idea why? It's probably what is causing Cassandra to die, right?

2013/2/27 Marcelo Elias Del Valle mvall...@gmail.com

Hello Ben,

Thanks for the willingness to help,

2013/2/27 Ben Bromhead b...@instaclustr.com

> Have you added the Priam java agent to Cassandra's JVM arguments (e.g. -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar), and does the web container running Priam have permission to write to the Cassandra config directory? Also, what do the Priam logs say?

I put the Priam log of the first node below. Yes, I have added priam-cass-extensions to the java args, and Priam IS actually writing to the Cassandra dir.

> If you want to get up and running quickly with Cassandra, AWS and Priam, check out www.instaclustr.com.
> We deploy Cassandra under your AWS account and you have full root access to the nodes if you want to explore and play around, plus there is a free tier which is great for experimenting and trying Cassandra out.

That sounded really great. I am not sure if it would apply to our case (will consider it though), but some partners would have a great benefit from it, for sure! I will send your link to them.

What Priam says:

2013-02-27 14:14:58.0614 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-hostname returns: ec2-174-129-59-107.compute-1.amazonaws.com
2013-02-27 14:14:58.0615 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-ipv4 returns: 174.129.59.107
2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-id returns: i-88b32bfb
2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-type returns: c1.medium
2013-02-27 14:14:59.0614 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration REGION set to us-east-1, ASG Name set to dmp_cluster-useast1b
2013-02-27 14:14:59.0746 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration appid used to fetch properties is: dmp_cluster
2013-02-27 14:14:59.0843 INFO pool-2-thread-1 org.quartz.simpl.SimpleThreadPool Job execution threads will use class loader of thread: pool-2-thread-1
2013-02-27 14:14:59.0861 INFO pool-2-thread-1 org.quartz.core.SchedulerSignalerImpl Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2013-02-27 14:14:59.0862 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler Quartz Scheduler v.1.7.3 created.
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.simpl.RAMJobStore RAMJobStore initialized.
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler version: 1.7.3
2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler JobFactory set to: com.netflix.priam.scheduler.GuiceJobFactory@1b6a1c4
2013-02-27 14:15:00.0239 INFO pool-2-thread-1 com.netflix.priam.aws.AWSMembership Querying Amazon returned following instance in the ASG: us-east-1b -- i-8eb32bfd,i-88b32bfb
2013-02-27 14:15:01.0470 INFO Timer-0 org.quartz.utils.UpdateChecker New update(s) found: 1.8.5 [http://www.terracotta.org/kit/reflector?kitID=defaultpageID=QuartzChangeLog]
2013-02-27 14:15:10.0925 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Found dead instances: i-d49a0da7
2013-02-27 14:15:11.0397
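The mismatch Marcelo found (ip2 on one node, ip1,ip1 on the other) is exactly the invariant worth scripting a check for: every node must report the same seed list. A minimal sketch, assuming you have already fetched each node's seed string from the Priam get_seeds endpoint shown above (the HTTP fetching itself is left out so the check stays self-contained):

```python
# Compare comma-separated seed lists collected from each node.
# Order and duplicates don't matter, so compare them as sets.
def seeds_agree(seed_strings):
    seed_sets = [frozenset(p.strip() for p in s.split(","))
                 for s in seed_strings]
    return len(set(seed_sets)) == 1

print(seeds_agree(["ip2", "ip1,ip1"]))      # the broken case from this thread
print(seeds_agree(["ip1,ip2", "ip2,ip1"]))  # healthy: same set everywhere
```

If this returns False, fixing the seed lists (or whatever is populating them, SimpleDB in Priam's case) comes before anything else.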
Re: no other nodes seen on priam cluster
Glad you got it going!

There is a REST call you can make to Priam telling it to double the cluster size (/v1/cassconfig/double_ring); it will pre-fill all the SimpleDB entries for when the nodes come online, and you then change the number of nodes on the autoscale group. Now that Priam supports C* 1.2 with vnodes, increasing the cluster size in an ad-hoc manner might be just around the corner.

Instaclustr has some predefined cluster sizes (Free, Basic, Professional and Enterprise); these are loosely based on estimated performance and storage capacity. You can also create a custom cluster where you define the number of nodes (minimum of 4) and the instance type according to your requirements. For pricing on those check out https://www.instaclustr.com/pricing/per-instance; we base our pricing on estimated support and throughput requirements.

Cheers
Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 02/03/2013, at 3:59 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

Thanks a lot Ben. Actually, I managed to make it work by erasing the SimpleDB entries Priam uses to keep track of instances... I had pulled the last commit from the repo; not sure if it helped or not.

But your message made me curious about something... How do you add more Cassandra nodes on the fly? Just update the autoscale properties? I saw instaclustr.com changes the instance type as the number of nodes increases (not sure why the price also becomes higher per instance in this case). I am guessing Priam uses the data backed up to S3 to restore a node's data on another instance, right?

[]s

2013/2/28 Ben Bromhead b...@relational.io

Off the top of my head I would check to make sure the Autoscaling Group you created is restricted to a single Availability Zone. Also, Priam sets the number of EC2 instances it expects based on the maximum instance count you set on your scaling group (it did this last time I checked a few months ago; its behaviour may have changed).
Re: Cassandra instead of memcached
Check out http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

Netflix used Cassandra with SSDs and were able to drop their memcached layer. Mind you, they were not using it purely as an in-memory KV store.

Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 05/03/2013, at 4:33 PM, Drew Kutcharian d...@venarc.com wrote:

Hi Guys,

I'm thinking about using Cassandra as an in-memory key/value store instead of memcached for a new project (just to get rid of a dependency, if possible). I was thinking about setting the replication factor to 1, enabling the off-heap row cache, and setting gc_grace_period to zero for the CF that will be used for the key/value store. Has anyone tried this? Any comments?

Thanks,
Drew
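For anyone trying Drew's setup, the relevant knobs in Cassandra 1.2 look roughly like the following. Treat this as a sketch: key names should be checked against the cassandra.yaml that ships with your version, and the grace period is set per column family rather than globally (the table name is made up).

```yaml
# cassandra.yaml -- row cache sizing; the serializing provider keeps
# cached rows off-heap (requires JNA to be available)
row_cache_size_in_mb: 512
row_cache_provider: SerializingCacheProvider

# gc_grace is per-CF, e.g. in cqlsh:
#   ALTER TABLE kvstore WITH gc_grace_seconds = 0;
```

Note that with RF=1 any node outage means a slice of the keyspace simply disappears, which is a very different failure mode from losing one memcached shard's cache.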
Re: Using an EC2 cluster from the outside.
Depending on your client, disable automatic node discovery and just specify a list of all your nodes in your client configuration. For more details check out http://xzheng.net/blogs/problem-when-connecting-to-cassandra-with-ruby/ ; obviously this deals specifically with a Ruby client, but it should be applicable to others.

Cheers
Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 18/04/2013, at 5:43 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Apr 17, 2013 at 12:07 PM, maillis...@gmail.com wrote:

I have a working 3-node cluster in a single EC2 region and I need to hit it from our datacenter. As you'd expect, the client gets the internal addresses of the nodes back. Someone on IRC mentioned using the public IP for rpc and binding that address to the box. I see that mentioned in an old list mail, but I don't get exactly how this is supposed to work. I could really use either a link to something with explicit directions or a detailed explanation. Should Cassandra use the public IPs for everything (listen, b'cast, and rpc)? What should cassandra.yaml look like? Is the idea to use the public addresses for Cassandra but route the requests between nodes over the LAN using NAT? Any help or suggestion is appreciated.

Google EC2MultiRegionSnitch.

=Rob
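The cassandra.yaml pattern Rob is pointing at usually looks something like this. The addresses below are placeholders, and with the EC2MultiRegionSnitch the snitch takes care of switching back to private IPs for traffic that stays inside the region:

```yaml
# cassandra.yaml on each node (addresses are placeholders)
listen_address: 10.0.0.5        # private IP; node-to-node traffic inside EC2
broadcast_address: 54.0.0.5     # public/elastic IP that gets gossiped out
rpc_address: 0.0.0.0            # accept client connections on all interfaces
endpoint_snitch: Ec2MultiRegionSnitch
```

External clients then connect to the public IPs, while intra-region replication stays on the LAN.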
Re: Installing specific version
On Ubuntu it is:

apt-get install cassandra=1.2.4

so it should be similar for Debian.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 05/07/2013, at 10:59 PM, Kais Ahmed k...@neteck-fr.com wrote:

Hi Ben,

You can get it from http://archive.apache.org/dist/cassandra/

2013/7/5 Ben Gambley ben.gamb...@intoscience.com

Hi all,

Can anyone point me in the right direction for installing a specific version from the DataStax repo? We need 1.2.4 to keep consistent with our QA environment. It's for a new prod cluster, on Debian 6. I thought it may be a value in /etc/apt/sources.list? The latest 1.2.6 does not appear compatible with our phpcassa Thrift drivers. After many late nights my google ability seems to have evaporated!

Cheers
Ben
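To keep a routine apt-get upgrade from later dragging the cluster up to 1.2.6 anyway, the package can also be pinned. This is a sketch of an apt preferences entry (the file name under preferences.d is just a convention):

```
# /etc/apt/preferences.d/cassandra
Package: cassandra
Pin: version 1.2.4
Pin-Priority: 1001
```

A priority above 1000 makes apt hold (or even downgrade to) that version; `apt-cache policy cassandra` shows whether the pin took effect.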
Re: Which of these VPS configurations would perform better for Cassandra?
If you want to get a rough idea of how things will perform, fire up YCSB (https://github.com/brianfrankcooper/YCSB/wiki) and run the tests that most closely match what you think your workload will be (run the test clients from a couple of beefy AWS spot instances for less than a dollar). As you are a new startup without any existing load/traffic patterns, benchmarking will be your best bet.

Also, have a look at running Cassandra with SmartOS on Joyent. When you run SmartOS on Joyent, virtualisation is done using Solaris zones, an OS-based virtualisation, which is at least a quadrillion times better than KVM, Xen, etc. Ok, maybe not that much… but it is pretty cool and has the following benefits:

- No hardware emulation.
- Shared kernel with the host (you don't have to waste precious memory running a guest OS).
- ZFS :)

Have a read of http://wiki.smartos.org/display/DOC/SmartOS+Virtualization for more info. There are some downsides as well: the version of Cassandra that comes with the SmartOS package management system is old and busted, so you will want to build from source, and you will want to be technically confident running something a little outside the norm (SmartOS is based on Solaris).

Just make sure you test and benchmark all your options; a few days of testing now will save you weeks of pain. Good luck!

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr

On 05/08/2013, at 12:34 AM, David Schairer dschai...@humbaba.net wrote:

Of course -- my point is simply that if you're looking for speed, SSD+KVM, especially in a shared-tenant situation, is unlikely to perform the way you want. If you're building a pure proof of concept that never stresses the system, it doesn't matter, but if you plan an MVP with any sort of scale, you'll want a plan to be on something more robust.
I'll also say that it's really important (IMHO) to do even your dev work in a config with consistency conditions like your eventual production environment -- so make sure you're writing to both nodes and can hit cases where eventual-consistency delays kick in, or it'll come back to bite you later. I've seen this force people to redesign their whole data model when they didn't plan for it initially.

As I said, I haven't tested DO. I've tested very similar configurations at other providers and they were all terrible under load -- and certainly took away most of the benefits of SSD once you stressed writes a bit. Xen+SSD, on modern kernels, should work better, but I didn't test it (Linode doesn't offer this, though, and they've had lots of other challenges of late).

--DRS

On Aug 3, 2013, at 11:40 PM, Ertio Lew ertio...@gmail.com wrote:

@David: Like all other start-ups, we too cannot start with all dedicated servers for Cassandra. So right now we have no better choice except using a VPS :), but we can definitely choose one from amongst a suitable set of VPS configurations. As of now, since we are starting out, could we initiate our cluster with 2 nodes (RF=2), (KVM, 2GB RAM, 2 cores, 30GB SSD)? Right now we won't be having a very heavy load on Cassandra for the next few months, till we grow our user base. So this choice is mainly based on pricing vs. configuration, as well as Digital Ocean's good reputation in the community.

On Sun, Aug 4, 2013 at 12:53 AM, David Schairer dschai...@humbaba.net wrote:

I've run several lab configurations on Linodes; I wouldn't run Cassandra on any shared virtual platform for large-scale production, just because your IO performance is going to be really hard to predict. Lots of people do, though -- it depends on your Cassandra loads and how consistent you need performance to be, as well as how much of your working set will fit into memory. Remember that Linode significantly oversells their CPU as well.
The release version of KVM, at least as of a few months ago, still doesn't support TRIM on SSD; that, plus the fact that you don't know how others will use the SSDs or whether their file systems will keep the SSDs healthy, means that SSD performance on KVM is going to be highly unpredictable. I have not tested DigitalOcean, but I did test several other KVM+SSD shared-tenant hosting providers aggressively for Cassandra a couple of months ago; they all failed badly. Your mileage will vary considerably based on what you need out of Cassandra, what your data patterns look like, and how you configure your system. That said, I would use Xen before KVM for high-performance IO.

I have not run Cassandra in any volume on Amazon -- lots of folks have, and may have recommendations (including SSD) there for where it falls on the price/performance curve.

--DRS

On Aug 3, 2013, at 11:33 AM, Ertio Lew ertio...@gmail.com wrote:

I am building a cluster (initially starting with a 2-3 node cluster). I have come across two seemingly good options for hosting: Linode and Digital Ocean.
Re: Recommendation for hosting multi tenant clusters
http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html sums it up pretty well. Optimised images and provisioned IOPS may help, but whichever way you spin it your reads and writes are still going out on the network somewhere. EBS is like a giant SAN which will drop out at any second, take almost everything in your region down with it, and simultaneously open up a gate to hell that lets all sorts of unimaginable horrors into the world. Ok, maybe not that bad, but network issues between EBS and your instances are painful, whereas network issues within a single AZ can be dealt with in the course of normal cluster operations.

On a slight tangent, have a read of http://thelastpickle.com/2011/06/13/Down-For-Me/ which does an awesome job of explaining what will happen to your quorum reads and writes when an AWS AZ goes down (and you use ephemeral storage).

Cheers
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 14/08/2013, at 10:42 AM, Jon Haddad j...@jonhaddad.com wrote:

I strongly recommend against EBS, even with EBS-optimized instances and provisioned IOPS. The throughput you'll get from local drives is significantly better than what you'll get with EBS (even with 4K IOPS provisioned).

On Aug 13, 2013, at 2:10 PM, Rahul Gupta rgu...@dekaresearch.com wrote:

I am working on a requirement to host a multi-tenant Cassandra cluster (or set of clusters) on Amazon EC2 (AWS). With everything else sorted out, I have the below question, where I am looking for recommendations: does Amazon's recent support of EBS-optimized images change the whole discussion around EBS vs. ephemeral drives and image size?
- Option 1: reserved m1.xlarge (4x420GB drives) is $0.187/hr
- Option 2: reserved m1.large EBS-optimized is $0.119/hr (~$50/month less than m1.xlarge, but $168/month for 4x420GB standard EBS volumes): costs $120/month more, but additional recovery options

Given Cassandra is designed to survive failures, I think combining replication factor 3 and backing up to S3 should be enough for backup. Please advise.

Thanks,
Rahul Gupta
DEKA Research & Development
340 Commercial St, Manchester, NH 03101
P: 603.666.3908 extn. 6504 | C: 603.718.9676

This e-mail and the information, including any attachments, it contains are intended to be a confidential communication only to the person or entity to whom it is addressed and may contain information that is privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender and destroy the original message. Thank you.

Please consider the environment before printing this email.
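Plugging the two options' numbers into a quick back-of-the-envelope check (the hourly rates are the reserved prices quoted above; standard EBS is assumed at $0.10/GB-month, which is what produces the quoted $168 for 4x420GB; none of these are current prices):

```python
HOURS_PER_MONTH = 730  # AWS's usual monthly approximation

# Option 1: m1.xlarge with 4x420GB local (ephemeral) drives included
xlarge_monthly = 0.187 * HOURS_PER_MONTH

# Option 2: m1.large EBS-optimized plus 4x420GB of standard EBS
large_ebs_monthly = 0.119 * HOURS_PER_MONTH + 4 * 420 * 0.10

print(round(xlarge_monthly))     # 137
print(round(large_ebs_monthly))  # 255
```

The difference works out to roughly the $120/month premium quoted for the EBS option, before any provisioned-IOPS or I/O-request charges.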
Re: PropertiesFileSnitch
Look at the GossipingPropertyFileSnitch (http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/architecture/architectureSnitchesAbout_c.html) and just use the simple seed provider as described in the DataStax multi-DC documentation. That way, for each new node you only need to define its DC / rack, and it will use gossip to discover this information about the other nodes.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 10 Dec 2013, at 5:31 am, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

Hello everyone,

I have a Cassandra cluster running at Amazon. I am trying to add a new datacenter to this cluster now, outside AWS. I know I could use multi-region, but I would like to be vendor-free in terms of cloud. Reading the article http://www.datastax.com/docs/datastax_enterprise3.2/deploy/multi_dc_install, it seems I will need to start using the PropertyFileSnitch instead of Ec2Snitch to do what I want. So here comes my question: if I set all the seeds in my property file, what will happen if I need to add more machines and/or seeds to the cluster? Will I need to change the property files on all the nodes of my cluster, or just on the new node?

Best regards,
Marcelo Valle.
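With the GossipingPropertyFileSnitch, the per-node configuration shrinks to a two-line cassandra-rackdc.properties; the DC and rack names below are illustrative:

```
# conf/cassandra-rackdc.properties -- the only file that differs per node;
# every other node learns this DC/rack assignment via gossip
dc=DC1
rack=RAC1
```

This is what answers Marcelo's question: adding a node means setting these two lines on the new node only, rather than editing a topology file on every existing node.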
Re: in AWS is it worth trying to talk to a server in the same zone as your client?
$0.01/GB between zones irrespective of IP is correct. As for your original question, depending on the driver you are using you could write a custom co-ordinator node selection policy. For example, if you are using the Datastax driver you would extend http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html … and set the distance based on which zone the node is in. An alternate method would be to define the zones as data centres and then you could leverage existing DC aware policies (We've never tried this though). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/02/2014, at 8:00 AM, Andrey Ilinykh ailin...@gmail.com wrote: I think you are mistaken. It is true for the same zone. Between zones it is $0.01/GB. On Wed, Feb 12, 2014 at 12:17 PM, Russell Bradberry rbradbe...@gmail.com wrote: Not when using private IP addresses. That pricing ONLY applies if you are using the public interface or EIP/ENI. If you use the private IP addresses there is no cost associated. On February 12, 2014 at 3:13:58 PM, William Oberman (ober...@civicscience.com) wrote: Same region, cross zone transfer is $0.01 / GB (see http://aws.amazon.com/ec2/pricing/, Data Transfer section). On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry rbradbe...@gmail.com wrote: Cross zone data transfer does not cost any extra money. LOCAL_QUORUM = QUORUM if all 6 servers are located in the same logical datacenter. Ensure your clients are connecting to either the local IP or the AWS hostname that is a CNAME to the local IP from within AWS. If you connect to the public IP you will get charged for outbound data transfer. On February 12, 2014 at 2:58:07 PM, Yogi Nerella (ynerella...@gmail.com) wrote: Also, maybe you need to set the read consistency to LOCAL_QUORUM, otherwise the servers still try to read the data from all other data centers. I can understand the latency, but I can't understand how it would save money?
The amount of data transferred from the AWS server to the client should be the same no matter where the client is connected? On Wed, Feb 12, 2014 at 10:33 AM, Andrey Ilinykh ailin...@gmail.com wrote: yes, sure. Taking data from the same zone will reduce latency and save you some money. On Wed, Feb 12, 2014 at 10:13 AM, Brian Tarbox tar...@cabotresearch.com wrote: We're running a C* cluster with 6 servers spread across the four us-east1 zones. We also spread our clients (hundreds of them) across the four zones. Currently we give our clients a connection string listing all six servers and let C* do its thing. This is all working just fine...and we're paying a fair bit in AWS transfer costs. There is a suspicion that this transfer cost is driven by us passing data around between our C* servers and clients. Would there be any value to trying to get a client to talk to one of the C* servers in its own zone? I understand (at least partially!) about coordinator nodes and replication and know that no matter which server is the coordinator for an operation, replication may cause bits to get transferred to/from servers in other zones. Having said that...is there a chance that trying to encourage a client to initially contact a server in its own zone would help? Thank you, Brian Tarbox
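The zone-aware selection idea discussed above can be sketched independently of any driver API. A toy ordering policy that prefers coordinators in the client's own availability zone (function and node names are hypothetical; a real implementation would extend the Datastax driver's LoadBalancingPolicy and express this as a per-node distance instead):

```python
def order_by_zone(client_zone, nodes):
    """Return candidate coordinator addresses, same-zone nodes first.

    `nodes` is a list of (address, zone) tuples. Preferring same-zone
    coordinators reduces latency and cross-zone transfer charges, while
    still keeping remote nodes available as fallbacks.
    """
    local = [addr for addr, zone in nodes if zone == client_zone]
    remote = [addr for addr, zone in nodes if zone != client_zone]
    return local + remote

plan = order_by_zone("us-east-1a", [
    ("10.0.0.1", "us-east-1a"),
    ("10.0.1.1", "us-east-1b"),
    ("10.0.0.2", "us-east-1a"),
])
print(plan)  # same-zone nodes come first in the query plan
```

Note this only chooses the coordinator; as Brian says, replication will still move data across zones regardless of which node coordinates the request.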
Re: Recommended OS
We are currently trialling SmartOS with Cassandra and have seen some pretty good results (and the mmap stuff appears to work). As Rob said, if this is a production cluster, run with linux… there will be far less pain. If you are super keen on running on something different from linux in production (after all the warnings), run most of your cluster on linux, then run a single node or a separate DC with SmartOS, Solaris, BeOS, OS/2, Minix, Windows 3.1 or whatever it is that you choose and let us know how it all goes! Cheers Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/02/2014, at 6:32 AM, Jeffrey Kesselman jef...@gmail.com wrote: It's quite possible it's well tricked out for Linux. My major issue with Linux has been that its TCP/IP stack is nowhere near as scalable as Solaris' for massive numbers of simultaneous connections. But that's probably less of an issue with a Cassandra node than it has been with the game servers I've built. On Wed, Feb 12, 2014 at 1:52 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Feb 12, 2014 at 8:55 AM, Jeffrey Kesselman jef...@gmail.com wrote: I haven't run Cassandra in production myself, but for other high-load Java-based servers I've had really good scaling success with OpenSolaris. In particular I've used Joyent's SmartOS, which has the additional advantage of bursting to cover brief periods of exceptional load. There are a significant number of Linux-only optimizations in Cassandra. Very few people operate production clusters on anything but Linux. The most obvious optimization that comes to mind is the use of direct I/O to avoid blowing out the page cache under various circumstances. My approach towards running Cassandra on anything but Linux would be to try to directly compare performance to the same hardware running Linux. =Rob -- It's always darkest just before you are eaten by a grue.
Re: Load balancing issue with virtual nodes
Some imbalance is expected and considered normal: See http://wiki.apache.org/cassandra/VirtualNodes/Balance As well as https://issues.apache.org/jira/browse/CASSANDRA-7032 Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 29 Apr 2014, at 7:30 am, DuyHai Doan doanduy...@gmail.com wrote: Hello all Some update about the issue. After wiping completely all sstable/commitlog/saved_caches folders and restarting the cluster from scratch, we still see odd figures. After the restart, nodetool status does not show an exact balance of 50% of data for each node:

Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
UN  host1    48.57 KB  256     51.6%             d00de0d1-836f-4658-af64-3a12c00f47d6  rack1
UN  host2    48.57 KB  256     48.4%             e9d2505b-7ba7-414c-8b17-af3bbe79ed9c  rack1

As you can see, the % is very close to 50% but not exactly 50%. What can explain that? Can it be a network connection issue during the initial token shuffle phase? P.S: both host1 and host2 are supposed to have exactly the same hardware Regards Duy Hai DOAN On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan batra...@yahoo.com wrote: I don't know about Hector but the Datastax Java driver needs just one IP from the cluster and it will discover the rest of the nodes. Then by default it will do a round robin when sending requests. So if Hector does the same, the pattern will again appear. Did you look at the size of the dirs? That documentation is for C* 0.8. It's old. But depending on your boxes you might reach a CPU bottleneck. Might want to google for the write path in Cassandra. According to that, there is not much to do when writes come in... On Friday, April 25, 2014 12:00 AM, DuyHai Doan doanduy...@gmail.com wrote: I did some experiments.
Let's say we have node1 and node2. First, I configured Hector with node1 and node2 as hosts and I saw that only node1 has high CPU load. To eliminate the client connection issue, I re-tested with only node2 provided as host for Hector. Same pattern. CPU load is above 50% on node1 and below 10% on node2. It means that node2 is playing coordinator and forwarding many write/read requests to node1. Why did I look at CPU load and not iostat at all? Because I have a very write-intensive workload with a read-only-once pattern. I've read here (http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning) that heavy writes in C* are more CPU bound, but the info may be outdated and no longer true Regards Duy Hai DOAN On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler mich...@pbandjelly.org wrote: On 04/24/2014 10:29 AM, DuyHai Doan wrote: Client used = Hector 1.1-4 Default Load Balancing connection policy Both nodes' addresses are provided to Hector so according to its connection policy, the client should switch alternately between both nodes OK, so is only one connection being established to one node for one bulk write operation? Or are multiple connections being made to both nodes and writes performed on both? -- Michael
Re: Connect Cassandra rings in datacenter and ec2
You will need to have the nodes running on AWS in a VPC. You can then configure a VPN to work with your VPC, see http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html. Also as you will have multiple VPN connections (from your private DC and the other AWS region) AWS CloudHub will be the way to go http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPN_CloudHub.html. Additionally to access your Cassandra instances from your other VPCs you can use VPC peering (within the same region). See http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 30 Apr 2014, at 11:38 am, Chris Lohfink clohf...@blackbirdit.com wrote: Cassandra will require a different address per node though or at least 1 unique internal for same DC and 1 unique external for other DCs. You could look into http://aws.amazon.com/vpc/ or some other vpn solution. --- Chris Lohfink On Apr 29, 2014, at 6:56 PM, Trung Tran tr...@brightcloud.com wrote: Hi, We're planning to deploy 3 cassandra rings, one in our datacenter (with more node/power) and two others in EC2. We don't have enough public IP to assign for each individual node in our data center, so i wonder how could we connect the cluster together? Have any one tried this before, and if this is a good way to deploy cassandra? Thanks, Trung.
Re: Can Cassandra client programs use hostnames instead of IPs?
You can set listen_address in cassandra.yaml to a hostname (http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html). Cassandra will use the IP address returned by a DNS query for that hostname. On AWS you don't have to assign an elastic IP; all instances come with a public IP that lasts their lifetime (if you use ec2-classic or your VPC is set up to assign them). Note that whatever hostname you set in a node's listen_address, it will need to resolve to the private IP, as AWS instances only have network access via their private address. Traffic to an instance's public IP is NATed and forwarded to the private address. So you may as well just use the node's IP address. If you run Hadoop on instances in the same AWS region it will be able to access your Cassandra cluster via the private IPs. If you run Hadoop externally, just use the public IPs. If you run in a VPC without public addressing and want to connect from external hosts you will want to look at a VPN (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/05/2014, at 4:31 AM, Huiliang Zhang zhl...@gmail.com wrote: Hi, Cassandra returns IPs of the nodes in the Cassandra cluster for further communication between the Hadoop program and the Cassandra cluster. Is there a way to configure the Cassandra cluster to return hostnames instead of IPs? My Cassandra cluster is on AWS and has no elastic IPs which can be accessed outside AWS. Thanks, Huiliang
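The hostname approach described above boils down to a one-line change per node; a minimal sketch (the hostname is a placeholder, and on AWS it must resolve to the node's private IP):

```yaml
# cassandra.yaml (excerpt) — Cassandra resolves this hostname via DNS
# at startup; on AWS the record must point at the node's *private* IP,
# since inter-node traffic only flows over the private address.
listen_address: node1.example.internal
```

As noted above, because the hostname has to resolve to the same private IP anyway, this buys little over just configuring the IP directly.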
Re: Storing log structured data in Cassandra without compactions for performance boost.
If you make the timestamp the partition key you won't be able to do range queries (unless you use an ordered partitioner). Assuming you are logging from multiple devices you will want your partition key to be the device id and the date, your clustering key to be the timestamp (timeuuids are good to prevent collisions), and then the log message, level etc. as the other columns. Then you can also create a new table for every week (or day/month depending on how much granularity you want) and just write to the current week's table. This step allows you to delete old data without Cassandra using tombstones (you just drop the table for the week of logs you want to delete). For a much clearer explanation see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last few slides). As for compaction, I would leave it enabled, as having lots of SSTables hanging around can make range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a little old but still relevant). Compaction also fixes up things like merging row fragments (when you write new columns to the same row). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 07/05/2014, at 10:55 AM, Kevin Burton bur...@spinn3r.com wrote: I'm looking at storing log data in Cassandra… Every record is a unique timestamp for the key, and then the log line for the value. I think it would be best to just disable compactions. - there will never be any deletes. - all the data will be accessed in time range (probably partitioned randomly) and sequentially. So every time a memtable flushes, we will just keep that SSTable forever. Compacting the data is kind of redundant in this situation. I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered.
Also, it would be IDEAL to be able to tell Cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
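The per-device, per-date partitioning with a timeuuid clustering key that Ben describes might look like this in CQL (table and column names are illustrative, not from the thread):

```
-- One table per week, e.g. logs_2014_w19; expiring old data is then
-- just DROP TABLE, with no tombstones involved.
CREATE TABLE logs_2014_w19 (
    device_id text,
    day       text,      -- date bucket, part of the partition key
    ts        timeuuid,  -- clustering key; timeuuid prevents collisions
    level     text,
    message   text,
    PRIMARY KEY ((device_id, day), ts)
);

-- Range queries then work within a partition, e.g.:
-- SELECT * FROM logs_2014_w19
--   WHERE device_id = 'dev42' AND day = '2014-05-07'
--   ORDER BY ts DESC;
```

The composite partition key (device_id, day) keeps partitions bounded in size, while the timeuuid clustering key gives time-ordered storage and efficient time-range slices within each partition.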
Re: Ec2 Network I/O
Also once you've got your phi_convict_threshold sorted, if you see these again check: http://status.aws.amazon.com/ AWS does occasionally have the odd increased latency issue / outage. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 19/05/2014, at 1:15 PM, Nate McCall n...@thelastpickle.com wrote: It's a good idea to increase phi_convict_threshold to at least 12 on EC2. Using placement groups and single-tenant systems will certainly help. Another optimization would be dedicating an Enhanced Network Interface (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) specifically for gossip traffic. On Mon, May 19, 2014 at 1:36 PM, Phil Burress philburress...@gmail.com wrote: Has anyone experienced network i/o issues with ec2? We are seeing a lot of these in our logs: HintedHandOffManager.java (line 477) Timed out replaying hints to /10.0.x.xxx; aborting (15 delivered) and these... Cannot handshake version with /10.0.x.xxx and these... java.io.IOException: Cannot proceed on repair because a neighbor (/10.0.x.xxx) is dead: session failed Occurs on all of our nodes. Even though in all cases, the host that is being reported as down or unavailable is up and readily 'pingable'. We are using shared tenancy on all our nodes (instance type m1.xlarge) with cassandra 2.0.7. Any suggestions on how to debug these errors? Is there a recommendation to move to Placement Groups for Cassandra? Thanks! Phil -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
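Nate's phi_convict_threshold suggestion is a single setting in cassandra.yaml; a sketch of the change (the value 12 is the recommendation from the thread):

```yaml
# cassandra.yaml (excerpt) — raise the accrual failure detector's
# threshold on EC2, where transient network jitter can otherwise cause
# live nodes to be wrongly marked down (default is 8).
phi_convict_threshold: 12
```

Higher values make the failure detector more tolerant of latency spikes at the cost of detecting genuinely dead nodes a little more slowly.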
Re: autoscaling cassandra cluster
The mechanics for it are simple compared to figuring out when to scale, especially when you want to be scaling before peak load on your cluster (adding and removing nodes puts additional load on your cluster). We are currently building our own in-house solution for this for our customers. If you want to have a go at it yourself, this is a good starting point: http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html Most of this is fairly specific to Netflix, but an interesting read nonetheless. Datastax OpsCenter also provides capacity planning and forecasting and can provide an easy set of metrics you can make your scaling decisions on. http://www.datastax.com/what-we-offer/products-services/datastax-opscenter Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 21/05/2014, at 7:51 AM, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra docker images let me know I'll shoot you a link. James Sent from my iPhone On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using coreos with docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy send this promising project called Mesos in this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try. Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. 
I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add a machine with the cluster/seed/datacenter conf and it should join the cluster. Scaling down has to be manual, where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the Datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
Re: Multi-DC Environment Question
Short answer: If the time elapsed exceeds max_hint_window_in_ms then hints will stop being created. You will need to rely on your read consistency level, read repair and anti-entropy repair operations to restore consistency. Long answer: http://www.slideshare.net/jasedbrown/understanding-antientropy-in-cassandra Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 30 May 2014, at 8:40 am, Tupshin Harper tups...@tupshin.com wrote: When one node or DC is down, coordinator nodes being written through will notice this fact and store hints (hinted handoff is the mechanism), and those hints are used to send the data that was not able to be replicated initially. http://www.datastax.com/dev/blog/modern-hinted-handoff -Tupshin On May 29, 2014 6:22 PM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote: Hello All, We have plans to add a second DC to our live Cassandra environment. Currently RF=3 and we read and write at QUORUM. After adding DC2 we are going to be reading and writing at LOCAL_QUORUM. If my understanding is correct, when a client sends a write request, if the consistency level is satisfied on DC1 (that is RF/2+1), success is returned to the client and DC2 will eventually get the data as well. The assumption behind this is that the client always connects to DC1 for reads and writes, and that there is a site-to-site VPN between DC1 and DC2. Therefore, DC1 will almost always return success before DC2 (actually I don't know if it is possible for DC2 to be more up-to-date than DC1 with this setup...). Now imagine DC1 loses connectivity and the client fails over to DC2. Everything should work fine after that, with the only difference that DC2 will now be handling the requests directly from the client. After some time, say longer than max_hint_window_in_ms, DC1 comes back up. My question is how do I bring DC1 up to speed with DC2, which is now more up-to-date? Will that require a nodetool repair on DC1 nodes?
Also, what is the answer when the outage is shorter than max_hint_window_in_ms instead? Thanks in advance! Vasilis -- Kind Regards, Vasileios Vlachos
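The hint window Ben refers to is a cassandra.yaml setting; a sketch of where it lives (the value shown is Cassandra's usual 3-hour default):

```yaml
# cassandra.yaml (excerpt) — coordinators store hints for a downed
# replica only for this long; an outage longer than this window must
# be healed by read repair and anti-entropy repair (nodetool repair).
max_hint_window_in_ms: 10800000  # 3 hours
```

So for an outage shorter than the window, hinted handoff replays the missed writes automatically; for anything longer, repair on the returning DC's nodes is the safe path.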
Re: Managing truststores with inter-node encryption
Java SSL sockets need to be able to build a chain of trust, so having either a node's public cert or the root cert in the truststore works (as you found out). To get Cassandra to use cipher suites stronger than 128 bit you will need to install the JCE Unlimited Strength Jurisdiction Policy Files. You will know if you aren't using them because there will be a bunch of warnings quickly filling up your logs. Note that Java's SSL implementation does not check certificate revocation lists by default, though as you are not using inter-node SSL for authentication and identification it's no big deal. Ben On 31/05/2014 1:04 AM, Jeremy Jongsma jer...@barchart.com wrote: It appears that only adding the CA certificate to the truststore is sufficient for this. On Thu, May 22, 2014 at 10:05 AM, Jeremy Jongsma jer...@barchart.com wrote: The docs say that each node needs every other node's certificate in its local truststore: http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html This seems like a bit of a headache for adding nodes to a cluster. How do others deal with this? 1) If I am self-signing the client certificates (with puppetmaster), is it enough that the truststore just contain the CA certificate used to sign them? This is the typical PKI mechanism for verifying trust, so I am hoping it works here. 2) If not, can I use the same certificate for every node? If so, what is the downside? I'm mainly concerned with encryption over public internet links, not node identity verification.
Re: VPC AWS
Have a look at http://www.tinc-vpn.org/, mesh based and handles multiple gateways for the same network in a graceful manner (so you can run two gateways per region for HA). Also supports NAT traversal if you need to do public-private clusters. We are currently evaluating it for our managed Cassandra in a VPC solution, but we haven’t ever used it in a production environment or with a heavy load, so caveat emptor. As for the snitch… the GPFS is definitely the most flexible. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 10 Jun 2014, at 1:42 am, Ackerman, Mitchell mitchell.acker...@pgi.com wrote: Peter, I too am working on setting up a multi-region VPC Cassandra cluster. Each region is connected to each other via an OpenVPN tunnel, so we can use internal IP addresses for both the seeds and broadcast address. This allows us to use the EC2Snitch (my interpretation of the caveat that this snitch won’t work in a multi-region environment is that it won’t work if you can’t use internal IP addresses, which we can via the VPN tunnels). All the C* nodes find each other, and nodetool (or OpsCenter) shows that we have established a multi-datacenter cluster. Thus far, I’m not happy with the performance of the cluster in such a configuration, but I don’t think that it is related to this configuration, though it could be. Mitchell From: Peter Sanford [mailto:psanf...@retailnext.net] Sent: Monday, June 09, 2014 7:19 AM To: user@cassandra.apache.org Subject: Re: VPC AWS Your general assessments of the limitations of the Ec2 snitches seem to match what we've found. We're currently using the GossipingPropertyFileSnitch in our VPCs. This is also the snitch to use if you ever want to have a DC in EC2 and a DC with another hosting provider. 
-Peter On Mon, Jun 9, 2014 at 5:48 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi guys, there is a lot of answer, it looks like this subject is interesting a lot of people, so I will end up letting you know how it went for us. For now, we are still doing some tests. Yet I would like to know how we are supposed to configure Cassandra in this environment : - VPC - Multiple datacenters (should be VPCs, one per region, linked through VPN ?) - Cassandra 1.2 We are currently running under EC2MultiRegionSnitch, but with no VPC. Our VPC will have no public interface, so I am not sure how to configure broadcast address or seeds that are supposed to be the public IP of the node. I could use EC2Snitch, but will cross region work properly ? Should I use an other snitch ? Is someone using a similar configuration ? Thanks for information already given guys, we will achieve this ;-). 2014-06-07 0:05 GMT+02:00 Jonathan Haddad j...@jonhaddad.com: This may not help you with the migration, but it may with maintenance management. I just put up a blog post on managing VPC security groups with a tool I open sourced at my previous company. If you're going to have different VPCs (staging / prod), it might help with managing security groups. http://rustyrazorblade.com/2014/06/an-introduction-to-roadhouse/ Semi shameless plug... but relevant. On Thu, Jun 5, 2014 at 12:01 PM, Aiman Parvaiz ai...@shift.com wrote: Cool, thanks again for this. On Thu, Jun 5, 2014 at 11:51 AM, Michael Theroux mthero...@yahoo.com wrote: You can have a ring spread across EC2 and the public subnet of a VPC. That is how we did our migration. In our case, we simply replaced the existing EC2 node with a new instance in the public VPC, restored from a backup taken right before the switch. -Mike From: Aiman Parvaiz ai...@shift.com To: Michael Theroux mthero...@yahoo.com Cc: user@cassandra.apache.org user@cassandra.apache.org Sent: Thursday, June 5, 2014 2:39 PM Subject: Re: VPC AWS Thanks for this info Michael. 
As far as restoring node in public VPC is concerned I was thinking ( and I might be wrong here) if we can have a ring spread across EC2 and public subnet of a VPC, this way I can simply decommission nodes in Ec2 as I gradually introduce new nodes in public subnet of VPC and I will end up with a ring in public subnet and then migrate them from public to private in a similar way may be. If anyone has any experience/ suggestions with this please share, would really appreciate it. Aiman On Thu, Jun 5, 2014 at 10:37 AM, Michael Theroux mthero...@yahoo.com wrote: The implementation of moving from EC2 to a VPC was a bit of a juggling act. Our motivation was two fold: 1) We were running out of static IP addresses, and it was becoming increasingly difficult in EC2 to design around limiting the number of static IP addresses to the number of public IP addresses EC2 allowed 2) VPC affords us an additional level of security that was desirable. However, we needed to consider the following
Re: Minimum Cluster size to accommodate a single node failure
Yes your thinking is correct. This article from TLP sums it all up beautifully http://thelastpickle.com/blog/2011/06/13/Down-For-Me.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 4:18 pm, Prabath Abeysekara prabathabeysek...@gmail.com wrote: Sorry, the title of this thread has to be Minimum cluster size to survive a single node failure. On Wed, Jun 18, 2014 at 11:38 AM, Prabath Abeysekara prabathabeysek...@gmail.com wrote: Hi Everyone, First of all, apologies if the $subject was discussed previously in this list before. I've already gone through quite a few email trails on this but still couldn't find a convincing answer, which really made me raise this question again here in this list. If my understanding is correct, a 3 node Cassandra cluster would survive a single node failure while the replication factor is set to 3 and consistency levels are set to QUORUM for read/write operations. For example, let's consider the following configuration.
* Number of nodes in the cluster: 3
* Replication Factor: 3
* Read/Write consistencies: QUORUM (this evaluates to 2 when RF is set to 3)
Here's how I expect it to work. Whenever a read operation takes place, the Cassandra coordinator node that receives the read request would try to read from at least two replicas before responding to the client. With read consistency being 2 (+ all rows being available in all three nodes), we should be able to survive a single node failure in this particular instance for read operations. Similarly, for write requests, even in the middle of a single node failure, the writes should be allowed as the write consistency is set to 2? Can someone please confirm whether what's mentioned above is correct?
(Please note that I'm trying to figure out the minimum node numbers and I indeed am aware of the fact that there are other factors also to be considered in order to come up with the most optimal numbers for a given cluster requirement). Cheers, Prabath -- Prabath -- Prabath
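The availability arithmetic in the thread above reduces to a couple of lines; a quick sketch:

```python
def quorum(rf):
    """Replicas required to satisfy QUORUM at replication factor rf."""
    return rf // 2 + 1

def max_replica_failures(rf):
    """Replica failures survivable while still meeting QUORUM."""
    return rf - quorum(rf)

# RF=3: QUORUM needs 2 replicas, so one of the three nodes holding a
# replica can be down and reads/writes at QUORUM still succeed.
print(quorum(3), max_replica_failures(3))
```

This also shows why even replication factors buy little: RF=2 gives a quorum of 2 and tolerates zero failures, the same as RF=1 in practice.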
Re: EBS SSD - Cassandra ?
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html From the link: EBS volumes are not recommended for Cassandra data volumes for the following reasons:
• EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link.
• EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive.
• Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing.
Still applies, especially the network contention and latency issues. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 7:18 pm, Daniel Chia danc...@coursera.org wrote: While they guarantee IOPS, they don't really make any guarantees about latency. Since EBS goes over the network, there's so many things in the path of getting at your data, I would be concerned with random latency spikes, unless proven otherwise. Thanks, Daniel On Wed, Jun 18, 2014 at 1:58 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: In this document it is said : Provisioned IOPS (SSD) - Volumes of this type are ideal for the most demanding I/O intensive, transactional workloads and large relational or NoSQL databases. This volume type provides the most consistent performance and allows you to provision the exact level of performance you need with the most predictable and consistent performance. With this type of volume you provision exactly what you need, and pay for what you provision. Once again, you can achieve up to 48,000 IOPS by connecting multiple volumes together using RAID.
2014-06-18 10:57 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com: Hi, I just saw this : http://aws.amazon.com/fr/blogs/aws/new-ssd-backed-elastic-block-storage/ Since the problem with EBS was the network, there is no chance that this hardware architecture might be useful alongside Cassandra, right ? Alain
Re: EBS SSD - Cassandra ?
Irrespective of performance and latency numbers there are fundamental flaws with using EBS/NAS and Cassandra, particularly around bandwidth contention and what happens when the shared storage medium breaks. Also obligatory reference to http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html. Regarding ENIs, AWS are pretty explicit about their impact on bandwidth: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html Attaching another network interface to an instance is not a method to increase or double the network bandwidth to or from the dual-homed instance. So Nate, you are right in that it is the logical separation that helps, for some reason. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 20 Jun 2014, at 8:17 am, Nate McCall n...@thelastpickle.com wrote: Sorry - should have been clear I was speaking in terms of route optimizing, not bandwidth. No idea as to the implementation (probably instance specific) and I doubt it actually doubles bandwidth. Specifically: having an ENI dedicated to API traffic did smooth out some recent load tests we did for a client. It could be that the overall throughput increases were more a function of cleaner traffic segmentation/smoother routing. We weren't being terribly scientific - it was more an artifact of testing network segmentation. I'm just going to say that using an ENI will make things better (since traffic segmentation is always good practice anyway :) YMMV. On Thu, Jun 19, 2014 at 3:39 PM, Russell Bradberry rbradbe...@gmail.com wrote: Does an elastic network interface really use a different physical network interface? Or is it just to give the ability for multiple IP addresses? On June 19, 2014 at 3:56:34 PM, Nate McCall (n...@thelastpickle.com) wrote: If someone really wanted to try this, I recommend adding an Elastic Network Interface or two for gossip and client/API traffic. This lets EBS and management traffic have the pre-configured network.
On Thu, Jun 19, 2014 at 6:54 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote: I would say this is worth benchmarking before jumping to conclusions. The network being a bottleneck (or latency-causing) for EBS is, to my knowledge, supposition, and instances can be started with direct connections to EBS if this is a concern. The blog post below shows that even without SSDs the EBS-optimised provisioned-IOPS instances show pretty consistent latency numbers, although those latencies are higher than you would typically expect from locally attached storage. http://blog.parse.com/2012/09/17/parse-databases-upgraded-to-amazon-provisioned-iops/ Note, I'm not endorsing the use of EBS. Cassandra is designed to scale with the number of nodes, not with the depth of nodes (as Ben mentions, saturating a single node's data capacity is pretty easy these days; CPUs rapidly become the bottleneck as you try to go deep). However the argument that EBS cannot provide consistent performance seems overly pessimistic, and should probably be empirically determined for your use case.

On Thu, Jun 19, 2014 at 9:50 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Ok, looks fair enough. Thanks guys. It would be great to be able to add disks when the amount of data rises and add nodes when throughput increases... :)

2014-06-19 5:27 GMT+02:00 Ben Bromhead b...@instaclustr.com: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html From the link: EBS volumes are not recommended for Cassandra data volumes for the following reasons:

• EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link.
• EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive.
• Adding capacity by increasing the number of EBS volumes per host does not scale.
You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing. Still applies, especially the network contention and latency issues. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 7:18 pm, Daniel Chia danc...@coursera.org wrote: While they guarantee IOPS, they don't really make any guarantees about latency. Since EBS goes over the network, there's so many things in the path of getting at your data, I would be concerned with random latency spikes, unless proven otherwise. Thanks, Daniel On Wed, Jun 18, 2014 at 1:58 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: In this document it is said : Provisioned IOPS (SSD) - Volumes of this type are ideal for the most demanding I/O
Re: possible to have TTL on individual collection values?
Create a table with a set as one of the columns using cqlsh, populate with a few records. Connect using the cassandra-cli, run list on your table/cf and you'll see how the sets work. Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/07/2014, at 11:19 AM, Kevin Burton bur...@spinn3r.com wrote: On Sat, Jul 12, 2014 at 6:05 PM, Keith Wright kwri...@nanigans.com wrote: Yes each item in the set can have a different TTL so long as they are upserted with commands having differing TTLs. Ah… ok. So you can just insert them with unique UPDATE/INSERT commands with different USING TTLs and it will work. That makes sense. You should read about how collections/maps work in CQL3 in terms of their CQL2 structure. Definitely. I tried but the documentation is all over the map. This is one of the problems with Cassandra IMO. It's evolving so fast that it's difficult to find the correct documentation. -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
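The upsert-with-different-TTLs approach described above can be illustrated in CQL. This is a sketch with made-up table and column names, relying on the documented behaviour that each collection element carries the TTL of the write that created it:

```cql
-- Hypothetical table with a set column
CREATE TABLE tags_by_item (item_id int PRIMARY KEY, tags set<text>);

-- Separate upserts with different TTLs: each element expires on its own schedule
UPDATE tags_by_item USING TTL 60    SET tags = tags + {'short-lived'} WHERE item_id = 1;
UPDATE tags_by_item USING TTL 86400 SET tags = tags + {'long-lived'}  WHERE item_id = 1;
```

After 60 seconds the first element disappears while the second remains.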
Re: any plans for coprocessors?
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/trigger_r.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 26 Jul 2014, at 11:32 am, Kevin Burton bur...@spinn3r.com wrote: Are there any plans to add coprocessors to cassandra? Embedding logic directly in a cassandra daemon would be nice. -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
Re: stalled nodetool repair?
https://github.com/mstump/cassandra_range_repair Also very useful.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 22/08/2014, at 6:12 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 21, 2014 at 12:32 PM, Kevin Burton bur...@spinn3r.com wrote: How do I watch the progress of nodetool repair? This is a very longstanding operational problem in Cassandra. Repair barely works and is opaque, yet one is expected to run it once a week in the default configuration. An unreasonably-hostile-in-tone-but-otherwise-accurate description of the status quo before the rewrite of streaming in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5396 A proposal to change the default for gc_grace_seconds to 34 days, so that this fragile and heavyweight operation only has to be done once a month: https://issues.apache.org/jira/browse/CASSANDRA-5850 Granted, this is a lot of data, but it would be nice to at least see some progress. Here's the rewrite of streaming, where progress indication improves dramatically over the prior status quo: https://issues.apache.org/jira/browse/CASSANDRA-5286 And here are two open tickets on making repair less opaque (thx yukim@#cassandra): https://issues.apache.org/jira/browse/CASSANDRA-5483 https://issues.apache.org/jira/browse/CASSANDRA-5839 =Rob
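For partial visibility into an in-flight repair, a few standard commands help (the log path assumes a typical package install; adjust for your layout):

```shell
nodetool compactionstats   # validation compactions kicked off by repair
nodetool netstats          # streaming progress between replicas
grep -i 'repair' /var/log/cassandra/system.log | tail   # repair session messages
```

None of these give a single percent-complete figure; they only show which phase (validation vs streaming) is currently active.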
Re: stalled nodetool repair?
Ah sorry, that is the original repo; see https://github.com/BrianGallew/cassandra_range_repair for the updated version of the script with vnode support.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 22 Aug 2014, at 2:19 pm, DuyHai Doan doanduy...@gmail.com wrote: Thanks Ben for the link. Still, this script does not work with vnodes, which excludes a wide range of C* configs.

On Thu, Aug 21, 2014 at 5:51 PM, Ben Bromhead b...@instaclustr.com wrote: https://github.com/mstump/cassandra_range_repair Also very useful.
Re: Can't Add AWS Node due to /mnt/cassandra/data directory
Make sure you have also set up the ephemeral drives as a RAID device (use mdadm) and mounted it under /mnt/cassandra, otherwise your data dir is on the OS partition, which is usually very small.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 27 Aug 2014, at 8:21 pm, Stephen Portanova sport...@gmail.com wrote: Worked great! Thanks Mark!

On Wed, Aug 27, 2014 at 2:00 AM, Mark Reddy mark.l.re...@gmail.com wrote: Hi Stephen, I have never added a node via OpsCenter, so this may be a shortcoming of that process. However in non-OpsCenter installs you would have to create the data directories first:

sudo mkdir -p /mnt/cassandra/commitlog
sudo mkdir -p /mnt/cassandra/data
sudo mkdir -p /mnt/cassandra/saved_caches

And then give the cassandra user ownership of those directories:

sudo chown -R cassandra:cassandra /mnt/cassandra

Once this is done Cassandra will have the correct directories and permissions to start up. Mark

On 27 August 2014 09:50, Stephen Portanova sport...@gmail.com wrote: I already have a 3-node m3.large DSE cluster, but I can't seem to add another m3.large node. I'm using the ubuntu-trusty-14.04-amd64-server-20140607.1 (ami-a7fdfee2) AMI (instance-store backed, PV) on AWS; I install Java 7 and the JNA, then I go into OpsCenter to add a node. Things look good for 3 or 4 green circles, until I either get this error: Start Errored: Timed out waiting for Cassandra to start. or this error: Agent Connection Errored: Timed out waiting for agent to connect. I check the system.log and output.log, and they both say: INFO [main] 2014-08-27 08:17:24,642 CLibrary.java (line 121) JNA mlockall successful ERROR [main] 2014-08-27 08:17:24,644 CassandraDaemon.java (line 235) Directory /mnt/cassandra/data doesn't exist ERROR [main] 2014-08-27 08:17:24,645 CassandraDaemon.java (line 239) Has no permission to create /mnt/cassandra/data directory INFO [Thread-1] 2014-08-27 08:17:24,646 DseDaemon.java (line 477) DSE shutting down...
ERROR [Thread-1] 2014-08-27 08:17:24,725 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-1,5,main] java.lang.AssertionError at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1263) at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:171) at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:478) at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:384) My agent.log file says: Node is still provisioning, not attempting to determine ip. INFO [Initialization] 2014-08-27 08:40:57,848 Sleeping for 20s before trying to determine IP over JMX again INFO [Initialization] 2014-08-27 08:41:17,849 Node is still provisioning, not attempting to determine ip. INFO [Initialization] 2014-08-27 08:41:17,849 Sleeping for 20s before trying to determine IP over JMX again INFO [Initialization] 2014-08-27 08:41:37,849 Node is still provisioning, not attempting to determine ip. INFO [Initialization] 2014-08-27 08:41:37,850 Sleeping for 20s before trying to determine IP over JMX again INFO [Initialization] 2014-08-27 08:41:57,850 Node is still provisioning, not attempting to determine ip. I feel like I'm missing something easy with the mount, so if you could point me in the right direction, I would really appreciate it! -- Stephen Portanova (480) 495-2634 -- Stephen Portanova (480) 495-2634
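The RAID setup Ben mentions might look like the following on a typical EC2 instance-store box. Device names here are assumptions (check yours with lsblk), and this is destructive to anything already on those disks:

```shell
# RAID0 the ephemeral disks, make a filesystem, and mount it where
# Cassandra expects its data directories to live:
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
mkfs.ext4 /dev/md0
mkdir -p /mnt/cassandra
mount /dev/md0 /mnt/cassandra
```

After this, create the commitlog/data/saved_caches directories and chown them to the cassandra user as described in the thread.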
Re: Heterogenous cluster and vnodes
Hey, I have a few VM host (bare metal) machines with varying amounts of free hard drive space on them. For simplicity let's say I have three machines like so:

* Machine 1: - Harddrive 1: 150 GB available.
* Machine 2: - Harddrive 1: 150 GB available. - Harddrive 2: 150 GB available.
* Machine 3: - Harddrive 1: 150 GB available.

I am setting up a Cassandra cluster between them and as I see it I have two options:

1. I set up one Cassandra node/VM per bare metal machine. I assign all free hard drive space to each Cassandra node and I balance the cluster using vnodes proportionally to the amount of free hard drive space (CPU/RAM is not going to be a bottleneck here).
2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup will potentially create a situation where, if Machine 2 goes down, you may lose two replicas, as the two VMs on Machine 2 might be replicas for the same key.

General question: Is either of these preferable to the other? I understand 1) yields lower high availability (since nodes are on the same hardware).

Other way around (2 would be the potentially lower-availability option)... Cassandra thinks two of the VMs are separate when they in fact rely on the same underlying machine.

Question about alternative 1: With varying vnodes, can I always be sure that replicas are never put on the same virtual machine?

Yes... mostly: https://issues.apache.org/jira/browse/CASSANDRA-4123

Or is varying vnodes really only useful/recommended when migrating from machines with varying hardware (like mentioned in [1])?

Changing the number of vnodes changes the portion of the ring a node is responsible for. You can use it to account for different types of hardware; you can also use it to create awesome situations like hotspots if you aren't careful... YMMV. At the end of the day I would throw out the extra hard drive / not use it / put more hard drives in the other machines. Why?
Hard drives are cheap and your time as an admin for the cluster isn't. If you do add more hard drives you can also split out the commit log etc. onto different disks. I would take fewer problems over trying to wring every last scrap of performance out of the available hardware any day of the year.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
Re: Question about EC2 and SSDs
On 5 Sep 2014, at 10:05 am, Steve Robenalt sroben...@highwire.org wrote: We are migrating a small cluster on AWS from instances based on spinning disks (using instance store) to SSD-backed instances and we're trying to pick the proper instance type. Some of the recommendations for spinning disks say to use different drives for log vs data partitions to avoid issues with seek delays and contention for the disk heads. Since SSDs don't have the same seek delays, is it still recommended to use 2 SSD drives? Or is one sufficient?

As a side note, splitting the commit log and data dirs onto different volumes doesn't do a whole lot of good on AWS, irrespective of whether you are on spinning disks or SSDs, simply because the volumes presented to the VM may be backed by the same physical disk. Just RAID the available volumes and be done with it.

Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
Re: Moving Cassandra from EC2 Classic into VPC
On 8 Sep 2014, at 12:34 pm, Oleg Dulin oleg.du...@gmail.com wrote: Another idea I had was taking the ec2-snitch configuration and converting it into a property file snitch. But I still don't understand how to perform this move, since I need my newly created VPC instances to have public IPs -- something I would like to avoid.

Off the top of my head, something like this might work if you want a no-downtime approach:

1. Use the gossiping property file snitch in the VPC data centre.
2. Use a public elastic IP for each node.
3. Have the instances in the VPC join your existing cluster.
4. Decommission the old cluster.
5. Change the advertised endpoint addresses afterwards to the private addresses for nodes in the VPC using the following: https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/
6. Once that is done, remove the elastic IPs from the instances.
Re: no change observed in read latency after switching from EBS to SSD storage
EBS vs local SSD: for latency you are talking about milliseconds per I/O, but your query runs for 10 seconds. You will not notice anything; what is a few ms saved per read over the life of a 10-second query? To reiterate what Rob said: the query is probably slow because of your use case / data model, not the underlying disk.

On 17 September 2014 14:21, Tony Anecito adanec...@yahoo.com wrote: If you cached your tables or the database you may not see any difference at all. Regards, -Tony

On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi - We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances were using EBS for storage (I know it is not recommended). We replaced the EBS storage with SSDs. However, we didn't see any change in read latency. A query that took 10 seconds when data was stored on EBS still takes 10 seconds even after we moved the data directory to SSD. It is a large query returning 200,000 CQL rows from a single partition. We are reading 3 columns from each row and the combined data in these three columns for each row is around 100 bytes. In other words, the raw data returned by the query is approximately 20MB. I was expecting at least a 5-10x reduction in read latency going from EBS to SSD, so I am puzzled why we are not seeing any change in performance. Does anyone have insight as to why we don't see any performance impact on the reads going from EBS to SSD? Thanks, Mohammed

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: Repair taking long time
use https://github.com/BrianGallew/cassandra_range_repair

On 30 September 2014 05:24, Ken Hancock ken.hanc...@schange.com wrote: On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote: As an aside, you just lose with vnodes at clusters of this size. I presume you plan to grow over approx. 9 nodes per DC, in which case you probably do want vnodes enabled.

I typically only see discussion of vnodes vs. non-vnodes, but it seems to me that it might be more important to discuss the number of vnodes per node. A small cluster having 256 vnodes/node is unwise given some of the operations that are still done sequentially. Even if operations were done in parallel, having a 256x increase in parallelization seems an equally bad choice. I've never seen any discussion on how many vnodes per node might be an appropriate answer based on a planned cluster size -- does such a thing exist? Ken

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: best practice for waiting for schema changes to propagate
The system.peers table is a copy of some of the gossip info the node has stored, including the schema version. You should query this and wait until all schema versions have converged. http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_sys_tab_cluster_t.html http://www.datastax.com/dev/blog/the-data-dictionary-in-cassandra-1-2

As for ensuring that the driver keeps talking to the node you made the schema change on, I would ask on the driver-specific mailing list / IRC: - MAILING LIST: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user - IRC: #datastax-drivers on irc.freenode.net

On 30 September 2014 10:16, Clint Kelly clint.ke...@gmail.com wrote: Hi all, I often have problems with code that I write that uses the DataStax Java driver to create / modify a keyspace or table and then soon after reads the metadata for the keyspace to verify that whatever changes I made to the keyspace or table are complete. As an example, I may create a table called `myTableName` and then very soon after do something like:

assert(session .getCluster() .getMetaData() .getKeyspace(myKeyspaceName) .getTable(myTableName) != null)

I assume this fails sometimes because the default round-robin load balancing policy for the Java driver will send my create-table request to one node and the metadata read to another, and because it takes some time for the table creation to propagate across all of the nodes in my cluster. What is the best way to deal with this problem? Is there a standard way to wait for schema changes to propagate? Best regards, Clint

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
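The wait-for-convergence check described above can be sketched in a few lines. Only the comparison logic is shown; the driver session, the queries against system.local/system.peers, and the polling loop with a timeout around it are assumed:

```python
def schema_converged(versions):
    """Return True when every reachable node reports the same schema version.

    `versions` is an iterable of schema_version values, e.g. collected via
      SELECT schema_version FROM system.local   -- this node
      SELECT schema_version FROM system.peers   -- everyone else
    Entries for unreachable peers may be None and are ignored here.
    """
    return len({v for v in versions if v is not None}) <= 1
```

A caller would poll this (re-querying the system tables each time) until it returns True or a deadline passes, before trusting cluster-wide metadata.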
Re: Experience with multihoming cassandra?
I'm guessing you're talking about multi-homing because you want to have multiple tenants (different apps/teams, etc.) to make better use of resources? As Jared mentioned, running multiple Cassandra processes on the same hardware that participate in the same cluster doesn't make much sense from a failure domain point of view (it could mess up how C* replicates, with replicas for a key potentially ending up on the same physical server).

As for splitting up a server for multi-tenancy purposes, this then becomes a question of virtualisation, as while there is some multi-tenant support in C* (auth, throttling per keyspace), it is fairly limited at best. There are a whole range of options out there, ranging from Xen, VMware etc. through to lightweight virtualisation like Linux namespaces with cgroups. I think Spotify run C* in production using namespaces with cgroups, IIRC, and you could use something like Docker to help manage this for you. Docker will also help with managing network addressing etc. (the multi-homed aspect). We've also had a lot of success running C* with Docker (and previously SmartOS and Solaris zones). Though you will be treading new / undocumented ground and thus should expect to have to solve a few issues along the way.

On 26 September 2014 04:32, Jared Biel jared.b...@bolderthinking.com wrote: Doing this seems counter-productive to Cassandra's design/use-cases. It's best at home running on a large number of smaller servers rather than a small number of large servers. Also, as you said, you won't get any of the high availability benefits that it offers if you run multiple copies of Cassandra on the same box.

On 25 September 2014 16:58, Donald Smith donald.sm...@audiencescience.com wrote: We have large boxes with 256G of RAM and SSDs. From iostat, top, and sar we think the system has excess capacity.
Anyone have recommendations about multihoming http://en.wikipedia.org/wiki/Multihoming cassandra on such a node (connecting it to multiple IPs and running multiple cassandras simultaneously)? I'm skeptical, since Cassandra already has built-in multi-threading, and since if the node went down multiple Cassandra nodes would disappear. We're using C* version 2.0.9. A google/bing search for multihoming cassandra doesn't turn much up.

*Donald A. Smith* | Senior Software Engineer P: 425.201.3900 x 3866 C: (206) 819-5965 F: (646) 443-2333 dona...@audiencescience.com

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: DSE install interfering with apache Cassandra 2.1.0
check your cqlshrc file (sometimes in ~/.cassandra)? I've been caught out before when playing with an RC of 2.1

On 30 September 2014 01:25, Andrew Cobley a.e.cob...@dundee.ac.uk wrote: Without the apache cassandra running I ran jps -l on this machine; the only result was 338 sun.tools.jps.Jps The Mac didn't like the netstat command so I ran netstat -atp tcp | grep 9160 no result Also for the native port: netstat -atp tcp | grep 9042 gave no result (command may be wrong) So I ran a port scan using the network utility (between 0 and 1). Results as shown:

Port Scan has started…
Port Scanning host: 127.0.0.1
Open TCP Port: 631 ipp
Port Scan has completed…

Hope this helps. Andy

On 29 Sep 2014, at 15:09, Sumod Pawgi spa...@gmail.com wrote: Please run jps to check which Java services are still running and to make sure if C* is running. Then please check if port 9160 is in use: netstat -nltp | grep 9160 This will confirm what is happening in your case. Sent from my iPhone

On 29-Sep-2014, at 7:15 pm, Andrew Cobley a.e.cob...@dundee.ac.uk wrote: Hi All, Just come across this one; I'm at a bit of a loss on how to fix it. A user here did the following steps on a Mac:

1. Install DataStax Enterprise (DSE) using the dmg file
2. Test he can connect using the DSE cqlsh window
3. Uninstall DSE (full uninstall which stops the services)
4. Download apache cassandra 2.1.0 and unzip
5. Change to the new directory and run sudo ./cassandra

Now when he tries to connect using cqlsh from the apache cassandra 2.1.0 bin he gets: Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection AsyncoreConnection(4528514448) 127.0.0.1:9160 (closed) is already closed',)}) This is probably related to http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E but I can't see why the uninstall of DSE is leaving the apache cassandra release cqlsh unable to attach to the apache cassandra runtime.
Ta Andy The University of Dundee is a registered Scottish Charity, No: SC015096

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: DSE install interfering with apache Cassandra 2.1.0
Only recently! Moving off list (c* users bcc'd).

On 30 September 2014 19:20, Andrew Cobley a.e.cob...@dundee.ac.uk wrote: Hi Ben, yeah, that was it. Recovered from the Cassandra summit? Andy

On 30 Sep 2014, at 08:19, Ben Bromhead b...@instaclustr.com wrote: check your cqlshrc file (sometimes in ~/.cassandra)? I've been caught out before when playing with an RC of 2.1

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: How to enable client-to-node encrypt communication with Astyanax cassandra client
Haven't personally followed this but give it a go: http://lyubent.github.io/security/planetcassandra/2013/05/31/ssl-for-astyanax.html

On 8 October 2014 20:46, Lu, Boying boying...@emc.com wrote: Hi, All, I'm trying to enable client-to-node encrypted communication in Cassandra (2.0.7) with the Astyanax client library (version 1.56.48). I found this link about how to enable the feature: http://www.datastax.com/documentation/cassandra/2.0/cassandra/security/secureSSLClientToNode_t.html But it only says how to set things up on the server side, not the client side. Here is my configuration on the server side (in the yaml):

client_encryption_options:
    enabled: true
    keystore: full-path-to-keystore-file        # same file used by Cassandra server
    keystore_password: some-password
    truststore: fullpath-to-truststore-file     # same file used by Cassandra server
    truststore_password: some-password
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA]
    require_client_auth: true

http://www.datastax.com/dev/blog/accessing-secure-dse-clusters-with-cql-native-protocol This link says something about the client side, but not how to do it with the Astyanax client library.
Searching the Astyanax source code, I found the class SSLConnectionContext, which may be useful. Here is my code snippet:

AstyanaxContext<Cluster> clusterContext = new AstyanaxContext.Builder()
    .forCluster(clusterName)
    .forKeyspace(keyspaceName)
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setRetryPolicy(new QueryRetryPolicy(10, 1000)))
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl(_clusterName)
        .setMaxConnsPerHost(1)
        .setAuthenticationCredentials(credentials)
        .setSSLConnectionContext(sslContext)
        .setSeeds(String.format("%1$s:%2$d", uri.getHost(), uri.getPort()))
    )
    .buildCluster(ThriftFamilyFactory.getInstance());

But when I tried to connect to the Cassandra server, I got the following error:

Caused by: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.cassandra.thrift.Cassandra$Client.send_login(Cassandra.java:567)
    at org.apache.cassandra.thrift.Cassandra$Client.login(Cassandra.java:559)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.open(ThriftSyncConnectionFactoryImpl.java:203)
    ... 6 more

It looks like my SSL settings are incorrect. Does anyone know how to resolve this issue? Thanks, Boying

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
Re: Error: No module named cql
It looks like easy_install is using python2.6 and installing cql in the 2.6 packages directory: /usr/lib/python2.6/site-packages/ cqlsh is using the python executable for you environment (which looks like 2.7) and thus is looking for cql in the site packages dir (amongst others). To quickly install the cql module explicitly for python 2.7 run: python2.7 -m easy_install cql Though you might also want to sort out your easy_install so it matches the version of python that is used by default. On 15 October 2014 11:48, Tim Dunphy bluethu...@gmail.com wrote: Hey all, I'm using cassandra 2.1.0 on CentOS 6.5 And when I try to run cqlsh on the command line I get this error: root@beta-new:~] #cqlsh Python CQL driver not installed, or not on PYTHONPATH. You might try easy_install cql. Python: /usr/local/bin/python Module load path: ['/usr/local/apache-cassandra-2.1.0/bin', '/usr/local/lib/python27.zip', '/usr/local/lib/python2.7', '/usr/local/lib/python2.7/plat-linux2', '/usr/local/lib/python2.7/lib-tk', '/usr/local/lib/python2.7/lib-old', '/usr/local/lib/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages'] Error: No module named cql I tried following the advice from the error and ran that command: [root@beta-new:~] #easy_install cql Searching for cql Best match: cql 1.4.0 Processing cql-1.4.0-py2.6.egg cql 1.4.0 is already the active version in easy-install.pth Using /usr/lib/python2.6/site-packages/cql-1.4.0-py2.6.egg Processing dependencies for cql Finished processing dependencies for cql And that seems to go ok! However when I try to run it again: [root@beta-new:~] #cqlsh Python CQL driver not installed, or not on PYTHONPATH. You might try easy_install cql. 
Python: /usr/local/bin/python Module load path: ['/usr/local/apache-cassandra-2.1.0/bin', '/usr/local/lib/python27.zip', '/usr/local/lib/python2.7', '/usr/local/lib/python2.7/plat-linux2', '/usr/local/lib/python2.7/lib-tk', '/usr/local/lib/python2.7/lib-old', '/usr/local/lib/python2.7/lib-dynload', '/root/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages'] Error: No module named cql I get the same exact error. How on earth do I break out of this feedback loop? Thanks! Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
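The root cause above is an interpreter mismatch: easy_install ran under Python 2.6 while cqlsh uses /usr/local/bin/python (2.7). A quick, generic way to confirm which interpreter and search path are in play — a minimal sketch, nothing Cassandra-specific:

```python
# Print the interpreter and module search path, to confirm which
# site-packages directory a package manager must install into.
import sys

print(sys.executable)   # the python binary cqlsh would use
for p in sys.path:      # directories searched for the cql module
    print(p)
```

Run this with each python on the box (e.g. `python2.6 check.py` vs `python2.7 check.py`) and compare the paths against where easy_install put the egg.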
Re: Dependency Hell: STORM 0.9.2 and Cassandra 2.0
I haven't had to deal with this problem specifically and don't know if there is a Storm-specific solution, but the general Java way of dealing with projects that have conflicting dependencies would be either to exclude one of the conflicting dependencies using maven and see if it works, or to rename the conflicting dependency using http://maven.apache.org/plugins/maven-dependency-plugin/usage.html so both projects can use their own versions of guava without the package names conflicting (and the jvm will load the correct classes for each dep). On 25 October 2014 06:13, Gary Zhao garyz...@gmail.com wrote: Hello Anyone encountered the following issue and any workaround? Our Storm topology was written in Clojure. Our team is upgrading one of our Storm topologies from using cassandra 1.2 to cassandra 2.0, and we have found one problem that is difficult to tackle. The Cassandra 2.0 Java driver requires Google Guava 16. Unfortunately, storm 0.9.2 provides a lower version. Because of that, a topology will not be able to contact Cassandra databases. Thanks Gary -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
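The renaming approach Ben describes is commonly done with the maven-shade-plugin's relocation feature rather than the dependency plugin — a hedged sketch of the pom.xml fragment (version omitted; adjust to your build):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move our copy of Guava to a private package so Storm's
               older Guava on the classpath can't shadow it. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this, the topology jar carries its own Guava under `shaded.com.google.common` and the JVM loads the correct classes for each dependency.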
Re: Repair/Compaction Completion Confirmation
https://github.com/BrianGallew/cassandra_range_repair This breaks down the repair operation into very small portions of the ring as a way to try and work around the current fragile nature of repair. Leveraging range repair should go some way towards automating repair (this is how the automatic repair service in DataStax opscenter works, this is how we perform repairs). We have had a lot of success running repairs in a similar manner against vnode enabled clusters. Not 100% bullet proof, but way better than nodetool repair On 28 October 2014 08:32, Tim Heckman t...@pagerduty.com wrote: On Mon, Oct 27, 2014 at 1:44 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Oct 27, 2014 at 1:33 PM, Tim Heckman t...@pagerduty.com wrote: I know that when issuing some operations via nodetool, the command blocks until the operation is finished. However, is there a way to reliably determine whether or not the operation has finished without monitoring that invocation of nodetool? In other words, when I run 'nodetool repair' what is the best way to reliably determine that the repair is finished without running something equivalent to a 'pgrep' against the command I invoked? I am curious about trying to do the same for major compactions too. This is beyond a FAQ at this point, unfortunately; non-incremental repair is awkward to deal with and probably impossible to automate. In The Future [1] the correct solution will be to use incremental repair, which mitigates but does not solve this challenge entirely. As brief meta commentary, it would have been nice if the project had spent more time optimizing the operability of the critically important thing you must do once a week [2]. https://issues.apache.org/jira/browse/CASSANDRA-5483 =Rob [1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 [2] Or, more sensibly, once a month with gc_grace_seconds set to 34 days. Thank you for getting back to me so quickly. 
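The subrange idea behind cassandra_range_repair can be sketched in a few lines: carve the full Murmur3 token ring into small contiguous pieces and repair each with `nodetool repair -st <start> -et <end>`. An illustration only, not the tool's actual code:

```python
# Sketch: split the Murmur3 token ring into contiguous subranges so
# each repair touches only a small portion of the data at a time.
MIN_TOKEN = -2**63          # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1       # Murmur3Partitioner maximum token

def split_ring(steps):
    """Yield `steps` contiguous (start, end] token ranges covering the ring."""
    width = (MAX_TOKEN - MIN_TOKEN) // steps
    start = MIN_TOKEN
    for i in range(steps):
        end = MAX_TOKEN if i == steps - 1 else start + width
        yield (start, end)
        start = end

for st, et in split_ring(4):
    print(f"nodetool repair -st {st} -et {et}")
```

In practice the tool uses many more, much smaller steps per node, so a failed or hung repair only has to redo one small range.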
Not the answer that I was secretly hoping for, but it is nice to have confirmation. :) Cheers! -Tim -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: bootstrapping manually when auto_bootstrap=false ?
- In cassandra.yaml set auto_bootstrap: false - Boot the node - Run nodetool rebuild Very similar to http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html On 18 December 2014 at 14:04, Kevin Burton bur...@spinn3r.com wrote: I’m trying to figure out the best way to bootstrap our nodes. I *think* I want our nodes to be manually bootstrapped. This way an admin has to explicitly bring up the node in the cluster and I don’t have to worry about a script accidentally provisioning new nodes. The problem is HOW do you do it? I couldn’t find any reference anywhere in the documentation. I *think* I run nodetool repair? but it’s unclear.. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
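Spelled out as config, the first step above is a one-line cassandra.yaml change — a sketch:

```yaml
# cassandra.yaml: join the ring without streaming data automatically,
# so an admin must explicitly trigger the data load.
auto_bootstrap: false
```

Once the node is up and in the ring, `nodetool rebuild <source-dc>` streams the data it should own from the named existing datacenter.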
Re: simple data movement ?
Just copy the data directory from each prod node to your test node (and relevant configuration files etc). If your IP addresses are different between test and prod, follow https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/ On 18 December 2014 at 09:10, Langston, Jim jim.langs...@dynatrace.com wrote: Hi all, I have set up a test environment with C* 2.1.2, wanting to test our applications against it. I currently have C* 1.2.9 in production and want to use that data for testing. What would be a good approach for simply taking a copy of the production data and moving it into the test env and having the test env C* use that data ? The test env is identical in size, with the difference being the versions of C*. Thanks, Jim -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359
Re: Deleted snapshot files filling up /var/lib/cassandra
If you are running a sequential repair (or have previously run a sequential repair that is still running) Cassandra will still have the file descriptors open for files in the snapshot it is using for the repair operation. From http://www.datastax.com/dev/blog/repair-in-cassandra: *Cassandra 1.2 introduced a new option to repair to help manage the problems caused by the nodes all repairing with each other at the same time; it is called a snapshot repair, or sequential repair. As of Cassandra 2.1, sequential repair is the default, and the old parallel repair is an option. Sequential repair has all of the nodes involved take a snapshot, the snapshot lives until the repair finishes, and then is removed. By taking a snapshot, repair can proceed in a serial fashion, such that only two nodes are ever comparing with each other at a time. This makes the overall repair process slower, but decreases the burden placed on the nodes, and means you have less impact on reads/writes to the system.* On 16 March 2015 at 16:33, David Wahler dwah...@indeed.com wrote: On Mon, Mar 16, 2015 at 6:12 PM, Ben Bromhead b...@instaclustr.com wrote: Cassandra will by default snapshot your data directory on the following events: TRUNCATE and DROP schema events when you run nodetool repair when you run nodetool snapshot Snapshots are just hardlinks to existing SSTables so the only disk space they take up is for files that have since been compacted away. Disk space for snapshots will be freed when the last link to the files are removed. You can remove all snapshots in a cluster using nodetool clearsnapshot Snapshots will fail if you are out of disk space (this is counterintuitive to the above, but it is true), if you have not increased the number of available file descriptors or if there are permissions issues. Out of curiosity, how often are you running repair? Thanks for the information. We're running repair once per week, as recommended by the Datastax documentation. 
The repair is staggered to run on one machine at a time with the --partitioner-range option in order to spread out the load. Running nodetool clearsnapshot doesn't free up any space. I'm guessing that because the snapshot files have been deleted from the filesystem, Cassandra thinks the snapshots are already gone. But because it still has the file descriptors open, the disk space hasn't actually been reclaimed. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Deleted snapshot files filling up /var/lib/cassandra
Cassandra will by default snapshot your data directory on the following events: - TRUNCATE and DROP schema events - when you run nodetool repair - when you run nodetool snapshot Snapshots are just hardlinks to existing SSTables so the only disk space they take up is for files that have since been compacted away. Disk space for snapshots will be freed when the last link to the files are removed. You can remove all snapshots in a cluster using nodetool clearsnapshot Snapshots will fail if you are out of disk space (this is counterintuitive to the above, but it is true), if you have not increased the number of available file descriptors or if there are permissions issues. Out of curiosity, how often are you running repair? On 16 March 2015 at 15:52, David Wahler dwah...@indeed.com wrote: On Mon, Mar 16, 2015 at 5:28 PM, Jan cne...@yahoo.com wrote: David; all the packaged installations use the /var/lib/cassandra directory. Could you check your yaml config files and see if you are using this default directory for backups May want to change it to a location with more disk space. We're using the default /var/lib/cassandra as our data directory, mounted as its own LVM volume. I don't see anything in cassandra.yaml about a backup directory. There is an incremental_backups option which is set to false. Increasing the available disk space doesn't really seem like a solution. We have only about 450MB of live data on the most heavily-loaded server, and the space taken up by these deleted files is growing by several GB per day. For now we can work around the problem by periodically restarting servers to close the file handles, but that hurts our availability and seems like a hack. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
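The hardlink behaviour Ben describes can be demonstrated directly at the filesystem level — a small sketch, not Cassandra code, assuming a filesystem that supports hardlinks:

```python
# Demonstration: a snapshot hardlink costs no space of its own, and the
# data survives until the LAST link (or open file descriptor) is gone --
# which is why "deleted" SSTables can keep consuming disk.
import os
import tempfile

d = tempfile.mkdtemp()
sstable = os.path.join(d, "ks-table-ka-1-Data.db")  # stand-in for an SSTable
snapshot = os.path.join(d, "snapshot-Data.db")      # stand-in for a snapshot link

with open(sstable, "w") as f:
    f.write("x" * 1024)

os.link(sstable, snapshot)        # this is essentially what a snapshot does
print(os.stat(sstable).st_nlink)  # 2: two names, one copy of the data on disk

os.remove(sstable)                # "compaction" deletes the original...
print(os.path.exists(snapshot))   # True: the snapshot still holds the data
```

Only when the snapshot link is also removed (nodetool clearsnapshot) does the link count hit zero and the space get reclaimed.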
Re: Deleted snapshot files filling up /var/lib/cassandra
Sorry for the late reply. To immediately solve the problem you can restart Cassandra and all the open file descriptors to the deleted snapshots should disappear. As for why it happened I would first address the disk space issue and see if the snapshot errors + open file descriptors issue still occurs (I am unclear as to whether you got the snapshot exception after the disk filled up or before), if you still have issues with repair not letting go of snapshotted files even with free disk space I would look to raise a ticket in Jira. On 17 March 2015 at 12:46, David Wahler dwah...@indeed.com wrote: On Mon, Mar 16, 2015 at 6:51 PM, Ben Bromhead b...@instaclustr.com wrote: If you are running a sequential repair (or have previously run a sequential repair that is still running) Cassandra will still have the file descriptors open for files in the snapshot it is using for the repair operation. Yeah, that aligns with my understanding of how the repair process works. But the cluster has no repair sessions active (I think; when I run nodetool tpstats, the AntiEntropyStage and AntiEntropySessions values are zero on all nodes) and the space still hasn't been freed. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: how to clear data from disk
To clarify on why this behaviour occurs, by default Cassandra will snapshot a table when you perform any destructive action (TRUNCATE, DROP etc) see http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/truncate_r.html To free disk space after such an operation you will always need to clear the snapshots (using either of above suggested methods). Unfortunately this can be a bit painful if you are rotating your tables, say by month, and want to remove the oldest one from disk as your client will need to speak JMX as well. You can disable this behaviour through the use of auto_snapshot in cassandra.yaml. Though I would strongly recommend leaving this feature enabled in any sane production environment and cleaning up snapshots as an independent task!! On 10 March 2015 at 20:43, Patrick McFadin pmcfa...@gmail.com wrote: Or just manually delete the files. The directories are broken down by keyspace and table. Patrick On Mon, Mar 9, 2015 at 7:50 PM, 曹志富 cao.zh...@gmail.com wrote: nodetool clearsnapshot -- Ranger Tsao 2015-03-10 10:47 GMT+08:00 鄢来琼 laiqiong@gtafe.com: Hi ALL, After drop table, I found the data is not removed from disk, I should reduce the gc_grace_seconds before the drop operation. I have to wait for 10 days, but there is not enough disk. Could you tell me there is method to clear the data from disk quickly? Thank you very much! Peter -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: High latencies for simple queries
cqlsh runs on the internal cassandra python drivers: cassandra-pylib and cqlshlib. I would not recommend using them at all (nothing wrong with them, they are just not built with external users in mind). I have never used python-driver in anger so I can't comment on whether it is genuinely slower than the internal C* python driver, but this might be a question for python-driver folk. On 28 March 2015 at 00:34, Artur Siekielski a...@vhex.net wrote: On 03/28/2015 12:13 AM, Ben Bromhead wrote: One other thing to keep in mind / check is that doing these tests locally the cassandra driver will connect using the network stack, whereas postgres supports local connections over a unix domain socket (this is also enabled by default). Unix domain sockets are significantly faster than tcp as you don't have a network stack to traverse. I think any driver using libpq will attempt to use the domain socket when connecting locally. Good catch. I assured that psycopg2 connects through a TCP socket and the numbers increased by about 20%, but it still is an order of magnitude faster than Cassandra. But I'm going to hazard a guess something else is going on with the Cassandra connection as I'm able to get 0.5ms queries locally and that's even with trace turned on. Using python-driver? -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: run cassandra on a small instance
-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: High latencies for simple queries
Latency can be so variable even when testing things locally. I quickly fired up postgres and did the following with psql:

ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
CREATE TABLE
ben=# \timing
Timing is on.
ben=# INSERT INTO foo VALUES(2, 'yay');
INSERT 0 1
Time: 1.162 ms
ben=# INSERT INTO foo VALUES(3, 'yay');
INSERT 0 1
Time: 1.108 ms

I then fired up a local copy of Cassandra (2.0.12):

cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE foo;
cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
cqlsh:foo> TRACING ON;
Now tracing requests.
cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');

Tracing session: 7a7dced0-d4b2-11e4-b950-85c3c9bd91a0

activity | timestamp | source | source_elapsed
execute_cql3_query | 11:52:55,229 | 127.0.0.1 | 0
Parsing INSERT INTO foo (i, j) VALUES (1, 'yay'); | 11:52:55,229 | 127.0.0.1 | 43
Preparing statement | 11:52:55,229 | 127.0.0.1 | 141
Determining replicas for mutation | 11:52:55,229 | 127.0.0.1 | 291
Acquiring switchLock read lock | 11:52:55,229 | 127.0.0.1 | 403
Appending to commitlog | 11:52:55,229 | 127.0.0.1 | 413
Adding to foo memtable | 11:52:55,229 | 127.0.0.1 | 432
Request complete | 11:52:55,229 | 127.0.0.1 | 541

All this on a MacBook Pro with 16 GB of memory and an SSD. So ymmv? On 27 March 2015 at 08:28, Tyler Hobbs ty...@datastax.com wrote: Just to check, are you concerned about minimizing that latency or maximizing throughput? I'll assume that latency is what you're actually concerned about. A fair amount of that latency is probably happening in the python driver. Although it can easily execute ~8k operations per second (using cpython), in some scenarios it can be difficult to guarantee sub-ms latency for an individual query due to how some of the internals work. In particular, it uses python's Conditions for cross-thread signalling (from the event loop thread to the application thread). 
Unfortunately, python's Condition implementation includes a loop with a minimum sleep of 1ms if the Condition isn't already set when you start the wait() call. This is why, with a single application thread, you will typically see a minimum of 1ms latency. Another source of similar latencies for the python driver is the Asyncore event loop, which is used when libev isn't available. I would make sure that you can use the LibevConnection class with the driver to avoid this. On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski a...@vhex.net wrote: I'm running Cassandra locally and I see that the execution time for the simplest queries is 1-2 milliseconds. By a simple query I mean either INSERT or SELECT from a small table with short keys. While this number is not high, it's about 10-20 times slower than Postgresql (even if INSERTs are wrapped in transactions). I know that the nature of Cassandra compared to Postgresql is different, but for some scenarios this difference can matter. The question is: is it normal for Cassandra to have a minimum latency of 1 millisecond? I'm using Cassandra 2.1.2, python-driver. -- Tyler Hobbs DataStax http://datastax.com/ -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
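The cross-thread handoff Tyler describes looks roughly like this — a toy sketch of the pattern, not the driver's actual code: the event-loop thread delivers a result and notifies, and the application thread waits on a Condition, so any floor under Condition.wait() becomes a floor under per-query latency.

```python
# Sketch: event-loop thread hands a result to the application thread
# via a Condition. On old CPython versions Condition.wait() polled with
# a minimum ~1ms sleep, putting a floor under every query's latency.
import threading
import time

cond = threading.Condition()
result = []

def event_loop_thread():
    # pretend a response just arrived off the wire
    with cond:
        result.append("row")
        cond.notify()

with cond:
    threading.Thread(target=event_loop_thread).start()
    t0 = time.perf_counter()
    while not result:
        cond.wait()            # application thread blocks here
    elapsed = time.perf_counter() - t0

print(f"handoff latency: {elapsed * 1000:.3f} ms")
```

On a modern Python 3 the measured handoff is far below 1 ms, which is consistent with the problem being specific to the older Condition implementation.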
Re: Arbitrary nested tree hierarchy data model
+1 would love to see how you do it On 27 March 2015 at 07:18, Jonathan Haddad j...@jonhaddad.com wrote: I'd be interested to see that data model. I think the entire list would benefit! On Thu, Mar 26, 2015 at 8:16 PM Robert Wille rwi...@fold3.com wrote: I have a cluster which stores tree structures. I keep several hundred unrelated trees. The largest has about 180 million nodes, and the smallest has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in practice is probably less than 10. I am able to page through children and siblings. It works really well. Doesn’t sound like its exactly like what you’re looking for, but if you want any pointers on how I went about implementing mine, I’d be happy to share. On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote: Not sure if this is the right place to ask, but we are trying to model a user-generated tree hierarchy in which they create child objects of a root node, and can create an arbitrary number of children (and children of children, and on and on). So far we have looked at storing each tree structure as a single document in JSON format and reading/writing it out in its entirety, doing materialized paths where we store the root id with every child and the tree structure above the child as a map, and some form of an adjacency list (which does not appear to be very viable as looking up the entire tree would be ridiculous). The hope is to end up with a data model that allows us to display the entire tree quickly, as well as see the entire path to a leaf when selecting that leaf. If anyone has some suggestions/experience on how to model such a tree hierarchy we would greatly appreciate your input. -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
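For anyone curious, the materialized-path option mentioned in the question can be prototyped in a few lines — a toy in-memory sketch (the real thing would store the path in a Cassandra column, with `children` as a prefix/range query on a clustering key):

```python
# Sketch: materialized-path tree. Each node stores its full ancestor
# path, so the path to a leaf is a single lookup and listing children
# is a prefix match with exactly one more path segment.
nodes = {}  # node_id -> materialized path, e.g. "root/a/b"

def add_node(node_id, parent_id=None):
    path = node_id if parent_id is None else nodes[parent_id] + "/" + node_id
    nodes[node_id] = path

def children(parent_id):
    prefix = nodes[parent_id] + "/"
    return [n for n, p in nodes.items()
            if p.startswith(prefix) and "/" not in p[len(prefix):]]

add_node("root")
add_node("a", "root")
add_node("b", "a")
add_node("c", "root")

print(nodes["b"])        # root/a/b -- full path to the leaf, one lookup
print(children("root"))  # ['a', 'c']
```

Fanout and depth are bounded only by partition sizing, which matches the arbitrary-depth requirement in the question.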
Re: Really high read latency
wrote: Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: Hi! So I've got a table like this:

CREATE TABLE default.metrics (
  row_time int,
  attrs varchar,
  offset int,
  value double,
  PRIMARY KEY (row_time, attrs, offset)
) WITH COMPACT STORAGE
  AND bloom_filter_fp_chance=0.01
  AND caching='KEYS_ONLY'
  AND comment=''
  AND dclocal_read_repair_chance=0
  AND gc_grace_seconds=864000
  AND index_interval=128
  AND read_repair_chance=1
  AND replicate_on_write='true'
  AND populate_io_cache_on_flush='false'
  AND default_time_to_live=0
  AND speculative_retry='NONE'
  AND memtable_flush_period_in_ms=0
  AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
  AND compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. So it's timeseries data that I'm doing so I increment row_time each day, attrs is additional identifying information about each series, and offset is the number of milliseconds into the day for each data point. So for the past 5 days, I've been inserting 3k points/second distributed across 100k distinct attrses. And now when I try to run queries on this data that look like SELECT * FROM default.metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam' it takes an absurdly long time and sometimes just times out. I did nodetool cfstats default and here's what I get:

Keyspace: default
Read Count: 59
Read Latency: 397.12523728813557 ms.
Write Count: 155128
Write Latency: 0.3675690719921613 ms.
Pending Flushes: 0
Table: metrics
SSTable count: 26
Space used (live): 35146349027
Space used (total): 35146349027
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.10386468749216264
Memtable cell count: 141800
Memtable data size: 31071290
Memtable switch count: 41
Local read count: 59
Local read latency: 397.126 ms
Local write count: 155128
Local write latency: 0.368 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 2856
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 36904729268
Compacted partition mean bytes: 986530969
Average live cells per slice (last five minutes): 501.66101694915255
Maximum live cells per slice (last five minutes): 502.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks! -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
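One likely culprit is visible in the schema itself: the partition key is row_time alone, so an entire day of writes lands in a single partition — consistent with the ~36 GB "Compacted partition maximum bytes" reported above. A back-of-the-envelope check using the numbers from the post:

```python
# Rough check (assumes writes are spread evenly over the day; figures
# taken from the post above). With PRIMARY KEY (row_time, attrs, offset)
# the partition key is row_time alone, so one day's writes share one partition.
writes_per_second = 3_000
seconds_per_day = 86_400
cells_per_partition_per_day = writes_per_second * seconds_per_day
print(cells_per_partition_per_day)  # 259200000 cells in a single partition

# Moving attrs into the partition key, e.g.
# PRIMARY KEY ((row_time, attrs), offset), spreads the same load across
# the ~100k distinct series instead.
distinct_attrs = 100_000
print(cells_per_partition_per_day // distinct_attrs)  # 2592 cells per partition
```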
Re: Java 8
DSE 4.6.5 supports Java 8 ( http://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/RNdse46.html?scroll=RNdse46__rel465) and DSE 4.6.5 is Cassandra 2.0.14 under the hood. I would go with 8 On 7 May 2015 at 04:51, Paulo Motta pauloricard...@gmail.com wrote: First link was broken (sorry), here is the correct link: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREJNAabout_c.html 2015-05-07 8:49 GMT-03:00 Paulo Motta pauloricard...@gmail.com: The official recommendation is to run with Java7 ( http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installJREabout_c.html), mostly to play it safe I guess, however you can probably already run C* with Java8, since it has been stable for a while. We've been running with Java8 for several months now without any noticeable problem. Regarding source compatibility, the official plan is compile with Java8 starting from version 3.0. You may find more information on this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8168 https://issues.apache.org/jira/browse/CASSANDRA-8168 2015-05-07 8:32 GMT-03:00 Stefan Podkowinski stefan.podkowin...@1und1.de : Hi Are there any plans to support Java 8 for Cassandra 2.0, now that Java 7 is EOL? Currently Java 7 is also recommended for 2.1. Are there any reasons not to recommend Java 8 for 2.1? Thanks, Stefan -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Data migration
Use sstableloader. It comes with Cassandra, is designed for moving data between clusters, and is far simpler than sqoop. It should even work with a schema change like you described (changing columns). It would probably/definitely break if you were dropping tables. Mind you, I've never tried sstableloader while schema changes were occurring, so happy to be wrong. On 14 April 2015 at 05:40, Prem Yadav ipremya...@gmail.com wrote: Look into sqoop. I believe using sqoop you can transfer data between C* clusters. I haven't tested it though. The other option is to write a program to read from one cluster and write the required data to another. On Tue, Apr 14, 2015 at 12:27 PM, skrynnikov_m skrinniko...@epsysoft.com.ua wrote: Hello!!! Need to migrate data from one C* cluster to another periodically. During migration schema can change (add or remove one, two fields). Could you please suggest some tool? -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Spark SQL JDBC Server + DSE
*From: *Mohammed Guller moham...@glassbeam.com *Reply-To: *user@cassandra.apache.org *Date: *Thursday, May 28, 2015 at 8:26 PM *To: *user@cassandra.apache.org user@cassandra.apache.org *Subject: *RE: Spark SQL JDBC Server + DSE Anybody out there using DSE + Spark SQL JDBC server? Mohammed *From:* Mohammed Guller [mailto:moham...@glassbeam.com moham...@glassbeam.com] *Sent:* Tuesday, May 26, 2015 6:17 PM *To:* user@cassandra.apache.org *Subject:* Spark SQL JDBC Server + DSE Hi – As I understand, the Spark SQL Thrift/JDBC server cannot be used with the open source C*. Only DSE supports the Spark SQL JDBC server. We would like to find out how many organizations are using this combination. If you do use DSE + Spark SQL JDBC server, it would be great if you could share your experience. For example, what kind of issues you have run into? How is the performance? What reporting tools you are using? Thank you! Mohammed -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Deploying OpsCenter behind a HTTP(S) proxy
OpsCenter is a little bit tricky to simply just rewrite URLs for; the XHR requests and REST endpoints it hits are all specified a little differently in the javascript app it loads. We ended up monkey patching a buttload of the js files to get all the requests working properly with our proxy. Every time a new release of OpsCenter comes out we have to rework it. If you are a DSE customer I would raise it as a support issue :) On 18 June 2015 at 02:29, Spencer Brown lilspe...@gmail.com wrote: First, your firewall should really be your frontend. Their operational frontend is Apache, which is common. You want every url with opscenter in it handled elsewhere. You could also set up proxies for /cluster-configs, etc... Then there is mod_rewrite, which provides a lot more granularity about when you want what gets handled where. I set up the architectural infrastructure for Orbitz and some major banks, and I'd be happy to help you out on this. I charge $30/hr., but what you need isn't very complex so we're really just talking $100. On Thu, Jun 18, 2015 at 5:13 AM, Jonathan Ballet jbal...@gfproducts.ch wrote: Hi, I'm looking for information on how to correctly deploy an OpsCenter instance behind a HTTP(S) proxy. I have a running instance of OpsCenter 5.1 reachable at http://opscenter:/opscenter/ but I would like to be able to serve this kind of tool under a single hostname on HTTPS along with other tools of this kind, for easier convenience. I'm currently using Apache as my HTTP front-end and I tried this naive configuration: <VirtualHost *:80> ServerName tools ... 
ProxyPreserveHost On
# Proxy to OpsCenter
ProxyPass        /opscenter/ http://opscenter:/opscenter/
ProxyPassReverse /opscenter/ http://opscenter:/opscenter/
</VirtualHost>

This doesn't quite work, as OpsCenter seems to also serve specific data from / directly, such as: /cluster-configs /TestCluster /meta /rc /tcp Is there something I can configure in OpsCenter so that it serves these URLs from somewhere else, or a list of known URLs that I can remap on the proxy, or better yet, a known proxy configuration to put in front of OpsCenter? Regards, Jonathan -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
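If monkey patching isn't an option, a partial workaround is to proxy the extra top-level paths explicitly. A hedged sketch of the Apache directives (untested; `<PORT>` stands in for the OpsCenter port, which is elided above, and the path list is just the endpoints named in this thread):

```apache
# Sketch: forward the top-level endpoints OpsCenter serves from /
# in addition to /opscenter/. Replace <PORT> with your OpsCenter port.
ProxyPass        /cluster-configs/ http://opscenter:<PORT>/cluster-configs/
ProxyPassReverse /cluster-configs/ http://opscenter:<PORT>/cluster-configs/
ProxyPass        /meta/            http://opscenter:<PORT>/meta/
ProxyPassReverse /meta/            http://opscenter:<PORT>/meta/
ProxyPass        /rc/              http://opscenter:<PORT>/rc/
ProxyPassReverse /rc/              http://opscenter:<PORT>/rc/
ProxyPass        /tcp/             http://opscenter:<PORT>/tcp/
ProxyPassReverse /tcp/             http://opscenter:<PORT>/tcp/
```

This only covers the endpoints listed here; as Ben notes, URLs baked into the javascript app itself may still bypass the prefix.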
Re: Lucene index plugin for Apache Cassandra
Looks awesome, do you have any examples/benchmarks of using these indexes for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+? On 10 June 2015 at 09:08, Andres de la Peña adelap...@stratio.com wrote: Hi all, With the release of Cassandra 2.1.6, Stratio is glad to present its open source Lucene-based implementation of C* secondary indexes https://github.com/Stratio/cassandra-lucene-index as a plugin that can be attached to Apache Cassandra. Before the above changes, Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties implied. As of now, the fork is discontinued and new users should use the recently created plugin, which maintains all the features of Stratio Cassandra https://github.com/Stratio/stratio-cassandra. Stratio's Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities such as with ElasticSearch or Solr, including full text search capabilities, free multivariable search, relevance queries and field-based sorting. Each node indexes its own data, so high availability and scalability is guaranteed. We hope this will be useful to the Apache Cassandra community. Regards, -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD* -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: check active queries on cluster
A warning on enabling debug and trace logging on the write path. You will be writing information about every query to disk. If you have any significant volume of requests going through the nodes things will get slow pretty quickly. At least with C* 2.1 and using the default logging config. On 1 June 2015 at 07:34, Sebastian Martinka sebastian.marti...@mercateo.com wrote: You could enable DEBUG logging for org.apache.cassandra.transport.Message and TRACE logging for org.apache.cassandra.cql3.QueryProcessor in the log4j-server.properties file: log4j.logger.org.apache.cassandra.transport.Message=DEBUG log4j.logger.org.apache.cassandra.cql3.QueryProcessor=TRACE Afterwards you get the following output from all PreparedStatements in the system.log file: DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 Message.java (line 302) Received: PREPARE INSERT INTO dba_test.cust_view (leid, vid, geoarea, ver) VALUES (?, ?, ?, ?);, v=2 TRACE [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 QueryProcessor.java (line 283) Stored prepared statement 61956319a6d7c84c25414c96edf6e38c with 4 bind markers DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Tracing.java (line 159) request complete DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Message.java (line 309) Responding: RESULT PREPARED 61956319a6d7c84c25414c96edf6e38c [leid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][vid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][geoarea(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][ver(dba_test, cust_view), org.apache.cassandra.db.marshal.LongType] (resultMetadata=[0 columns]), v=2 *Von:* Robert Coli [mailto:rc...@eventbrite.com] *Gesendet:* Freitag, 17. April 2015 19:23 *An:* user@cassandra.apache.org *Betreff:* Re: check active queries on cluster On Thu, Apr 16, 2015 at 11:10 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote: We want to track active queries on cassandra cluster. 
Is there any tool or way to find all active queries on Cassandra? You can get a count of them with: https://issues.apache.org/jira/browse/CASSANDRA-5084 =Rob -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
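If you do enable that logging, the Message lines shown above are easy to post-process into a list of incoming queries. A minimal sketch in Python (the parsing helper is mine, not part of Cassandra; it assumes the C* 2.1 log line format quoted above):

```python
import re

# Matches the "Received: <query>, v=<protocol version>" tail of the DEBUG
# lines emitted by org.apache.cassandra.transport.Message (format as quoted
# above from C* 2.1 -- adjust the pattern for other versions).
LINE_RE = re.compile(r"Message\.java \(line \d+\) Received: (?P<query>.+), v=\d+$")

def extract_queries(log_lines):
    """Return the query text from each Message DEBUG line."""
    return [m.group("query") for m in map(LINE_RE.search, log_lines) if m]

sample = [
    "DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 "
    "Message.java (line 302) Received: PREPARE INSERT INTO dba_test.cust_view "
    "(leid, vid, geoarea, ver) VALUES (?, ?, ?, ?);, v=2",
    "DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 "
    "Tracing.java (line 159) request complete",
]
print(extract_queries(sample))
```

Tailing system.log through something like this gives a rough live view of statements hitting a node; for a simple count, the CASSANDRA-5084 metric Rob linked is much cheaper.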
Re: Multiple cassandra instances per physical node
@Sean - You can manually change the ports used by the DataStax agent using the address.yaml file in the agent install directory. +1 on using racks to separate it out... but it will increase operational complexity somewhat. On 26 May 2015 at 08:11, Nate McCall n...@thelastpickle.com wrote: If you're running multiple nodes on a single server, vnodes give you no control over which instance has which key (whereas you can assign initial tokens). Therefore you could have two of your three replicas on the same physical server which, if it goes down, you can't read or write at quorum. Yep. You *will* have overlapping ranges on each physical server so long as the number of vnodes is greater than the number of nodes in the cluster. However, can't you use the topology snitch to put both nodes in the same rack? Won't that prevent the issue and still allow you to maintain quorum if a single server goes down? If I have a 20-node cluster with 2 nodes on each physical server, can I use 10 racks to properly segment my partitions? That's a good point, yes. I'd still personally prefer the operational simplicity of simply spacing out token assignments though, but YMMV. -- - Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Re: Back to the futex()? :(
>> are my JVM >> args. I realized I neglected to adjust memtable_flush_writers as I was >> writing this--so I'll get on that. Aside from that, I'm not sure what to >> do. (Thanks, again, for reading.) >> >> * They were batched for consistency--I'm hoping to return to using them >> when I'm back at normal load, which is tiny compared to backloading, but >> the impact on performance was eye-opening. >> ___ >> Will Hayworth >> Developer, Engagement Engine >> Atlassian >> >> My pronoun is "they". <http://pronoun.is/they> >> >> >> > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster
Also you may want to run multiple data centres in a single AWS region (load segmentation, Spark etc). +1 GPFS for everything On Wed, 3 Feb 2016 at 07:42 sai krishnam raju potturi <pskraj...@gmail.com> wrote: > thanks a lot Robert. Greatly appreciate it. > > thanks > Sai > > On Tue, Feb 2, 2016 at 6:19 PM, Robert Coli <rc...@eventbrite.com> wrote: > >> On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi < >> pskraj...@gmail.com> wrote: >> >>> What is the possibility of using GossipingPropertyFileSnitch on >>> datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS? >>> >> >> You should just use GPFS everywhere. >> >> This is also the reason why you should not use EC2MRS if you might ever >> have a DC that is outside of AWS. Just use GPFS. >> >> =Rob >> PS - To answer your actual question... one "can" use different snitches >> on a per node basis, but ONE REALLY REALLY SHOULDN'T CONSIDER THIS A VALID >> APPROACH AND IF ONE TRIES AND FAILS I WILL POINT AND LAUGH AND NOT HELP >> THEM :D >> > > -- Ben Bromhead CTO | Instaclustr +1 650 284 9692
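For concreteness, GPFS reads each node's position from cassandra-rackdc.properties; a minimal sketch (the dc/rack names below are placeholders -- the point is to use one consistent naming convention across your private cloud and AWS data centres):

```properties
# cassandra-rackdc.properties (GossipingPropertyFileSnitch)
dc=us-east-1
rack=rack1
# Optional: use private IPs for traffic within the same data centre when
# broadcasting public addresses.
# prefer_local=true
```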
Re: Any tips on how to track down why Cassandra won't cluster?
Check network connectivity. If you are using public addresses as the broadcast address, make sure you can telnet from one node to the other node's public address on the internode port. Last time I looked into something like this, for some reason if you only add a security group id to the allowed traffic in a security group, you still need to add the public IP addresses of each node to the security group's allowed inbound traffic as well. On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III <mrbur...@gmail.com> wrote: > I'm deploying 2 nodes at the moment using cassandra-dse on Amazon. I > configured it to use EC2Snitch and configured rackdc to use us-east with > rack "1". > > The second node points to the first node as the seed e.g., "seeds": > ["54.*.*.*"] and all of the ports are open. > > Any suggestions on how to track down what might trigger this problem? I'm > not receiving any exceptions. > > > -- > -Richard L. Burton III > @rburton -- Ben Bromhead CTO | Instaclustr +1 650 284 9692
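The telnet check above can be scripted. A rough sketch using plain Python sockets (nothing Cassandra-specific; the demo listens locally rather than hitting a real node):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Scripted version of `telnet <host> <port>`: True if a TCP
    connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a local listener rather than a real node's broadcast address.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(can_connect("127.0.0.1", port))  # something is listening
listener.close()
print(can_connect("127.0.0.1", port))  # connection refused
```

Against a real cluster you would run this from every node against every other node's broadcast address on the internode port (7000 by default, 7001 for SSL internode), plus the client port (9042).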
Re: EC2 storage options for C*
>>>>>>> Thank you all for the suggestions. I'm torn between GP2 vs >>>>>>> Ephemeral. GP2 after testing is a viable contender for our workload. The >>>>>>> only worry I have is EBS outages, which have happened. >>>>>>> >>>>>>> On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Also in that video - it's long but worth watching >>>>>>>> >>>>>>>> We tested up to 1M reads/second as well, blowing out page cache to >>>>>>>> ensure we weren't "just" reading from memory >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jeff Jirsa >>>>>>>> >>>>>>>> >>>>>>>> On Jan 31, 2016, at 9:52 AM, Jack Krupansky < >>>>>>>> jack.krupan...@gmail.com> wrote: >>>>>>>> >>>>>>>> How about reads? Any differences between read-intensive and >>>>>>>> write-intensive workloads? >>>>>>>> >>>>>>>> -- Jack Krupansky >>>>>>>> >>>>>>>> On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa < >>>>>>>> jeff.ji...@crowdstrike.com> wrote: >>>>>>>> >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> We run using 4T GP2 volumes, which guarantee 10k iops. Even at 1M >>>>>>>>> writes per second on 60 nodes, we didn’t come close to hitting even >>>>>>>>> 50% >>>>>>>>> utilization (10k is more than enough for most workloads). PIOPS is not >>>>>>>>> necessary. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> From: John Wong >>>>>>>>> Reply-To: "user@cassandra.apache.org" >>>>>>>>> Date: Saturday, January 30, 2016 at 3:07 PM >>>>>>>>> To: "user@cassandra.apache.org" >>>>>>>>> Subject: Re: EC2 storage options for C* >>>>>>>>> >>>>>>>>> For production I'd stick with ephemeral disks (aka instance >>>>>>>>> storage) if you are running a lot of transactions. >>>>>>>>> However, for a regular small testing/qa cluster, or something you >>>>>>>>> know you want to reload often, EBS is definitely good enough and we >>>>>>>>> haven't >>>>>>>>> had issues 99%. The 1% is kind of anomaly where we have flush blocked. >>>>>>>>> >>>>>>>>> But Jeff, kudos that you are able to use EBS.
I didn't go through >>>>>>>>> the video, do you actually use PIOPS or just standard GP2 in your >>>>>>>>> production cluster? >>>>>>>>> >>>>>>>>> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng < >>>>>>>>> br...@blockcypher.com> wrote: >>>>>>>>> >>>>>>>>>> Yep, that motivated my question "Do you have any idea what kind >>>>>>>>>> of disk performance you need?". If you need the performance, its >>>>>>>>>> hard to >>>>>>>>>> beat ephemeral SSD in RAID 0 on EC2, and its a solid, battle tested >>>>>>>>>> configuration. If you don't, though, EBS GP2 will save a _lot_ of >>>>>>>>>> headache. >>>>>>>>>> >>>>>>>>>> Personally, on small clusters like ours (12 nodes), we've found >>>>>>>>>> our choice of instance dictated much more by the balance of price, >>>>>>>>>> CPU, and >>>>>>>>>> memory. We're using GP2 SSD and we find that for our patterns the >>>>>>>>>> disk is >>>>>>>>>> rarely the bottleneck. YMMV, of course. >>>>>>>>>> >>>>>>>>>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa < >>>>>>>>>> jeff.ji...@crowdstrike.com> wrote: >>>>>>>>>> >>>>>>>>>>> If you have to ask that question, I strongly recommend m4 or c4 >>>>>>>>>>> instances with GP2 EBS. When you don’t care about replacing a node >>>>>>>>>>> because >>>>>>>>>>> of an instance failure, go with i2+ephemerals. Until then, GP2 EBS >>>>>>>>>>> is >>>>>>>>>>> capable of amazing things, and greatly simplifies life. >>>>>>>>>>> >>>>>>>>>>> We gave a talk on this topic at both Cassandra Summit and AWS >>>>>>>>>>> re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s >>>>>>>>>>> very much a viable option, despite any old documents online that say >>>>>>>>>>> otherwise. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: Eric Plowe >>>>>>>>>>> Reply-To: "user@cassandra.apache.org" >>>>>>>>>>> Date: Friday, January 29, 2016 at 4:33 PM >>>>>>>>>>> To: "user@cassandra.apache.org" >>>>>>>>>>> Subject: EC2 storage options for C* >>>>>>>>>>> >>>>>>>>>>> My company is planning on rolling out a C* cluster in EC2. 
We >>>>>>>>>>> are thinking about going with ephemeral SSDs. The question is this: >>>>>>>>>>> Should >>>>>>>>>>> we put two in RAID 0 or just go with one? We currently run a >>>>>>>>>>> cluster in our >>>>>>>>>>> data center with 2 250gig Samsung 850 EVO's in RAID 0 and we are >>>>>>>>>>> happy with >>>>>>>>>>> the performance we are seeing thus far. >>>>>>>>>>> >>>>>>>>>>> Thanks! >>>>>>>>>>> >>>>>>>>>>> Eric >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> >>>> >>> >>> >>> -- >>> Steve Robenalt >>> Software Architect >>> sroben...@highwire.org <bza...@highwire.org> >>> (office/cell): 916-505-1785 >>> >>> HighWire Press, Inc. >>> 425 Broadway St, Redwood City, CA 94063 >>> www.highwire.org >>> >>> Technology for Scholarly Communication >>> >> >> > -- Ben Bromhead CTO | Instaclustr +1 650 284 9692
Re: „Using Timestamp“ Feature
When using client supplied timestamps you need to ensure the clock on the client is in sync with the nodes in the cluster; otherwise behaviour will be unpredictable. On Thu, 18 Feb 2016 at 08:50 Tyler Hobbs <ty...@datastax.com> wrote: > 2016-02-18 2:00 GMT-06:00 Matthias Niehoff < > matthias.nieh...@codecentric.de>: > >> >> * is the 'using timestamp' feature (and providing statement timestamps) >> sufficiently robust and mature to build an application on? >> > > Yes. It's been there since the start of CQL3. > > >> * In a BatchedStatement, can different statements have different >> (explicitly provided) timestamps, or is the BatchedStatement's timestamp >> used for them all? Is this specified / stable behaviour? >> > > Yes, you can separate timestamps per statement. And, in fact, if you > potentially mix inserts and deletes on the same rows, you *should *use > explicit timestamps with different values. See the timestamp notes here: > http://cassandra.apache.org/doc/cql3/CQL.html#batchStmt > > >> * cqlsh reports a syntax error when I use 'using timestamp' with an >> update statement (works with 'insert'). Is there a good reason for this, or >> is it a bug? >> > > The "USING TIMESTAMP" goes in a different place in update statements. It > should be something like: > > UPDATE mytable USING TIMESTAMP ? SET col = ? WHERE key = ? > > > -- > Tyler Hobbs > DataStax <http://datastax.com/> > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
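To see why the clock sync matters: Cassandra reconciles conflicting writes to a cell purely by write timestamp (microseconds since the epoch), so a write from a client whose clock runs behind can be silently shadowed by an "earlier" one. A toy model of that resolution (my simplification -- real Cassandra also breaks exact timestamp ties by comparing values):

```python
import time

def client_timestamp():
    # Microseconds since the epoch -- the unit USING TIMESTAMP expects.
    return int(time.time() * 1_000_000)

def reconcile(cell_a, cell_b):
    """Toy last-write-wins on (value, timestamp_micros) cells."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

on_time = ("v1", 1_500_000_000_000_000)
# Written later in wall-clock time, but from a client whose clock is 10s slow:
skewed = ("v2", 1_500_000_000_000_000 - 10_000_000)
print(reconcile(on_time, skewed))  # the "newer" write from the slow clock loses
```

This is exactly the unpredictability above: whichever client has the fastest clock wins, regardless of real write order.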
Re: Sudden disk usage
+1 to checking for snapshots. Cassandra by default will automatically snapshot tables before destructive actions like drop or truncate. Some general advice regarding cleanup: cleanup will result in a temporary increase in both disk I/O load and disk space usage (especially with STCS). It should only be used as part of a planned increase in capacity when you still have plenty of disk space left on your existing nodes. If you are running Cassandra in the cloud (AWS, Azure etc) you can add an EBS volume, copy your sstables to it, then bind mount it to the troubled CF directory. This will give you some emergency disk space to let compaction and cleanup do their thing safely. On Tue, 16 Feb 2016 at 10:57 Robert Coli <rc...@eventbrite.com> wrote: > On Sat, Feb 13, 2016 at 4:30 PM, Branton Davis <branton.da...@spanning.com > > wrote: > >> We use SizeTieredCompaction. The nodes were about 67% full and we were >> planning on adding new nodes (doubling the cluster to 6) soon. >> > > Be sure to add those new nodes one at a time. > > Have you checked for, and cleared, old snapshots? Snapshots are > automatically taken at various times and have the unusual property of > growing larger over time. This is because they are hard links of data files > and do not take up disk space of their own until the files they link to are > compacted into new files. > > =Rob > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
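Rob's point about snapshots being hard links is easy to demonstrate with plain filesystem calls (this just illustrates the mechanism, it is not Cassandra code):

```python
import os
import tempfile

# A snapshot entry is a hard link: a second name for the same inode, so it
# costs no space at creation time. Once compaction removes the original
# file, the snapshot alone keeps the bytes on disk -- which is why old
# snapshots grow in effective size over time.
with tempfile.TemporaryDirectory() as d:
    sstable = os.path.join(d, "data.db")
    snapshot = os.path.join(d, "snapshot-data.db")

    with open(sstable, "wb") as f:
        f.write(b"x" * 4096)

    os.link(sstable, snapshot)             # what taking a snapshot does
    assert os.stat(sstable).st_nlink == 2  # one inode, two names, no extra space
    assert os.path.samefile(sstable, snapshot)

    os.remove(sstable)                     # "compaction" discards the original
    assert os.stat(snapshot).st_size == 4096
    print("links remaining:", os.stat(snapshot).st_nlink)  # prints: links remaining: 1
```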
Re: Cassandra nodes reduce disks per node
you can do this in a "rolling" fashion (one node at a time). On Wed, 17 Feb 2016 at 14:03 Branton Davis <branton.da...@spanning.com> wrote: > We're about to do the same thing. It shouldn't be necessary to shut down > the entire cluster, right? > > On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com> > wrote: > >> >> >> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com> >> wrote: >>> >>> To accomplish this can I just copy the data from disk1 to disk2 with in >>> the relevant cassandra home location folders, change the cassanda.yaml >>> configuration and restart the node. before starting i will shutdown the >>> cluster. >>> >> >> Yes. >> >> =Rob >> >> > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: How can I make Cassandra stable in a 2GB RAM node environment ?
+1 for http://opensourceconnections.com/blog/2013/08/31/building-the-perfect-cassandra-test-environment/ We also run Cassandra on t2.mediums for our Developer clusters. You can force Cassandra to do most "memory" things by hitting the disk instead (on disk compaction passes, flush immediately to disk) and by throttling client connections. In fact on the t2 series memory is not the biggest concern, but rather the CPU credit issue. On Mon, 7 Mar 2016 at 11:53 Robert Coli <rc...@eventbrite.com> wrote: > On Fri, Mar 4, 2016 at 8:27 PM, Jack Krupansky <jack.krupan...@gmail.com> > wrote: > >> Please review the minimum hardware requirements as clearly documented: >> >> http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningHardware.html >> > > That is a document for Datastax Cassandra, not Apache Cassandra. It's > wonderful that Datastax provides docs, but Datastax Cassandra is a superset > of Apache Cassandra. Presuming that the requirements of one are exactly > equivalent to the requirements of the other is not necessarily reasonable. > > Please adjust your hardware usage to at least meet the clearly documented >> minimum requirements. If you continue to encounter problems once you have >> corrected your configuration error, please resubmit the details with >> updated hardware configuration details. >> > > Disagree. OP specifically stated that they knew this was not a recommended > practice. It does not seem unlikely that they are constrained to use this > hardware for reasons outside of their control. > > >> Just to be clear, development on less than 4 GB is not supported and >> production on less than 8 GB is not supported. Those are not suggestions or >> guidelines or recommendations, they are absolute requirements. >> > > What does "supported" mean here? That Datastax will not provide support if > you do not follow the above recommendations?
Because it certainly is > "supported" in the sense of "it can be made to work" ... ? > > The premise of a minimum RAM level seems meaningless without context. How > much data are you serving from your 2GB RAM node? What is the rate of > client requests? > > To be clear, I don't recommend trying to run production Cassandra with > under 8GB of RAM on your node, but "absolute requirement" is a serious > overstatement. > > > http://opensourceconnections.com/blog/2013/08/31/building-the-perfect-cassandra-test-environment/ > > Has some good discussion of how to run Cassandra in a low memory > environment. Maybe someone should tell John that his 64MB of JVM heap for a > test node is 62x too small to be "supported"? :D > > =Rob > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
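The blog post linked above goes into detail; the usual first step is pinning the heap in cassandra-env.sh instead of letting it auto-size from system RAM. The values here are illustrative for a 2GB box, not a recommendation -- tune and test for your own workload:

```sh
# cassandra-env.sh -- if you set one of these you must set both
MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="256M"
```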
Re: Optional TLS CQL Encryption
Hi Jason If you enable encryption it will always be on. Optional encryption is generally a bad idea (tm). Creating a new session for every query is also a bad idea (tm), even without the minimal overhead of encryption. If you are really hell bent on doing this you could have a node that is part of the cluster but has -Dcassandra.join_ring=false set in the JVM options in cassandra-env.sh so it does not get any data, and configure that node to have no encryption enabled. This is known as a fat client. Then connect to that specific node whenever you want to do terrible non encrypted things. Having said all that, please don't do this. Cheers On Tue, 19 Apr 2016 at 15:32 Jason J. W. Williams <jasonjwwilli...@gmail.com> wrote: > Hey Guys, > > Is there a way to make TLS encryption optional for the CQL listener? We'd > like to be able to use it for remote management connections but not for same > datacenter usage (since the build up/tear down cost is too high for things > that don't use pools). > > Right now it appears if we enable encryption it requires it for all > connections, which definitely is not what we want. > > -J > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
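For completeness, the fat-client setting mentioned above goes into cassandra-env.sh on that one node only:

```sh
# cassandra-env.sh on the coordinator-only ("fat client") node:
# it joins gossip but takes no token ranges, so it stores no data.
JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
```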
Re: Changing Racks of Nodes
If the rack as defined in Cassandra stays the same (e.g. cassandra-rackdc.properties), things will keep working as expected... except when the actual rack (or fault domain) goes down and you are likely to lose more nodes than expected. If you change the rack as defined in Cassandra, the node will start handling queries it does not have data for. The best way to move a node between racks is to decommission the node, then bootstrap it with the new rack settings. On Wed, 20 Apr 2016 at 15:49 Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Hello, > > > > If a running node moves around and changes its rack in the process, when > its back in the cluster (through ignore-rack property), is it a correct > statement that queries will not see some data residing on this node until a > repair is run ? > > > > Or, is it more like the node may get requests for the data it does not own > (meaning data will never “disappear”) ? > > > > I’d appreciate some details on this topic from experts ! > > > > Thanks ! > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: SS Tables Files Streaming
Yup, with repair and particularly bootstrap there is a decent amount of "over streaming" of data, due to the fact it's just sending an sstable. On Fri, 6 May 2016 at 14:49 Anubhav Kale <anubhav.k...@microsoft.com> wrote: > Does repair really send SS Table files as is ? Wouldn’t data for tokens be > distributed across SS Tables ? > > > > *From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] > *Sent:* Friday, May 6, 2016 2:12 PM > > > *To:* user@cassandra.apache.org > *Subject:* Re: SS Tables Files Streaming > > > > Also probably sstableloader / bulk loading interface > > > > > > > > > > (I don’t think any of these necessarily stream “as-is”, but that’s a > different conversation I suspect) > > > > > > *From: *Jonathan Haddad > *Reply-To: *"user@cassandra.apache.org" > *Date: *Friday, May 6, 2016 at 1:52 PM > *To: *"user@cassandra.apache.org" > *Subject: *Re: SS Tables Files Streaming > > > > Repairs, bootstrap, decommission. > > > > On Fri, May 6, 2016 at 1:16 PM Anubhav Kale <anubhav.k...@microsoft.com> > wrote: > > Hello, > > > > In what scenarios can SS Table files on disk from Node 1 go to Node 2 as > is ? I’m aware this happens in *nodetool rebuild* and I am assuming this > does *not* happen in repairs. Can someone confirm ? > > > > The reason I ask is I am working on a solution for backup / restore and I > need to be sure if I boot a node, start copying over backed up files then > those files won’t get overwritten by something coming from other nodes. > > > > Thanks ! > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: SS Tables Files Streaming
Note that incremental repair strategies (2.1+) run anti-compaction against sstables in the range being repaired, so this will prevent overstreaming based on the ranges in the repair session. On Mon, 9 May 2016 at 10:31 Ben Bromhead <b...@instaclustr.com> wrote: > Yup, with repair and particularly bootstrap is there is a decent amount of > "over streaming" of data due to the fact it's just sending an sstable. > > On Fri, 6 May 2016 at 14:49 Anubhav Kale <anubhav.k...@microsoft.com> > wrote: > >> Does repair really send SS Table files as is ? Wouldn’t data for tokens >> be distributed across SS Tables ? >> >> >> >> *From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] >> *Sent:* Friday, May 6, 2016 2:12 PM >> >> >> *To:* user@cassandra.apache.org >> *Subject:* Re: SS Tables Files Streaming >> >> >> >> Also probably sstableloader / bulk loading interface >> >> >> >> >> >> >> >> >> >> (I don’t think any of these necessarily stream “as-is”, but that’s a >> different conversation I suspect) >> >> >> >> >> >> *From: *Jonathan Haddad >> *Reply-To: *"user@cassandra.apache.org" >> *Date: *Friday, May 6, 2016 at 1:52 PM >> *To: *"user@cassandra.apache.org" >> *Subject: *Re: SS Tables Files Streaming >> >> >> >> Repairs, bootstamp, decommission. >> >> >> >> On Fri, May 6, 2016 at 1:16 PM Anubhav Kale <anubhav.k...@microsoft.com> >> wrote: >> >> Hello, >> >> >> >> In what scenarios can SS Table files on disk from Node 1 go to Node 2 as >> is ? I’m aware this happens in *nodetool rebuild* and I am assuming >> this does *not* happen in repairs. Can someone confirm ? >> >> >> >> The reason I ask is I am working on a solution for backup / restore and I >> need to be sure if I boot a node, start copying over backed up files then >> those files won’t get overwritten by something coming from other nodes. >> >> >> >> Thanks ! 
>> >> -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Authentication with Java driver
On Tue, 7 Feb 2017 at 17:52 Yuji Ito <y...@imagine-orb.com> wrote: Thanks Andrew, Ben, My application creates a lot of instances connecting to Cassandra with basically the same set of credentials. Do you mean lots of instances of the process or lots of instances of the cluster/session object? After an instance connects to Cassandra with the credentials, can any instance connect to Cassandra without credentials? As long as you don't share the session or cluster objects. Each new cluster/session will need to reauthenticate. == example == A first = new A("database", "user", "password"); // proper credentials r = first.get(); ... A other = new A("database", "user", "pass"); // wrong password r = other.get(); == example == I want to refuse the `other` instance with improper credentials. This looks like you are creating new cluster/session objects (filling in the blanks for your pseudocode here). So "other" will not authenticate to Cassandra. This brings up a wider point: why are you doing this? Generally most applications will create a single long-lived session object that lasts the life of the application process. I would not rely on Cassandra auth to authenticate downstream actors, not because it's bad, just that it's generally inefficient to create lots of session objects. The session object maintains a connection pool, pipelines requests, is thread safe and generally pretty solid. Yuji On Wed, Feb 8, 2017 at 4:11 AM, Ben Bromhead <b...@instaclustr.com> wrote: What are you specifically trying to achieve? Are you trying to authenticate multiple Cassandra users from a single application instance? Or will you have lots of application instances connecting to Cassandra using the same set of credentials? Or a combination of both? Multiple application instances with different credentials?
On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert <andrew.tolb...@datastax.com> wrote: Hello, The API seems kind of not correct because credentials should be usually set with a session but actually they are set with a cluster. With the datastax driver, Session is what manages connection pools to each node. Cluster manages configuration and a separate connection ('control connection') to subscribe to state changes (schema changes, node topology changes, node up/down events). So, if there are 1000 clients, then with this API it has to create 1000 cluster instances ? I'm unsure how common it is for per-user authentication to be done when connecting to the database. I think an application would normally authenticate with one set of credentials instead of multiple. The protocol Cassandra uses does authentication at the connection level instead of at the request level, so that is currently a limitation to support something like reusing Sessions for authenticating multiple users. Thanks, Andy On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada <mogwa...@gmail.com> wrote: Hi, The API seems kind of not correct because credentials should be usually set with a session but actually they are set with a cluster. So, if there are 1000 clients, then with this API it has to create 1000 cluster instances ? 1000 clients seems usual if there are many nodes (say 20) and each node has some concurrency (say 50), but 1000 cluster instances seems too many. Is this an expected way to do this ? or Is there any way to authenticate per session ? Thanks, Hiro On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito <y...@imagine-orb.com> wrote: > Hi all, > > I want to know how to authenticate Cassandra users for multiple instances > with Java driver. > For instance, each thread creates a instance to access Cassandra with > authentication. > > As the implementation example, only the first constructor builds a cluster > and a session. > Other constructors use them. 
> This example is implemented according to the datastax document: "Basically > you will want to share the same cluster and session instances across your > application". > http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra > > However, other constructors don't authenticate the user and the password. > That's because they don't need to build a cluster and a session. > > So, should I create a cluster and a session per instance for the > authentication? > If yes, can I create a lot of instances(clusters and sessions) to access C* > concurrently? > > == example == > public class A { > private static Cluster cluster = null; > private static Map<String, Session> sessions = null; > private Session session; > > public A (String keyspace, String user, String password) { > if (cluster == null) { > builder = Cluster.builder(); > ... > builder = builder.withCredentials(user, password);
Instaclustr Masters scholarship
As part of our commitment to contributing back to the Apache Cassandra open source project and the wider community, we are always looking for ways we can foster knowledge sharing and improve the usability of Cassandra itself. One of the ways we have done so previously was to open up our internal builds and versions of Cassandra (https://github.com/instaclustr/cassandra). We have also been looking at a few novel or outside-the-box ways we can further contribute back to the community. As such, we are sponsoring a masters project in conjunction with the Australia-based University of Canberra. Instaclustr’s staff will be available to provide advice and feedback to the successful candidate. *Scope* Distributed database systems are relatively new technology compared to traditional relational databases. Distributed architectures provide significant advantages in terms of reliability and scalability, but often at the cost of increased complexity. This complexity presents challenges for testing of these systems to prove correct operation across all possible system states. The scope of this masters scholarship is to use the Apache Cassandra repair process as an example to consider and improve available approaches to distributed database systems testing. The repair process in Cassandra is a scheduled process that runs to ensure the multiple copies of each piece of data that is maintained by Cassandra are kept synchronised. Correct operation of repairs has been an ongoing challenge for the Cassandra project, partly due to the difficulty in designing and developing comprehensive automated tests for this functionality.
The expected scope of this project is to: - survey and understand the existing testing framework available as part of the Cassandra project, particularly as it pertains to testing repairs - consider, research and develop enhanced approaches to testing of repairs - submit any successful approaches to the Apache Cassandra project for feedback and inclusion in the project code base Australia is a pretty great place to advance your education and is welcoming of foreign students. We are also open to sponsoring a PhD project with a more in depth focus for the right candidate. For more details please don't hesitate to get in touch with myself or reach out to i...@instaclustr.com. Cheers Ben -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Authentication with Java driver
What are you specifically trying to achieve? Are you trying to authenticate multiple Cassandra users from a single application instance? Or will you have lots of application instances connecting to Cassandra using the same set of credentials? Or a combination of both? Multiple application instances with different credentials? On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert <andrew.tolb...@datastax.com> wrote: > Hello, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. > > > With the datastax driver, Session is what manages connection pools to > each node. Cluster manages configuration and a separate connection > ('control connection') to subscribe to state changes (schema changes, node > topology changes, node up/down events). > > > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > > > I'm unsure how common it is for per-user authentication to be done when > connecting to the database. I think an application would normally > authenticate with one set of credentials instead of multiple. The protocol > Cassandra uses does authentication at the connection level instead of at > the request level, so that is currently a limitation to support something > like reusing Sessions for authenticating multiple users. > > Thanks, > Andy > > > On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada <mogwa...@gmail.com> wrote: > > Hi, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. > > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > 1000 clients seems usual if there are many nodes (say 20) and each > node has some concurrency (say 50), > but 1000 cluster instances seems too many. > > Is this an expected way to do this ? or > Is there any way to authenticate per session ?
> > Thanks, > Hiro > > On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito <y...@imagine-orb.com> wrote: > > Hi all, > > > > I want to know how to authenticate Cassandra users for multiple instances > > with Java driver. > > For instance, each thread creates a instance to access Cassandra with > > authentication. > > > > As the implementation example, only the first constructor builds a > cluster > > and a session. > > Other constructors use them. > > This example is implemented according to the datastax document: > "Basically > > you will want to share the same cluster and session instances across your > > application". > > > http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra > > > > However, other constructors don't authenticate the user and the password. > > That's because they don't need to build a cluster and a session. > > > > So, should I create a cluster and a session per instance for the > > authentication? > > If yes, can I create a lot of instances(clusters and sessions) to access > C* > > concurrently? > > > > == example == > > public class A { > > private static Cluster cluster = null; > > private static Map<String, Session> sessions = null; > > private Session session; > > > > public A (String keyspace, String user, String password) { > > if (cluster == null) { > > builder = Cluster.builder(); > > ... > > builder = builder.withCredentials(user, password); > > cluster = builder.build(); > > } > > session = sessions.get(keyspace); > > if (session == null) { > > session = cluster.connection(keyspace); > > sessions.put(keyspace, session) > > } > > ... > > } > > ... > > public ResultSet update(...) { > > ... > > public ResultSet get(...) { > > ... > > } > > == example == > > > > Thanks, > > Yuji > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
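The Java example above boils down to "build the Cluster once, cache one Session per keyspace". The same pattern in miniature, with a stub standing in for the driver objects (illustration only -- a real application would build sessions from the DataStax driver's Cluster):

```python
import threading

class StubSession:
    """Stand-in for the driver's Session object, which holds the
    connection pool and carries the authenticated credentials."""
    def __init__(self, keyspace):
        self.keyspace = keyspace

_lock = threading.Lock()
_sessions = {}

def get_session(keyspace):
    """One long-lived session per keyspace, shared by the whole process,
    mirroring the Java example's static cluster/sessions fields."""
    with _lock:
        if keyspace not in _sessions:
            _sessions[keyspace] = StubSession(keyspace)
        return _sessions[keyspace]

a = get_session("dba_test")
b = get_session("dba_test")
print(a is b)  # the session is reused, not rebuilt per request
```

The lock matters because, unlike the Java example's unsynchronized null check, two threads racing on first use would otherwise each build (and authenticate) their own session.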
Re: compaction falling behind
You can do so in two ways: 1) Direct observation: you can keep an eye on the number of pending compactions. This will fluctuate with load, compaction strategy, ongoing repairs and nodes bootstrapping, but generally it should trend towards 0. There have been a number of bugs in past versions of Cassandra whereby the number of pending compactions is not reported correctly, so depending on what version of Cassandra you run this could impact you. 2) Indirect observation: you can keep an eye on metrics that healthy compaction will directly contribute to. These include the number of sstables per read histogram, estimated droppable tombstones, tombstones per read, etc. You should keep an eye on these things anyway as they can often show you areas where you can fine tune compaction or your data model. Everything exposed by nodetool is consumable via JMX, which is great to plug into your metrics/monitoring/observability system :) On Mon, 13 Feb 2017 at 13:23 John Sanda <john.sa...@gmail.com> wrote: > What is a good way to determine whether or not compaction is falling > behind? I read a couple things earlier that suggest nodetool > compactionstats might not be the most reliable thing to use. > > > > - John > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
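To make the "direct observation" approach concrete, here is a toy sketch (Python, with function name, window and threshold of my own choosing — not part of any tool) of the kind of trend check you might run against a sampled series of pending-compaction counts. In practice the numbers would come from `nodetool compactionstats` or the JMX metric `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks`:

```python
# Toy heuristic for the "direct observation" approach: given a series of
# pending-compaction counts sampled over time, flag a node whose backlog is
# growing rather than trending back towards 0. Names and thresholds are
# illustrative only.

def compaction_falling_behind(samples, window=5, threshold=1.5):
    """Return True if the recent average of pending compactions is at least
    `threshold` times the older average (i.e. the backlog is growing)."""
    if len(samples) < 2 * window:
        return False  # not enough data to judge a trend
    older = sum(samples[-2 * window:-window]) / window
    recent = sum(samples[-window:]) / window
    if older == 0:
        return recent >= threshold
    return recent / older >= threshold

# A backlog that spikes under load but drains back towards 0 is healthy;
# one that keeps climbing suggests compaction can't keep up:
healthy = [40, 35, 20, 12, 8, 5, 3, 2, 1, 0]
growing = [2, 3, 5, 8, 12, 20, 30, 45, 60, 90]
```

The exact window and threshold would need tuning to your load pattern; the point is simply that a healthy backlog spikes and drains, while a node falling behind grows without recovering.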
Re: Authentication with Java driver
If the processes are launched separately or you fork before setting up the cluster object, it won't share credentials. On Wed, Feb 8, 2017, 02:33 Yuji Ito <y...@imagine-orb.com> wrote: > Thanks Ben, > > Do you mean lots of instances of the process or lots of instances of the > cluster/session object? > > > Lots of instances of the process are generated. > I wanted to confirm that `other` doesn't authenticate. > > If I want to avoid that, my application has to create new cluster/session > objects per instance. > But it is inefficient and uncommon. > So, we aren't sure that the application works when a lot of > cluster/session objects are created. > Is it correct? > > Thank you, > Yuji > > > > On Wed, Feb 8, 2017 at 12:01 PM, Ben Bromhead <b...@instaclustr.com> wrote: > > On Tue, 7 Feb 2017 at 17:52 Yuji Ito <y...@imagine-orb.com> wrote: > > Thanks Andrew, Ben, > > My application creates a lot of instances connecting to Cassandra with > basically the same set of credentials. > > Do you mean lots of instances of the process or lots of instances of the > cluster/session object? > > > After an instance connects to Cassandra with the credentials, can any > instance connect to Cassandra without credentials? > > As long as you don't share the session or cluster objects. Each new > cluster/session will need to reauthenticate. > > > == example == > A first = new A("database", "user", "password"); // proper credentials > r = first.get(); > ... > A other = new A("database", "user", "pass"); // wrong password > r = other.get(); > == example == > > I want to refuse the `other` instance with improper credentials. > > > This looks like you are creating new cluster/session objects (filling in > the blanks for your pseudocode here). So "other" will not authenticate to > Cassandra. > > This brings up a wider point: why are you doing this? Generally most > applications will create a single long-lived session object that lasts > the life of the application process. 
> > I would not rely on Cassandra auth to authenticate downstream actors, not > because it's bad, but because it's generally inefficient to create lots of session > objects. The session object maintains a connection pool, pipelines > requests, is thread safe and generally pretty solid. > > > > > Yuji > > > On Wed, Feb 8, 2017 at 4:11 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > What are you specifically trying to achieve? Are you trying to > authenticate multiple Cassandra users from a single application instance? > Or will you have lots of application instances connecting to Cassandra > using the same set of credentials? Or a combination of both? Multiple > application instances with different credentials? > > On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert <andrew.tolb...@datastax.com> > wrote: > > Hello, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. > > > With the datastax driver, Session is what manages connection pools to > each node. Cluster manages configuration and a separate connection > ('control connection') to subscribe to state changes (schema changes, node > topology changes, node up/down events). > > > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > > > I'm unsure how common it is for per-user authentication to be done when > connecting to the database. I think an application would normally > authenticate with one set of credentials instead of multiple. The protocol > Cassandra uses does authentication at the connection level instead of at > the request level, so that is currently a limitation to support something > like reusing Sessions for authenticating multiple users. > > Thanks, > Andy > > > On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada <mogwa...@gmail.com> wrote: > > Hi, > > The API seems kind of not correct because credentials should be > usually set with a session but actually they are set with a cluster. 
> > So, if there are 1000 clients, then with this API it has to create > 1000 cluster instances ? > 1000 clients seems usual if there are many nodes (say 20) and each > node has some concurrency (say 50), > but 1000 cluster instances seems too many. > > Is this an expected way to do this ? or > Is there any way to authenticate per session ? > > Thanks, > Hiro > > On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito <y...@imagine-orb.com> wrote: > > Hi all, > > > > I want to know how to authenticate
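Since (as Andrew notes above) the protocol authenticates at the connection level, supporting several distinct users really does mean one Cluster/Session per credential set. The toy Python model below — deliberately *not* the DataStax driver API, just stand-in classes of my own — sketches the mitigation implicit in the thread: cache those objects keyed by credentials and keyspace so each credential set pays the connection cost only once:

```python
# Toy model (not the DataStax driver API) of the constraint discussed here:
# credentials live on the Cluster, so N distinct users means N Cluster
# objects. Caching by credential set ensures each set is built only once.

class FakeCluster:
    """Stand-in for a driver Cluster: one per credential set."""
    def __init__(self, user, password):
        self.user, self.password = user, password

    def connect(self, keyspace):
        return (self.user, keyspace)  # stand-in for a Session


class SessionCache:
    def __init__(self):
        self._clusters = {}   # (user, password) -> FakeCluster
        self._sessions = {}   # (user, keyspace) -> session

    def get_session(self, user, password, keyspace):
        cluster = self._clusters.setdefault(
            (user, password), FakeCluster(user, password))
        key = (user, keyspace)
        if key not in self._sessions:
            self._sessions[key] = cluster.connect(keyspace)
        return self._sessions[key]


cache = SessionCache()
s1 = cache.get_session("alice", "pw", "ks1")
s2 = cache.get_session("alice", "pw", "ks1")  # reused, no new "connection"
```

Even with caching, 1000 distinct credential sets still means 1000 cluster objects and their connection pools, which is the cost Hiro is pointing at.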
Re: Cassandra Authentication
We have a process that syncs and manages RF==N and we also control and manage users, however that entails its own set of challenges and maintenance. For most users I would suggest 3 < RF <= 5 is sufficient. Also make sure you don't use the user "Cassandra" in production as authentication queries are done at QUORUM. On Wed, 18 Jan 2017 at 13:41 Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Hello, > > When enabling Authentication on cassandra, is it required to set the RF > same as the no.of nodes( > https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html)? > or can I live with RF of 3 in each DC (other KS are using 3) > > If it has to be equal to the number of nodes then, every time adding or > removing a node requires update of RF. > > Thanks in advance. > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Cassandra Authentication
the volume of data is pretty low + you still want to be able to authenticate even if you have more nodes down than the RF for other keyspaces. Essentially you don't want auth to be the thing that stops you serving requests. On Wed, 18 Jan 2017 at 14:57 Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Thanks Ben, > > RF 3 isn't sufficient for system_auth? as we are using 3 RF for other > production KS, do you see any challenges? > > On Wed, Jan 18, 2017 at 2:39 PM, Ben Bromhead <b...@instaclustr.com> wrote: > > We have a process that syncs and manages RF==N and we also control and > manage users, however that entails it's own set of challenges and > maintenance. > > For most users I would suggest 3 < RF <=5 is sufficient. Also make sure > you don't use the user "Cassandra" in production as authentication queries > are done at QUORUM. > > On Wed, 18 Jan 2017 at 13:41 Jai Bheemsen Rao Dhanwada < > jaibheem...@gmail.com> wrote: > > Hello, > > When enabling Authentication on cassandra, is it required to set the RF > same as the no.of nodes( > https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html)? > or can I live with RF of 3 in each DC (other KS are using 3) > > If it has to be equal to the number of nodes then, every time adding or > removing a node requires update of RF. > > Thanks in advance. > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 <+1%20650-284-9692> > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: How Fast Does Information Spread With Gossip?
Gossip propagation is generally best modelled by epidemic algorithms. Luckily for us, Cassandra's gossip protocol is fairly simple. Cassandra will perform one Gossip Task every second. Within each gossip task it will randomly gossip with another available node in the cluster; it will also possibly attempt to gossip with a down node (based on a random chance that increases as the number of down nodes increases), and if it hasn't gossiped with a seed that round it may also attempt to gossip with a defined seed. So Cassandra can do up to 3 rounds per second, however these extra rounds are supposed to be optimizations for improving average case convergence and recovering from split brain scenarios quicker than would normally occur. Assuming just one gossip round per second, for a new piece of information to spread to all members of the cluster via gossip, you would see a worst case performance of O(n) gossip rounds where n is the number of nodes in the cluster. This is because each Cassandra node can gossip to any other node irrespective of topology (a fully connected mesh). There is some ongoing discussion about expanding gossip to utilise partial views of the cluster and exchanging those, or using spanning/broadcast trees to speed up convergence and reduce workload in large clusters (1000+ nodes), see https://issues.apache.org/jira/browse/CASSANDRA-12345 for details. On Fri, 16 Sep 2016 at 01:01 Jens Rantil <jens.ran...@tink.se> wrote: > > Is a minute a reasonable upper bound for most clusters? > > I have no numbers and I'm sure this differs depending on how large your > cluster is. We have a small cluster of around 12 nodes and statuses > generally propagate in under 5 seconds for sure. So, it will definitely be > less than 1 minute. 
> > Cheers, > Jens > > On Wed, Sep 14, 2016 at 8:49 PM jerome <jeromefroel...@hotmail.com> wrote: > >> Hi, >> >> >> I was curious if anyone had any kind of statistics or ballpark figures on >> how long it takes information to propagate through a cluster with Gossip? >> I'm particularly interested in how fast information about the liveness of a >> node spreads. For example, in an n-node cluster the median amount of time >> it takes for all nodes to learn that a node went down is f(n) seconds. Is a >> minute a reasonable upper bound for most clusters? Too high, too low? >> >> >> Thanks, >> >> Jerome >> > -- > > Jens Rantil > Backend Developer @ Tink > > Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden > For urgent matters you can reach me at +46-708-84 18 32. > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
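The epidemic behaviour described above is easy to simulate. The sketch below (Python; a deliberately simplified model — one push per informed node per round, uniformly random peer, none of Cassandra's seed or down-node extras) shows that although the worst case is O(n) rounds, typical convergence for a 100-node cluster is on the order of ten rounds, which lines up with Jens's "under 5 seconds" observation for a 12-node cluster:

```python
import random

# Minimal epidemic-style simulation of the basic gossip round described
# above: once per "second", every node that already knows a new piece of
# state pushes it to one uniformly random peer. This ignores Cassandra's
# extra seed/down-node rounds and pull exchanges, so it is a rough model,
# not the real protocol.

def rounds_to_converge(n, rng):
    informed = {0}  # node 0 learns something new
    rounds = 0
    while len(informed) < n:
        rounds += 1
        for node in list(informed):  # snapshot: new learners wait a round
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1  # pick any node except itself
            informed.add(peer)
    return rounds

rng = random.Random(42)
trials = [rounds_to_converge(100, rng) for _ in range(20)]
avg = sum(trials) / len(trials)
```

Since the informed set can at most double each round, no trial can finish in fewer than ceil(log2(100)) = 7 rounds; in practice the simulation converges well under the O(n) = 100-round worst case.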
Re: Handle Leap Seconds with Cassandra
If you need guaranteed strict ordering in a distributed system, I would not use Cassandra; Cassandra does not provide this out of the box. I would look to a system that uses Lamport or vector clocks. Based on your description of how your system runs at the moment (and how close your updates are together), you have either already experienced out of order updates or there is a real possibility you will in the future. Sorry to be so dire, but if you do require causal consistency / strict ordering, you are not getting it at the moment. Distributed systems theory is really tricky, even for people that are "experts" on distributed systems over unreliable networks (I would certainly not put myself in that category). People have made a very good name for themselves by showing that the vast majority of distributed databases have had bugs when it comes to their various consistency models and the claims these databases make. So make sure you really do need guaranteed causal consistency/strict ordering, or see if you can design around it (e.g. using conflict free replicated data types) or choose a system that is designed to provide it. Having said that... here are some hacky things you could do in Cassandra to try and get this behaviour, which I in no way endorse doing :) - Cassandra counters do leverage a logical clock per shard and you could hack something together with counters and lightweight transactions, but you would want to do your homework on counter accuracy before diving into it, as I don't know if the implementation is safe in the context of your question. Also this would probably require a significant rework of your application plus a significant performance hit. I would invite a counter guru to jump in here... - You can leverage the fact that timestamps are monotonic if you isolate writes to a single node for a single shard... but you then lose Cassandra's availability guarantees, e.g. 
a keyspace with an RF of 1 and a CL of ONE will get monotonic timestamps (if generated on the server side). - Continuing down the path of isolating writes to a single node for a given shard, you could also isolate writes to the primary replica using your client driver during the leap second (make it a minute either side of the leap), but again you lose out on availability and you are probably already experiencing out of order writes given how close your writes and updates are. A note on NTP: NTP is generally fine if you use it to keep the clocks synced between the Cassandra nodes. If you are interested in how we have implemented NTP at Instaclustr, see our blogpost on it https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/ . Ben On Thu, 27 Oct 2016 at 10:18 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Hi Ben, > > Thanks for your reply. We don't use timestamps in primary key. We rely on > server side timestamps generated by coordinator. So, no functions at > client side would help. > > Yes, drifts can create problems too. But even if you ensure that nodes are > perfectly synced with NTP, you will surely mess up the order of updates > during the leap second (interleaving). Some applications update same column > of same row quickly (within a second) and reversing the order would > corrupt the data. > > I am interested in learning how people relying on strict order of updates > handle leap second scenario when clock goes back one second (same second is > repeated). What kind of tricks people use to ensure that server side > timestamps are monotonic ? > > As per my understanding NTP slew mode may not be suitable for Cassandra as > it may cause unpredictable drift amongst the Cassandra nodes. Ideas ?? 
> > > Thanks > Anuj > > > > Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > > On Thu, 20 Oct, 2016 at 11:25 PM, Ben Bromhead > > <b...@instaclustr.com> wrote: > http://www.datastax.com/dev/blog/preparing-for-the-leap-second gives a > pretty good overview > > If you are using a timestamp as part of your primary key, this is the > situation where you could end up overwriting data. I would suggest using > timeuuid instead which will ensure that you get different primary keys even > for data inserted at the exact same timestamp. > > The blog post also suggests using certain monotonic timestamp classes in > Java however these will not help you if you have multiple clients that may > overwrite data. > > As for the interleaving or out of order problem, this is hard to address > in Cassandra without resorting to external coordination or LWTs. If you are > relying on a wall clock to guarantee order in a distributed system you will > get yourself into trouble even without leap seconds (clock drift, NTP > inaccur
Re: Are Cassandra writes are faster than reads?
Awesome! For a full explanation of what you are seeing (we call it micro batching) check out Adam Zegelin's talk on it https://www.youtube.com/watch?v=wF3Ec1rdWgc On Tue, 8 Nov 2016 at 02:21 Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > > Hi, > > Just found that reducing the batch size below 20 also increases the > writing speed and reduction in memory usage (especially for Python driver). > > Kind regards, > Rajesh R > > ------ > *From:* Ben Bromhead [b...@instaclustr.com] > *Sent:* 07 November 2016 05:44 > *To:* user@cassandra.apache.org > *Subject:* Re: Are Cassandra writes are faster than reads? > > They can be and it depends on your compaction strategy :) > > On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <ali.rac...@gmail.com> wrote: > > tl;dr? I just want to know if updates are bad for performance, and if so, > for how long. > > On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <b...@instaclustr.com> > wrote: > > Check out https://wiki.apache.org/cassandra/WritePathForUsers for > the full gory details. > > On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote: > > How long does it take for updates to get merged / compacted into the main > data file? > > On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> > wrote: > > To add some flavor as to how the commitlog implementation is so quick. 
> > It only flushes to disk every 10s by default. So writes are effectively > done to memory and then to disk asynchronously later on. This is generally > accepted to be OK, as the write is also going to other nodes. > > You can of course change this behavior to flush on each write or to skip > the commitlog altogether (danger!). This however will change how "safe" > things are from a durability perspective. > > On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. > > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. 
> > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? &
Re: Are Cassandra writes are faster than reads?
Check out https://wiki.apache.org/cassandra/WritePathForUsers for the full gory details. On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote: > How long does it take for updates to get merged / compacted into the main > data file? > > On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > To add some flavor as to how the commitlog implementation is so quick. > > It only flushes to disk every 10s by default. So writes are effectively > done to memory and then to disk asynchronously later on. This is generally > accepted to be OK, as the write is also going to other nodes. > > You can of course change this behavior to flush on each write or to skip > the commitlog altogether (danger!). This however will change how "safe" > things are from a durability perspective. > > On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. > > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. 
> > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? > > > > Hi all, > > > > Are Cassandra writes are faster than reads ?? If yes, why is this so? I am > using consistency 1 and data is in memory. > > > > Vikas > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Are Cassandra writes are faster than reads?
They can be and it depends on your compaction strategy :) On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <ali.rac...@gmail.com> wrote: > tl;dr? I just want to know if updates are bad for performance, and if so, > for how long. > > On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > Check out https://wiki.apache.org/cassandra/WritePathForUsers for the > full gory details. > > On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote: > > How long does it take for updates to get merged / compacted into the main > data file? > > On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> wrote: > > To add some flavor as to how the commitlog implementation is so quick. > > It only flushes to disk every 10s by default. So writes are effectively > done to memory and then to disk asynchronously later on. This is generally > accepted to be OK, as the write is also going to other nodes. > > You can of course change this behavior to flush on each write or to skip > the commitlog altogether (danger!). This however will change how "safe" > things are from a durability perspective. > > On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. 
> > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. > > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? > > > > Hi all, > > > > Are Cassandra writes are faster than reads ?? If yes, why is this so? I am > using consistency 1 and data is in memory. > > > > Vikas > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Are Cassandra writes are faster than reads?
To add some flavor as to how the commitlog implementation is so quick. It only flushes to disk every 10s by default. So writes are effectively done to memory and then to disk asynchronously later on. This is generally accepted to be OK, as the write is also going to other nodes. You can of course change this behavior to flush on each write or to skip the commitlog altogether (danger!). This however will change how "safe" things are from a durability perspective. On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > Cassandra writes are particularly fast, for a few reasons: > > > > 1) Most writes go to a commitlog (append-only file, written > linearly, so particularly fast in terms of disk operations) and then pushed > to the memTable. Memtable is flushed in batches to the permanent data > files, so it buffers many mutations and then does a sequential write to > persist that data to disk. > > 2) Reads may have to merge data from many data tables on disk. > Because the writes (described very briefly in step 1) write to immutable > files, updates/deletes have to be merged on read – this is extra effort for > the read path. > > > > If you don’t do much in terms of overwrites/deletes, and your partitions > are particularly small, and your data fits in RAM (probably mmap/page cache > of data files, unless you’re using the row cache), reads may be very fast > for you. Certainly individual reads on low-merge workloads can be < 0.1ms. > > > > - Jeff > > > > *From: *Vikas Jaiman <er.vikasjai...@gmail.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Sunday, November 6, 2016 at 12:42 PM > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Subject: *Are Cassandra writes are faster than reads? > > > > Hi all, > > > > Are Cassandra writes are faster than reads ?? If yes, why is this so? I am > using consistency 1 and data is in memory. 
> > > > Vikas > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
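Jeff's two points — appends are cheap, reads pay the merge cost — can be illustrated with a toy log-structured store. This is purely illustrative (Python, names of my own), nothing like Cassandra's actual storage engine:

```python
# Toy log-structured store illustrating the points above: writes are cheap
# appends into an in-memory table; flushes produce immutable "sstables"; a
# read may have to consult every sstable plus the memtable and keep the
# newest value, which is why overwrite-heavy workloads make reads do more
# work. Purely illustrative -- not Cassandra's real storage engine.

class ToyLSM:
    def __init__(self, flush_at=3):
        self.memtable, self.sstables, self.flush_at = {}, [], flush_at

    def write(self, key, value):
        self.memtable[key] = value  # O(1) append-style write
        if len(self.memtable) >= self.flush_at:
            self.sstables.append(dict(self.memtable))  # immutable flush
            self.memtable = {}

    def read(self, key):
        tables_checked = 0
        result = None
        for table in self.sstables + [self.memtable]:  # oldest -> newest
            tables_checked += 1
            if key in table:
                result = table[key]  # newer tables win (last write wins)
        return result, tables_checked


db = ToyLSM()
for i, v in enumerate("abcdef"):
    db.write(f"k{i}", v)          # two flushes of three keys each
db.write("k0", "z")               # overwrite, merged at read time
value, checked = db.read("k0")    # consults both sstables + the memtable
```

The write path never touches old data; the read for `k0` has to look at every table and take the newest value — compaction exists precisely to shrink that merge cost.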
Re: Handle Leap Seconds with Cassandra
http://www.datastax.com/dev/blog/preparing-for-the-leap-second gives a pretty good overview If you are using a timestamp as part of your primary key, this is the situation where you could end up overwriting data. I would suggest using timeuuid instead which will ensure that you get different primary keys even for data inserted at the exact same timestamp. The blog post also suggests using certain monotonic timestamp classes in Java however these will not help you if you have multiple clients that may overwrite data. As for the interleaving or out of order problem, this is hard to address in Cassandra without resorting to external coordination or LWTs. If you are relying on a wall clock to guarantee order in a distributed system you will get yourself into trouble even without leap seconds (clock drift, NTP inaccuracy etc). On Thu, 20 Oct 2016 at 10:30 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Hi, > > I would like to know how you guys handle leap seconds with Cassandra. > > I am not bothered about the livelock issue as we are using appropriate > versions of Linux and Java. I am more interested in finding an optimum > answer for the following question: > > How do you handle wrong ordering of multiple writes (on same row and > column) during the leap second? You may overwrite the new value with old > one (disaster). > > And Downtime is no option :) > > I can see that CASSANDRA-9131 is still open.. > > FYI..we are on 2.0.14 .. > > > Thanks > Anuj > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
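The "monotonic timestamp classes" the blog post mentions are easy to sketch. Below is a minimal Python version of the idea (the Java driver ships classes in this spirit; the class and names here are mine): never hand out a timestamp lower than or equal to the previous one, even if the wall clock steps backwards over a leap second. As the post notes, this only protects ordering within a single client — it does nothing across clients:

```python
import threading
import time

class MonotonicTimestamps:
    """Hand out strictly increasing microsecond timestamps, even if the
    underlying clock stalls or steps backwards (e.g. over a leap second).
    Illustrative sketch only; single-process scope."""

    def __init__(self, clock_us=lambda: int(time.time() * 1_000_000)):
        self._clock_us = clock_us
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now = self._clock_us()
            # Clock went backwards or stood still: keep counting up instead.
            self._last = now if now > self._last else self._last + 1
            return self._last

# Simulate a clock that repeats a second, leap-second style:
clock_values = iter([1_000_000, 2_000_000, 1_000_000, 1_000_001, 3_000_000])
gen = MonotonicTimestamps(clock_us=lambda: next(clock_values))
ts = [gen.next() for _ in range(5)]  # strictly increasing despite the step back
```

During the repeated second the generator drifts ahead of the wall clock by a few microseconds and resynchronises once the clock passes it again.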
Re: Introducing Cassandra 3.7 LTS
Thanks Sankalp, we are also reviewing our internal 2.1 list against what you published (though we are trying to upgrade everyone to later versions e.g. 2.2). It's great to compare notes. On Thu, 20 Oct 2016 at 16:19 sankalp kohli <kohlisank...@gmail.com> wrote: > This is awesome. I have sent out the patches which we back ported into 2.1 > on the dev list. > > On Wed, Oct 19, 2016 at 4:33 PM, kurt Greaves <k...@instaclustr.com> > wrote: > > > On 19 October 2016 at 21:07, sfesc...@gmail.com <sfesc...@gmail.com> > wrote: > > Wow, thank you for doing this. This sentiment regarding stability seems to > be widespread. Is the team reconsidering the whole tick-tock cadence? If > not, I would add my voice to those asking that it is revisited. > > > There has certainly been discussion regarding the tick-tock cadence, and > it seems safe to say it will change. There hasn't been any official > announcement yet, however. > > Kurt Greaves > k...@instaclustr.com > www.instaclustr.com > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Introducing Cassandra 3.7 LTS
Hi All I am proud to announce we are making available our production build of Cassandra 3.7 that we run at Instaclustr (both for ourselves and our customers). Our release of Cassandra 3.7 includes a number of backported patches from later versions of Cassandra (e.g. 3.8 and 3.9) but doesn't include the new features of these releases. You can find our release of Cassandra 3.7 LTS on GitHub here ( https://github.com/instaclustr/cassandra). You can read more of our thinking and how this applies to our managed service here ( https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/). We also have an expanded FAQ about why and how we are approaching 3.x in this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts), however I've included the top few questions and answers below: *Is this a fork?* No, this is just Cassandra with a different release cadence for those who want 3.x features but are slightly more risk averse than the current schedule allows. *Why not just use the official release?* With the 3.x tick-tock branch we have encountered more instability than with the previous release cadence. We feel that releasing new features every other release makes it very hard for operators to stabilize their production environment without bringing in brand new features that are not battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously, the bug fix branch included new and real-world untested features, specifically CDC. We have decided to stick with Cassandra 3.7 and instead backport critical issues and maintain it ourselves rather than trying to stick with the current Apache Cassandra release cadence. *Why backport?* At Instaclustr we support and run a number of different versions of Apache Cassandra on behalf of our customers. Over the course of managing Cassandra for our customers we often encounter bugs. There are existing patches for some of them, others we patch ourselves. 
Generally, if we can, we try to wait for the next official Apache Cassandra release; however, when we need to ensure our customers remain stable and running we will sometimes backport fixes and write our own hotfixes (which are also submitted back to the community). *Why release it?* A number of our customers and people in the community have asked if we would make this available, which we are more than happy to do. This repository represents what Instaclustr runs in production for Cassandra 3.7 and this is our way of helping the community get a similar level of stability to what you would get from our managed service. Cheers Ben -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: High system CPU during high write workload
Hi Abhishek The article with the futex bug description lists the solution, which is to upgrade to a version of RHEL or CentOS that has the specified patch. What help do you specifically need? If you need help upgrading the OS I would look at the documentation for RHEL or CentOS. Ben On Mon, 14 Nov 2016 at 22:48 Abhishek Gupta <gupta.abhis...@snapdeal.com> wrote: Hi, We are seeing an issue where the system CPU is shooting up to a figure of >90% when the cluster is subjected to a relatively high write workload, i.e. 4k wreq/sec.

2016-11-14T13:27:47.900+0530 Process summary
  process cpu=695.61%
  application cpu=676.11% (user=200.63% sys=475.49%)  <== Very High System CPU
  other: cpu=19.49%
  heap allocation rate 403mb/s
  [000533] user= 1.43% sys= 6.91% alloc= 2216kb/s - SharedPool-Worker-129
  [000274] user= 0.38% sys= 7.78% alloc= 2415kb/s - SharedPool-Worker-34
  [000292] user= 1.24% sys= 6.77% alloc= 2196kb/s - SharedPool-Worker-56
  [000487] user= 1.24% sys= 6.69% alloc= 2260kb/s - SharedPool-Worker-79
  [000488] user= 1.24% sys= 6.56% alloc= 2064kb/s - SharedPool-Worker-78
  [000258] user= 1.05% sys= 6.66% alloc= 2250kb/s - SharedPool-Worker-41

On doing strace it was found that the following system call is consuming all the system CPU:

timeout 10s strace -f -p 5954 -c -q

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.33 1712.798399       16674    102723     22191 futex
  3.98   77.098730        4356     17700           read
  3.27   63.474795      394253       161        29 restart_syscall
  3.23   62.601530       29768      2103           epoll_wait

On searching we found that the following bug in the RHEL 6.6 / CentOS 6.6 kernel seems to be a probable cause of the issue: https://docs.datastax.com/en/landing_page/doc/landing_page/troubleshooting/cassandra/fetuxWaitBug.html The patch fix mentioned in the doc is also not present in our kernel. 
sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
- [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347] {CVE-2010-0623}
Can someone who has faced and resolved this issue help us here? Thanks, Abhishek -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
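For completeness, the check that the `rpm --changelog` grep performs can also be approximated by comparing kernel build numbers. A minimal sketch, assuming the first patched build is 2.6.32-504.16.2.el6 (verify this against the DataStax page linked above for your exact distro):

```python
import re

def kernel_tuple(release: str):
    """Turn a kernel release string such as '2.6.32-504.16.2.el6.x86_64'
    into a tuple of its numeric components for comparison."""
    return tuple(int(n) for n in re.findall(r"\d+", release.split(".el")[0]))

# Assumed first patched build, taken from the linked DataStax page;
# double-check this against your distro's errata before relying on it.
FIXED = kernel_tuple("2.6.32-504.16.2.el6")

def has_futex_fix(release: str) -> bool:
    return kernel_tuple(release) >= FIXED

print(has_futex_fix("2.6.32-431.29.2.el6.x86_64"))  # False: predates the fix
print(has_futex_fix("2.6.32-573.el6.x86_64"))       # True: later build
```

Feed it the output of `uname -r`; if it reports the build as unpatched, plan the OS upgrade.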
Re: Is it *safe* to issue multiple replace-node at the same time?
Same rack and no range movements, my first instinct is to say yes it is safe (I like to treat racks as one giant meta node). However I would want to have a read through the replace code. On Mon, Nov 21, 2016, 07:22 Dikang Gu <dikan...@gmail.com> wrote: > Hi guys, > > Sometimes we need to replace multiple hosts in the same rack, is it safe > to replace them in parallel, using the replace-node command? > > Will it cause any data inconsistency if we do so? > > Thanks > Dikang. > > -- > Dikang > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Clarify Support for 2.2 on Download Page
Hi Derek You should subscribe and post this question to the Dev list, they will be able to get you sorted quickly! Normally you can edit documentation directly via github (e.g. https://github.com/apache/cassandra/tree/trunk/doc/source), however the download source appears to be outside the Cassandra repo. Ben On Wed, 16 Nov 2016 at 13:08 Derek Burdick <derek.burd...@gmail.com> wrote: > Hi, is it possible to update the language on the Apache Cassandra Download > page to reflect that version 2.2 will enter Critical Fix Only support after > November 21st? > > The current language creates quite a bit of confusion in the community > with how long 2.2 and 2.1 will receive fixes from the community. > > http://cassandra.apache.org/download/ > > Specifically these three lines: > >- Apache Cassandra 3.0 is supported until May 2017. The latest release >is 3.0.9 > > <http://www.apache.org/dyn/closer.lua/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz> > (pgp > > <http://www.apache.org/dist/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz.asc> >, md5 > > <http://www.apache.org/dist/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz.md5> > and sha1 > > <http://www.apache.org/dist/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz.sha1>), >released on 2016-09-20. >- Apache Cassandra 2.2 is supported until November 2016. The latest >release is 2.2.8 > > <http://www.apache.org/dyn/closer.lua/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz> > (pgp > > <http://www.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz.asc> >, md5 > > <http://www.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz.md5> > and sha1 > > <http://www.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz.sha1>), >released on 2016-09-28. >- Apache Cassandra 2.1 is supported until November 2016 with critical >fixes only. 
The latest release is 2.1.16 > > <http://www.apache.org/dyn/closer.lua/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz> > (pgp > > <http://www.apache.org/dist/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz.asc> >, md5 > > <http://www.apache.org/dist/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz.md5> > and sha1 > > <http://www.apache.org/dist/cassandra/2.1.16/apache-cassandra-2.1.16-bin.tar.gz.sha1>), >released on 2016-10-10. > > > What would be the best approach to help get this changed? > > -Derek > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Any Bulk Load on Large Data Set Advice?
+1 on parquet and S3. Combined with spark running on spot instances your grant money will go much further! On Thu, 17 Nov 2016 at 07:21 Jonathan Haddad <j...@jonhaddad.com> wrote: > If you're only doing this for spark, you'll be much better off using > parquet and HDFS or S3. While you *can* do analytics with cassandra, it's > not all that great at it. > On Thu, Nov 17, 2016 at 6:05 AM Joe Olson <technol...@nododos.com> wrote: > > I received a grant to do some analysis on netflow data (Local IP address, > Local Port, Remote IP address, Remote Port, time, # of packets, etc) using > Cassandra and Spark. The de-normalized data set is about 13TB out the door. > I plan on using 9 Cassandra nodes (replication factor=3) to store the data, > with Spark doing the aggregation. > > Data set will be immutable once loaded, and am using the replication > factor = 3 to somewhat simulate the real world. Most of the analysis will > be of the sort "Give me all the remote ip addresses for source IP 'X' > between time t1 and t2" > > I built and tested a bulk loader following this example in GitHub: > https://github.com/yukim/cassandra-bulkload-example to generate the > SSTables, but I have not executed it on the entire data set yet. > > Any advice on how to execute the bulk load under this configuration? Can > I generate the SSTables in parallel? Once generated, can I write the > SSTables to all nodes simultaneously? Should I be doing any kind of sorting > by the partition key? > > This is a lot of data, so I figured I'd ask before I pulled the trigger. > Thanks in advance! > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
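On the "can I generate the SSTables in parallel?" question above: each writer instance builds a fully independent set of SSTables, so you can shard the input files across processes and point each one at its own output directory. A minimal sketch of the sharding; `build_sstables` is a hypothetical placeholder for shelling out to a CQLSSTableWriter-based loader like the cassandra-bulkload-example one:

```python
from concurrent.futures import ProcessPoolExecutor

def chunk(files, n):
    """Split the input file list into n roughly equal groups; each group
    is handed to one writer process."""
    return [files[i::n] for i in range(n)]

def build_sstables(file_group):
    # Hypothetical placeholder: shell out to your CQLSSTableWriter-based
    # bulk loader here, writing into a directory unique to this group.
    return len(file_group)

if __name__ == "__main__":
    inputs = [f"netflow-{i:04d}.csv" for i in range(100)]
    with ProcessPoolExecutor(max_workers=8) as pool:
        counts = list(pool.map(build_sstables, chunk(inputs, 8)))
    print(sum(counts))  # 100: every input file is assigned to exactly one writer
```

The resulting directories can then be streamed in with sstableloader, one directory at a time.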
Re: Priority for cassandra nodes in cluster
+1 w/ Benjamin. However if you wish to make use of spare hardware capacity, look to something like mesos DC/OS or kubernetes. You can run multiple services across a fleet of hardware, but provision equal resources to Cassandra and have somewhat reliable hardware sharing mechanisms. On Sat, 12 Nov 2016 at 14:12 Jon Haddad <jonathan.had...@gmail.com> wrote: > Agreed w/ Benjamin. Trying to diagnose issues in prod will be a > nightmare. Keep your DB servers homogeneous. > > On Nov 12, 2016, at 1:52 PM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > > 1. From a 15 year experience of running distributed Services: dont Mix > Services on machines if you don't have to. Dedicate each server to a single > task if you can afford it. It is easier to manage and reduces risks in case > of overload or failure > 2. You can assign a different number of tokens for each node by setting > this in Cassandra.yaml before you bootstrap that node > > Am 12.11.2016 22:48 schrieb "sat" <sathish.al...@gmail.com>: > > Hi, > > We are planning to install 3 node cluster in production environment. Is it > possible to provide weightage or priority to the nodes in cluster. > > Eg., We want more more records to be written to first 2 nodes and less to > the 3rd node. We are thinking of this approach because we want to install > other IO intensive messaging server in the 3rd node, in order to reduce the > load we are requesting for this approach. > > > Thanks and Regards > A.SathishKumar > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Introducing Cassandra 3.7 LTS
We are not publishing the build artefacts for our LTS at the moment as we don't test them on the different distros (debian/ubuntu, centos etc). If anyone wishes to do so feel free to create a PR and submit them! On Wed, 2 Nov 2016 at 11:37 Jesse Hodges <hodges.je...@gmail.com> wrote: > awesome, thanks for the tip! > > -Jesse > > On Wed, Nov 2, 2016 at 12:39 PM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > > You can build one on your own very easily. Just check out the desired git > repo and do this: > > > http://stackoverflow.com/questions/8989192/how-to-package-the-cassandra-source-code-into-debian-package > > 2016-11-02 17:35 GMT+01:00 Jesse Hodges <hodges.je...@gmail.com>: > > Just curious, has anybody created a debian package for this? > > Thanks, Jesse > > On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang <dep...@gmail.com> wrote: > > This is awesome! Stability is the king. > > Thank you so much! > > On Oct 19, 2016 2:56 PM, "Ben Bromhead" <b...@instaclustr.com> wrote: > > Hi All > > I am proud to announce we are making available our production build of > Cassandra 3.7 that we run at Instaclustr (both for ourselves and our > customers). Our release of Cassandra 3.7 includes a number of backported > patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't > include the new features of these releases. > > You can find our release of Cassandra 3.7 LTS on github here ( > https://github.com/instaclustr/cassandra). You can read more of our > thinking and how this applies to our managed service here ( > https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/). 
> > We also have an expanded FAQ about why and how we are approaching 3.x in > this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts), > however I've included the top few question and answers below: > > *Is this a fork?* > No, This is just Cassandra with a different release cadence for those who > want 3.x features but are slightly more risk averse than the current > schedule allows. > > *Why not just use the official release?* > With the 3.x tick-tock branch we have encountered more instability than > with the previous release cadence. We feel that releasing new features > every other release makes it very hard for operators to stabilize their > production environment without bringing in brand new features that are not > battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the > bug fix branch included new and real-world untested features, specifically > CDC. We have decided to stick with Cassandra 3.7 and instead backport > critical issues and maintain it ourselves rather than trying to stick with > the current Apache Cassandra release cadence. > > *Why backport?* > At Instaclustr we support and run a number of different versions of Apache > Cassandra on behalf of our customers. Over the course of managing Cassandra > for our customers we often encounter bugs. There are existing patches for > some of them, others we patch ourselves. Generally, if we can, we try to > wait for the next official Apache Cassandra release, however in the need to > ensure our customers remain stable and running we will sometimes backport > bugs and write our own hotfixes (which are also submitted back to the > community). > > *Why release it?* > A number of our customers and people in the community have asked if we > would make this available, which we are more than happy to do so. 
This > repository represents what Instaclustr runs in production for Cassandra 3.7 > and this is our way of helping the community get a similar level of > stability as what you would get from our managed service. > > Cheers > > Ben > > > > -- > Ben Bromhead > CTO | Instaclustr <https://www.instaclustr.com/> > +1 650 284 9692 > Managed Cassandra / Spark on AWS, Azure and Softlayer > > > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH · www.jaumo.com > Wehrstraße 46 · 73035 Göppingen · Germany > Phone +49 7161 304880-6 · Fax +49 7161 304880-1 > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer > > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Handle Leap Seconds with Cassandra
Based on what I've said previously, pretty much any way of avoiding your leap-second ordering issue is going to be a "hack", and there will be some amount of hope involved. If the updates occur more than 300ms apart and you are confident your nodes have clocks that are within 150ms of each other, then I'd close my eyes and hope they all apply the leap second at the same time within that 150ms. If they are less than 300ms apart (I'm guessing you meant less than 300ms), then I would look to figure out what the smallest gap is between those two updates and make sure your nodes' clocks are close enough in that gap that the leap second will occur on all nodes within that gap. If that's not good enough, you could just halt those scenarios for 2 seconds over the leap second and then resume them once you've confirmed all clocks have skipped. On Wed, 2 Nov 2016 at 18:13 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: > Thanks Ben for taking out time for the detailed reply !! > > We don't need strict ordering for all operations but we are looking at > scenarios where 2 quick updates to same column of same row are possible. By > quick updates, I mean >300 ms. Configuring NTP properly (as mentioned in > some blogs in your link) should give fair relative accuracy between the > Cassandra nodes. But a leap second takes the clock back for an ENTIRE one > sec (huge) and the probability of an old write overwriting the new one > increases drastically. So, we want to be proactive with things. > > I agree that you should avoid such scenarios with design (if possible). > > Good to know that you guys have setup your own NTP servers as per the > recommendation. Curious..Do you also do some monitoring around NTP? > > > > Thanks > Anuj > > On Fri, 28 Oct, 2016 at 12:25 AM, Ben Bromhead > > <b...@instaclustr.com> wrote: > If you need guaranteed strict ordering in a distributed system, I would > not use Cassandra, Cassandra does not provide this out of the box. 
I would > look to a system that uses Lamport or vector clocks. Based on your > description of how your system runs at the moment (and how close your > updates are together), you have either already experienced out of order > updates or there is a real possibility you will in the future. > > Sorry to be so dire, but if you do require causal consistency / strict > ordering, you are not getting it at the moment. Distributed systems theory > is really tricky, even for people that are "experts" on distributed systems > over unreliable networks (I would certainly not put myself in that > category). People have made a very good name for themselves by showing that > the vast majority of distributed databases have had bugs when it comes to > their various consistency models and the claims these databases make. > > So make sure you really do need guaranteed causal consistency/strict > ordering, or if you can design around it (e.g. using conflict-free > replicated data types), or choose a system that is designed to provide it. > > Having said that... here are some hacky things you could do in Cassandra > to try and get this behaviour, which I in no way endorse doing :) > >- Cassandra counters do leverage a logical clock per shard and you >could hack something together with counters and lightweight transactions, >but you would want to do your homework on counter accuracy before >diving into it... as I don't know if the implementation is safe in the >context of your question. Also this would probably require a significant >rework of your application plus a significant performance hit. I would >invite a counter guru to jump in here... > > >- You can leverage the fact that timestamps are monotonic if you >isolate writes to a single node for a single shard... but you then lose >Cassandra's availability guarantees, e.g. a keyspace with an RF of 1 and a >CL of ONE will get monotonic timestamps (if generated on the server >side). 
> > >- Continuing down the path of isolating writes to a single node for a >given shard, you could also isolate writes to the primary replica using your >client driver during the leap second (make it a minute either side of the >leap), but again you lose out on availability and you are probably already >experiencing out of order writes given how close your writes and updates >are. > > > A note on NTP: NTP is generally fine if you use it to keep the clocks > synced between the Cassandra nodes. If you are interested in how we have > implemented NTP at Instaclustr, see our blogpost on it > https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/ > . > > > > Ben > > > On Thu, 27 Oct 2016 at 10:18 Anuj W
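To make the failure mode discussed in this thread concrete: Cassandra resolves two writes to the same cell by taking the one with the higher write timestamp (last-write-wins), so a one-second backward clock step can silently make an older value win. A toy simulation:

```python
# Last-write-wins resolution as Cassandra applies it: the cell with the
# higher write timestamp wins, regardless of actual arrival order.
def resolve(cell_a, cell_b):
    return max(cell_a, cell_b, key=lambda c: c["ts"])

MICROS = 1_000_000
t = 1_483_228_800 * MICROS  # 2017-01-01 00:00:00 UTC, in microseconds

# Two updates 300 ms apart in real time, but the node handling the second
# one has already stepped its clock back a full second for the leap second.
first = {"ts": t, "value": "old"}
second = {"ts": t + 300_000 - MICROS, "value": "new"}

print(resolve(first, second)["value"])  # "old": the later write silently loses
```

This is why the advice above boils down to keeping node clocks tightly synced, or pausing the sensitive operations while the leap second is applied.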
Re: Is there any way to throttle the memtable flushing throughput?
A few thoughts on the larger problem at hand. The AWS instance type you are using is not appropriate for a production workload. Also, with memtable flushes that cause spiky write throughput, it sounds like your commitlog is on the same disk as your data directory; combined with the use of non-SSD EBS, I'm not surprised this is happening. The small amount of memory on the node could also mean your flush writers are getting backed up (blocked), possibly causing JVM heap pressure and other fun things (you can check this with nodetool tpstats). Before you get into tuning memtable flushing I would do the following:
- Reset your commitlog_sync settings back to default
- Use an EC2 instance type that has at least 15GB of memory and 4 cores and is EBS optimized (dedicated EBS bandwidth)
- Use gp2 or io2 EBS volumes
- Put your commitlog on a separate EBS volume
- Make sure your memtable_flush_writers are not being blocked; if so, increase the number of flush writers (no more than # of cores)
- Optimize your read_ahead_kb size and compression_chunk_length to keep those EBS reads as small as possible
Once you have fixed the above, memtable flushing should not be an issue. Even if you can't/don't want to upgrade the instance type, the other steps will help things. Ben On Tue, 11 Oct 2016 at 10:23 Satoshi Hikida <sahik...@gmail.com> wrote: > Hi, > > I'm investigating the read/write performance of the C* (Ver. 2.2.8). > However, I have an issue about memtable flushing which forces the spiky > write throughput. And then it affects the latency of the client's requests. > > So I want to know the answers for the following questions. > > 1. Is there any way of throttling the write throughput of the memtable > flushing? If it exists, how can I do that? > 2. Is there any way to reduce the spike of the write bandwidth during the > memtable flushing? 
>(I'm in trouble because the delay of the request increases when the > spike of the write bandwidth occurred) > > I'm using one C* node for this investigation. And C* runs on an EC2 > instance (2vCPU, 4GB memory), In addition, I attach two magnetic disks to > the instance, one stores system data(root file system.(/)), the other > stores C* data (data files and commit logs). > > I also changed a few configurations. > - commitlog_sync: batch > - commitlog_sync_batch_window_in_ms: 2 > (Using default value for the other configurations) > > > Regards, > Satoshi > > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
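One footnote on the read_ahead_kb suggestion above: `blockdev --setra` takes a count of 512-byte sectors, while `/sys/block/*/queue/read_ahead_kb` is in KB, so it is easy to set the wrong value by a factor of two. A small conversion sketch (`/dev/xvdb` below is an illustrative device name):

```python
SECTOR_BYTES = 512

def setra_sectors(read_ahead_kb: int) -> int:
    """Convert a desired read_ahead_kb value into the 512-byte sector
    count that `blockdev --setra` expects, e.g.:
        blockdev --setra 32 /dev/xvdb   # 16 KB read-ahead
    """
    return read_ahead_kb * 1024 // SECTOR_BYTES

print(setra_sectors(16))   # 32 sectors for a 16 KB read-ahead
print(setra_sectors(128))  # 256 sectors, a common distro default
```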
Re: Does increment/decrement by 0 generate any commits ?
According to https://issues.apache.org/jira/browse/CASSANDRA-7304 unset values in a prepared statement for a counter does not change the value of the counter. This applies for versions of Cassandra 2.2 and above. I would also look to verify the claimed behaviour myself. On Tue, 11 Oct 2016 at 09:49 Dorian Hoxha <dorian.ho...@gmail.com> wrote: > I just have a bunch of counters in 1 row, and I want to selectively update > them. And I want to keep prepared queries. But I don't want to keep 30 > prepared queries (1 for each counter column, but keep only 1). So in most > cases, I will increment 1 column by positive integer and the others by 0. > > Makes sense ? > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Adding disk capacity to a running node
ne.com> > wrote: > > > > Yes, Cassandra should keep percent of disk usage equal for all disk. > Compaction process and SSTable flushes will use new disk to distribute both > new and existing data. > > > > Best regards, Vladimir Yudovin, > > > *Winguzone > <https://urldefense.proofpoint.com/v2/url?u=https-3A__winguzone.com-3Ffrom-3Dlist=DQMFaQ=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow=ixOxpX-xpw1dJZNpaMT3mepToWX8gzmsVaXFizQLzoU=4q7P9fddEYpXwPR-h9yA_tk5JwR8l6c7cKJ-LQTVcGM=> > - Hosted Cloud Cassandra on Azure and SoftLayer.Launch your cluster in > minutes.* > > > > > > On Mon, 17 Oct 2016 11:43:27 -0400*Seth Edwards <s...@pubnub.com > <s...@pubnub.com>>* wrote > > > > We have a few nodes that are running out of disk capacity at the moment > and instead of adding more nodes to the cluster, we would like to add > another disk to the server and add it to the list of data directories. My > question, is, will Cassandra use the new disk for compactions on sstables > that already exist in the primary directory? > > > > > > > > Thanks! > > > > > > > > CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and > may be legally privileged. If you are not the intended recipient, do not > disclose, copy, distribute, or use this email or any attachments. If you > have received this in error please let the sender know and then delete the > email and all attachments. > -- Ben Bromhead CTO | Instaclustr <https://www.instaclustr.com/> +1 650 284 9692 Managed Cassandra / Spark on AWS, Azure and Softlayer
Re: Adding disk capacity to a running node
yup you would need to copy the files across to the new volume from the dir you wanted to give additional space to. Rough steps would look like:
1. Create EBS volume (make it big... like 3TB)
2. Attach to instance
3. Mount/format EBS volume
4. Stop C*
5. Copy full/troublesome directory to the EBS volume
6. Remove copied files (using rsync for the copy / remove step can be a good idea)
7. bind mount EBS volume with the same path as the troublesome directory
8. Start C* back up
9. Let it finish compacting / streaming etc
10. Stop C*
11. remove bind mount
12. copy files back to ephemeral
13. start C* back up
14. repeat on other nodes
15. run repair
You can use this process if you somehow end up in a full disk situation. If you end up in a low disk situation you'll have other issues (like corrupt / half written SSTable components), but it's better than nothing. Also to maintain your read throughput during this whole thing, double check the EBS volumes read_ahead_kb setting on the block volume and reduce it to something sane like 0 or 16. On Mon, 17 Oct 2016 at 13:42 Seth Edwards <s...@pubnub.com> wrote: > @Ben > > Interesting idea, is this also an option for situations where the disk is > completely full and Cassandra has stopped? (Not that I want to go there). > > If this was the route taken, and we did > > mount --bind /mnt/path/to/large/sstable /mnt/newebs > > We would still need to do some manual copying of files? such as > > mv /mnt/path/to/large/sstable.sd /mnt/newebs ? > > Thanks! > > On Mon, Oct 17, 2016 at 12:59 PM, Ben Bromhead <b...@instaclustr.com> > wrote: > > Yup as everyone has mentioned ephemeral are fine if you run in multiple > AZs... which is pretty much mandatory for any production deployment in AWS > (and other cloud providers) . i2.2xls are generally your best bet for high > read throughput applications on AWS. > > Also on AWS ephemeral storage will generally survive a user initiated > restart. 
For the times that AWS retires an instance, you get plenty of > notice and it's generally pretty rare. We run over 1000 instances on AWS > and see one forced retirement a month if that. We've never had an instance > pulled from under our feet without warning. > > To add another option for the original question, one thing you can do is > to attach a large EBS drive to the instance and bind mount it to the > directory for the table that has the very large SSTables. You will need to > copy data across to the EBS volume. Let everything compact and then copy > everything back and detach EBS. Latency may be higher than normal on the > node you are doing this on (especially if you are used to i2.2xl > performance). > > This is something we often have to do, when we encounter pathological > compaction situations associated with bootstrapping, adding new DCs or STCS > with a dominant table or people ignore high disk usage warnings :) > > On Mon, 17 Oct 2016 at 12:43 Jeff Jirsa <jeff.ji...@crowdstrike.com> > wrote: > > Ephemeral is fine, you just need to have enough replicas (in enough AZs > and enough regions) to tolerate instances being terminated. > > > > > > > > *From: *Vladimir Yudovin <vla...@winguzone.com> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org> > *Date: *Monday, October 17, 2016 at 11:48 AM > *To: *user <user@cassandra.apache.org> > > > *Subject: *Re: Adding disk capacity to a running node > > > > It's extremely unreliable to use ephemeral (local) disks. Even if you > don't stop instance by yourself, it can be restarted on different server in > case of some hardware failure or AWS initiated update. So all node data > will be lost. 
> > > > Best regards, Vladimir Yudovin, > > > *Winguzone > <https://urldefense.proofpoint.com/v2/url?u=https-3A__winguzone.com-3Ffrom-3Dlist=DQMFaQ=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow=ixOxpX-xpw1dJZNpaMT3mepToWX8gzmsVaXFizQLzoU=4q7P9fddEYpXwPR-h9yA_tk5JwR8l6c7cKJ-LQTVcGM=> > - Hosted Cloud Cassandra on Azure and SoftLayer.Launch your cluster in > minutes.* > > > > > > On Mon, 17 Oct 2016 14:45:00 -0400*Seth Edwards <s...@pubnub.com > <s...@pubnub.com>>* wrote > > > > These are i2.2xlarge instances so the disks currently configured as > ephemeral dedicated disks. > > > > On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael < > michael.la...@nytimes.com> wrote: > > > > You could just expand the size of your ebs volume and extend the file > system. No data is lost - assuming you are running Linux. > > > > > > On Monday, October 17, 2016, Seth Edwards
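The copy-and-verify part of the procedure above (steps 5-6) is where half-written SSTable components would bite you, so it is worth verifying the copy before removing the originals. A minimal sketch with illustrative paths; the bind mount itself still happens outside this script, with Cassandra stopped:

```python
import filecmp
import shutil
from pathlib import Path

def migrate(table_dir: str, ebs_dir: str) -> None:
    """Copy one table directory onto the (already formatted and mounted)
    EBS volume, then verify the copy. Only after this succeeds would you
    remove the originals and bind mount over the old path, e.g.:
        mount --bind /mnt/newebs/ks/table /var/lib/cassandra/data/ks/table
    """
    src, dst = Path(table_dir), Path(ebs_dir)
    shutil.copytree(src, dst, dirs_exist_ok=True)  # copy2 preserves mtimes
    diff = filecmp.dircmp(src, dst)
    problems = diff.left_only + diff.diff_files
    if problems:
        raise RuntimeError(f"copy incomplete, do not remove originals: {problems}")
```

rsync with `--remove-source-files`, as suggested in the steps, achieves the same copy/remove pairing in one pass.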
Re: Why does `now()` produce different times within the same query?
> > > > I will note that Ben seems to suggest keeping the return of now() unique > across > call while keeping the time component equals, thus varying the rest of the > uuid > bytes. However: > - I'm starting to wonder what this would buy us. Why would someone be > super >confused by the time changing across calls (in a single > statement/batch), but >be totally not confused by the actual full return to not be equal? > Given that a common way of interacting with timeuuids is with toTimestamp I can see the confusion and assumptions around behaviour. And how is >that actually useful: you're having different result anyway and you're >letting the server pick the timestamp in the first place, so you're > probably >not caring about milliseconds precision of that timestamp in the first > place. > If you want consistency of timestamps within your query as OP did I can see how this is useful. Postgres claims this is a "feature". - This would basically be a violation of the timeuuid spec > Not quite... Type 1 uuids let you swap out the low 47 bits of the node component with other randomly generated bits ( https://www.ietf.org/rfc/rfc4122.txt) - This would be a big pain in the code and make of now() a special case > among functions. I'm unconvinced special cases are making things easier > in general. > On reflection, I have to agree here, now() has been around for ever and this is the first anecdote I've seen of someone getting caught out. However with my user advocate hat on I think it would be worth investigating further beyond a documentation update if others found it a sticking point in Cassandra adoption. > So I'm all for improving the documentation if this confuses users due to > expectations (mistakenly) carried from prior experiences, and please > feel free to open a JIRA for that. I'm a lot less in agreement that there > is > something wrong with the way the function behave in principle. 
> > > I can see why this issue has been largely ignored and hasn't had a > chance for > > the behaviour to be formally defined > > Don't make too much assumptions. The behavior is perfectly well defined: > now() > is a "normal" function and is evaluated whenever it's called according to > the > timeuuid spec (or as close to it as we can make it). > Maybe formally defined is the wrong term... Formally documented? > > On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > > Great comment. +1 > > Am 01.12.2016 06:29 schrieb "Ben Bromhead" <b...@instaclustr.com>: > > tl;dr +1 yup raise a jira to discuss how now() should behave in a single > statement (and possible extend to batch statements). > > The values of now should be the same if you assume that now() works like > it does in relational databases such as postgres or mysql, however at the > moment it instead works like sysdate() in mysql. Given that CQL is supposed > to be SQL like, I think the assumption around the behaviour of now() was a > fair one to make. > > I definitely agree that raising a jira ticket would be a great place to > discuss what the behaviour of now() should be for Cassandra. Personally I > would be in favour of seeing the deterministic component (the actual time > part) being the same across multiple calls in the one statement or multiple > statements in a batch. > > Cassandra documentation does not make any claims as to how now() works > within a single statement and reading the code it shows the intent is to > work like sysdate() from MySQL rather than now(). One of the identified > dangers of making cql similar to sql is that, while yes it aids adoption, > users will find that SQL like things don't behave as expected. Of course as > a user, one shouldn't have to read the source code to determine correct > behaviour. 
> > Given that a timeuuid is made up of deterministic and (pseudo) > non-deterministic components I can see why this issue has been largely > ignored and hasn't had a chance for the behaviour to be formally defined > (you would expect now to return the same time in the one statement despite > multiple calls, but you wouldn't expect the same behaviour for say a call > to rand()). > > > > > > > > On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote: > > This is not a bug, and in fact changing it would be a serious bug. > > False. Absolutely no consumer would be broken by a change to guarantee an > identical time component that isn't broken already, for the simple reason > your code already has to handle that case, as it is in fact the majority > case RIGHT NOW. Users can hit this bug, in production, because unit tests > might
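For reference on the time component being debated in this thread: the 60-bit timestamp inside a version 1 uuid can be pulled out and converted to a Unix timestamp, which is roughly what `toTimestamp(timeuuid)` exposes. A quick sketch using Python's uuid module and the RFC 4122 epoch offset:

```python
import uuid

# Offset, in 100 ns ticks, between the UUID epoch (1582-10-15) and the
# Unix epoch (1970-01-01), per RFC 4122.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_unix(u: uuid.UUID) -> float:
    """Pull the 60-bit timestamp out of a version 1 uuid and convert it
    to Unix seconds (roughly what CQL's toTimestamp(timeuuid) returns)."""
    assert u.version == 1, "only version 1 uuids carry a timestamp"
    return (u.time - UUID_EPOCH_OFFSET) / 1e7

a, b = uuid.uuid1(), uuid.uuid1()
print(timeuuid_to_unix(a) <= timeuuid_to_unix(b))  # True: time components increase
print(a == b)                                      # False: the full uuids differ
```

This makes the distinction in the thread visible: the time components of two now() results are close but not equal, and the full uuids are never equal.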