Re: schema change management tools
Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes, so I haven't had a need to really automate it. Even with MySQL I never bothered.

Jon

On Thu, Oct 4, 2012 at 6:27 PM, John Sanda john.sa...@gmail.com wrote:

I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point me to something before I start trying to implement something on my own. I have used Liquibase (http://www.liquibase.org) for relational databases. Earlier today I tried using it with the cassandra-jdbc driver, but ran into some exceptions due to the SQL generated. I am not looking specifically for something CQL-based; something that uses the Thrift API via CLI scripts, for example, would work as well.

Thanks
- John

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: schema change management tools
Awesome - keep me posted.

Jon

On Thu, Oct 4, 2012 at 6:42 PM, John Sanda john.sa...@gmail.com wrote:

For the project I work on, and for previous projects as well that support multiple upgrade paths, this kind of tooling is a necessity. And I would prefer to avoid duplicating effort if there is already something out there. If not, though, I will be sure to post back to the list with whatever I wind up doing.

On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad j...@jonhaddad.com wrote:

Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes, so I haven't had a need to really automate it. Even with MySQL I never bothered.

Jon

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
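As a sketch of the kind of migration tool being discussed: everything below is hypothetical (a real tool would persist the applied set in a Cassandra table and execute CQL or CLI statements through a session), but the core idempotent-apply loop of a Liquibase-style runner looks roughly like this:

```python
# Minimal sketch of a schema migration runner. Migrations are applied in
# order, and the set of already-applied names is tracked so that reruns
# are idempotent. `execute` stands in for a real session/CLI call.

def run_migrations(migrations, applied, execute):
    """Apply each (name, statement) pair not yet in `applied`, in order."""
    for name, statement in migrations:
        if name in applied:
            continue  # skip migrations applied on a previous run
        execute(statement)
        applied.add(name)
    return applied
```

Running it twice with the same `applied` set executes each statement only once, which is the property that makes multiple upgrade paths manageable.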
Re: Migrating data from 2 node cluster to a 3 node cluster
You should run a nodetool repair after you copy the data over. You could also use sstableloader, which would stream the data to the proper nodes.

On Thu, Jul 4, 2013 at 10:03 AM, srmore comom...@gmail.com wrote:

We are planning to move data from a 2 node cluster to a 3 node cluster. We are planning to copy the data (snapshots) from the two existing nodes to two of the new nodes, hoping that Cassandra will sync it to the third node. Will this work? Are there any other commands to run after we are done migrating, like nodetool repair?

Thanks all.

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
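As a sketch of the sstableloader route (hostnames and data paths below are hypothetical; adjust for your cluster):

```shell
# Stream a snapshotted table into the new cluster; sstableloader routes
# each row to whichever new nodes own it under the new ring, so no
# manual rebalancing is needed.
sstableloader -d new-node1,new-node2 /var/lib/cassandra/data/my_keyspace/my_table/

# Afterwards, repair so every replica holds all the data it owns.
nodetool -h new-node1 repair my_keyspace
```

Repeat the repair on each node (or script it as a rolling operation) once the streaming has finished.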
Re: too many open files
Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended setting the table's sstable_size_in_mb to 256MB to avoid this problem.

On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote:

I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with a replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load, so I'm wondering if I should be looking at something else.

Let me know if you need more info.

Paul

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
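If the table is on leveled compaction with the small default target size, something like the following raises it (the table name is hypothetical; 256MB follows the recommendation above):

```sql
-- Hypothetical table name. A larger sstable_size_in_mb makes leveled
-- compaction produce far fewer, larger files, easing file descriptor
-- pressure at the cost of coarser compaction granularity.
ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 256};
```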
Re: CPU Bound Writes
Everything is written to the commit log. In the case of a crash, Cassandra recovers by replaying the log.

On Sat, Jul 20, 2013 at 9:03 AM, Mohammad Hajjat haj...@purdue.edu wrote:

Patricia, Thanks for the info. So are you saying that the *whole* data is being written to disk in the commit log, not just some sort of summary/digest? I'm writing 10MB objects and I'm noticing high latency (250 milliseconds even with ANY consistency), so I guess that explains my high delays?

Thanks,
Mohammad

On Fri, Jul 19, 2013 at 2:17 PM, Patricia Gorla gorla.patri...@gmail.com wrote:

Kanwar, This is because writes are appends to the commit log, which is stored on disk, not memory. The write is also applied to the memtable (in memory), which is later flushed to an SSTable on disk. So, most of the actions in sending out a write are writes to disk. Also see: http://www.datastax.com/docs/1.2/dml/about_writes

Patricia

On Fri, Jul 19, 2013 at 1:05 PM, Kanwar Sangha kan...@mavenir.com wrote:

"Insert-heavy workloads will actually be CPU-bound in Cassandra before being memory-bound." Can someone explain the internals of why writes are CPU bound?

--
Mohammad Hajjat
Ph.D. Student
Electrical and Computer Engineering
Purdue University
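A toy Python sketch of the write path described above (not Cassandra code, just the shape of the idea): the full mutation is appended to a durable log before the in-memory structure is updated, and crash recovery is simply log replay.

```python
# Toy model of a commit-log-first write path. The commit log is an
# append-only structure standing in for the on-disk log; the memtable
# is the in-memory structure that a crash would wipe out.

class ToyNode:
    def __init__(self):
        self.commit_log = []   # durable, append-only
        self.memtable = {}     # volatile, lost on crash

    def write(self, key, value):
        self.commit_log.append((key, value))  # durable append first
        self.memtable[key] = value            # then the in-memory update

    def recover(self):
        """Rebuild the memtable by replaying the commit log in order."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

Because the append happens before the memtable update, any write acknowledged to a client is recoverable by replay even if the process dies immediately afterwards.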
Re: VM dimensions for running Cassandra and Hadoop
Having just enough RAM to hold the JVM's heap generally isn't a good idea unless you're not planning on doing much with the machine. Any memory not allocated to a process will generally be put to good use serving as page cache. See here: http://en.wikipedia.org/wiki/Page_cache

Jon

On Tue, Jul 30, 2013 at 10:51 PM, Jan Algermissen jan.algermis...@nordsc.com wrote:

Hi, thanks for the helpful replies last week. It looks as if I will deploy Cassandra on a bunch of VMs, and I am now in the process of understanding what the dimensions of the VMs should be. So far, I understand the following:

- I need at least 3 VMs for a minimal Cassandra setup
- I should get another VM to run the Hadoop job controller, or can that run on one of the Cassandra VMs?
- There is no point in giving the Cassandra JVMs more than 8-12 GB heap space because of GC, so it seems going beyond 16GB RAM per VM makes no sense
- Each VM needs two disks, to separate the commit log from data storage
- I must make sure the disks are directly attached, to prevent problems when multiple nodes flush the commit log at the same time
- I'll be having rather few writes and intend to hold most of the data in memory, so spinning disks are fine for the moment

Does that seem reasonable? How should I plan the disk sizes and number of CPU cores? Are there any other configuration mistakes to avoid? Is there online documentation that discusses such VM sizing questions in more detail?

Jan

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: CQL and undefined columns
It's advised that you do not use compact storage, as it's primarily for backwards compatibility. From the CQL docs:

"The first of these options is COMPACT STORAGE. This option is mainly targeted towards backward compatibility with some table definitions created before CQL3. But it also provides a slightly more compact layout of data on disk, though at the price of flexibility and extensibility, and for that reason is not recommended unless needed for backward compatibility."

On Wed, Jul 31, 2013 at 2:54 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

You should also profile what your data looks like on disk before picking a format. It may not be as efficient to use one form or the other due to extra disk overhead.

On Wed, Jul 31, 2013 at 1:32 PM, Jon Ribbens jon-cassan...@unequivocal.co.uk wrote:

On Wed, Jul 31, 2013 at 02:21:52PM +0200, Alain RODRIGUEZ wrote:

I like to point to this article from Sylvain, which is really well written: http://www.datastax.com/dev/blog/thrift-to-cql3

Ah, thank you, it looks like a combination of a multi-column PRIMARY KEY and use of collections may well suffice for what I want. I must admit that I did not find any of this particularly obvious from the CQL documentation. By the way, http://cassandra.apache.org/doc/cql3/CQL.html#createTableStmt says "A table with COMPACT STORAGE must also define at least one clustering key", which seems to contradict definition 2 in the thrift-to-cql3 document you pointed me to.

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
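For reference, a hypothetical COMPACT STORAGE definition: with a clustering column, such a table is limited to exactly one non-key column, which is the inflexibility being traded for the denser on-disk layout:

```sql
-- Hypothetical names. With COMPACT STORAGE and a clustering column (ts),
-- only one non-primary-key column (payload) is allowed, and columns
-- cannot be added later with ALTER TABLE ... ADD.
CREATE TABLE my_keyspace.events (
    id text,
    ts timeuuid,
    payload blob,
    PRIMARY KEY (id, ts)
) WITH COMPACT STORAGE;
```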
Re: Adding my first node to another one...
I recommend you do not add 1.2 nodes to a 1.1 cluster. We tried this and ran into many issues. Specifically, the data will not correctly stream from the 1.1 nodes to the 1.2 node, and it will never bootstrap correctly.

On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote:

Hi Arthur, Thank you for your answer. I have read the section "Adding Capacity to an Existing Cluster" prior to posting my question. Actually, I would like Cassandra to choose the token by itself, since I want only some column families to be replicated across the whole cluster, and other column families to stay where they are, no matter the balancing. I do not find anything in the configuration that I should set on the very first (and so far only) node to start the replication. (The configuration of my node A is pretty basic, almost out of the box; I might have changed the name.) How do I make this node know that it will be a seed? My current node A is using Cassandra 1.1.0. Is it compatible if I install a new node with Cassandra 1.2.8, or should I fetch 1.1.0 for node B?

Thank you.
Morgan.

On 1 August 2013 at 20:32, Arthur Zubarev arthur.zuba...@aol.com wrote:

Hi Morgan, Scaling out depends on several factors. The most intricate is perhaps calculating the tokens. The Cassandra version is also important. At this point in time I suggest you read the section "Adding Capacity to an Existing Cluster" at http://www.datastax.com/docs/1.0/operations/cluster_management and come back here with questions and more details.

Regards,
Arthur

-----Original Message-----
From: Morgan Segalis
Sent: Thursday, August 01, 2013 11:24 AM
To: user@cassandra.apache.org
Subject: Adding my first node to another one...

Hi everyone, I'm trying to wrap my head around Cassandra's great ability to expand. I set up my first Cassandra node a while ago; it was working great, and the data wasn't so important back then. Since I had a great experience with Cassandra, I decided to migrate my MySQL data to Cassandra step by step. Now the data is starting to be important, so I would like to create another node and add it. Since I had some issues with my datacenter, I wanted to have a copy (of sensitive data only) in another datacenter. Quite frankly, I'm still a newbie with Cassandra and need your help.

First things first. The already up and running Cassandra node (called A):
- Do I need to change anything in cassandra.yaml to make sure that another node can connect? If yes, should I restart the node (because I would have to warn users about downtime)?
- Since this node should be a seed, the seed list is already set to localhost; is that good enough?

The new node I want to add (called B):
- I know that before starting this node, I should modify the seed list in cassandra.yaml. Is that the only thing I need to do?

It is my first time doing this, so please be gentle ;-)

Thank you all,
Morgan.

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
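On the seed question above: a localhost seed list is not good enough once a second node joins. A cassandra.yaml fragment of the sort needed (the address is a placeholder, not from the thread):

```yaml
# cassandra.yaml fragment -- 10.0.0.1 stands in for node A's reachable IP.
# Both nodes list node A as the seed so node B can discover the cluster;
# listen_address must likewise be a reachable address, not localhost.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"
```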
Re: CQL and undefined columns
The CQL docs recommend not using it - I didn't just make that up. :) COMPACT STORAGE imposes the limit that you can't add columns to your tables. For those of us that are heavy CQL users, this limitation is a total deal breaker.

On Mon, Aug 5, 2013 at 10:27 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Jul 31, 2013 at 3:10 PM, Jonathan Haddad j...@jonhaddad.com wrote:

It's advised you do not use compact storage, as it's primarily for backwards compatibility.

Many Apache Cassandra experts do not advise against using COMPACT STORAGE. [1] Use CQL3 non-COMPACT STORAGE if you want to, but there are also valid reasons not to use it. Asserting that there is some good reason you should not use COMPACT STORAGE (other than range ghosts?) seems inaccurate. :)

=Rob
[1] http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/legacy_tables

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: CQL and undefined columns
If you expected your CQL3 query to work, then I think you've missed the point of CQL completely. For many of us, adding a query layer which gives us predictable column names, but continues to allow us to utilize wide rows on disk, is a huge benefit. Why would I want to reinvent a system for structured data when the DB can handle it for me? I get a bunch of stuff for free with CQL, which decreases my development time, which is the resource that I happen to be the most bottlenecked on. Feel free to continue to use Thrift's wide row structure, with ad hoc columns. No one is stopping you.

On Mon, Aug 5, 2013 at 1:36 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

"COMPACT STORAGE imposes the limit that you can't add columns to your tables" is absolutely false. If anything, CQL is imposing the limits! Simple to prove. Try something like this:

create table abc (x int);
insert into abc (y) values (5);

and watch CQL reject the insert, saying something to the effect of "y? What's that? Did you mean CQL2 or 1.5, or hamburgers?" Then go to the cassandra-cli and do this:

create column family abd;
set ['abd']['y']='5';
set ['abd']['z']='4';

AND IT WORKS! I noticed the nomenclature starting to spring up around the term "legacy tables", and docs based around what you "can't do" with them. Frankly it makes me nuts, because... this little-known web company named Google produced a white paper about what a ColumnFamily data model could do: http://en.wikipedia.org/wiki/BigTable. Cassandra was built on the BigTable/ColumnFamily data model. There was also this big movement called NoSQL, where people wanted to break free of query languages and rigid schemas.

On Mon, Aug 5, 2013 at 1:56 PM, Jonathan Haddad j...@jonhaddad.com wrote:

The CQL docs recommend not using it - I didn't just make that up. :) COMPACT STORAGE imposes the limit that you can't add columns to your tables. For those of us that are heavy CQL users, this limitation is a total deal breaker.

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: CQL and undefined columns
CQL maps a series of logical rows into a single physical row by transposing multiple rows, based on partition and clustering keys, into slices of a row. The point is to add a loose schema on top of a wide row, which allows you to stop reimplementing common patterns. Yes, you can go in and mess with your tables via the cassandra-cli, but that's not exactly proving me wrong. You've simply removed the constraints of CQL and written data to the table at a lower level that didn't deal with schema enforcement.

On Mon, Aug 5, 2013 at 2:37 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

"Feel free to continue to use thrift's wide row structure, with ad hoc columns. No one is stopping you."

Thanks. I was not trying to stop you from doing it your way either. You said this: "COMPACT STORAGE imposes the limit that you can't add columns to your tables." I was demonstrating that you are incorrect. I then went on to point out that Cassandra is a ColumnFamily data store which was designed around BigTable. You could always add columns dynamically, because being schema-less is one of the key components of a ColumnFamily datastore. I know which CQL document you are loosely referencing that implies you cannot add columns to compact storage. If that were true, Cassandra would never have been a ColumnFamily data store. I have found several documents championing CQL and its constructs which suggest that some things cannot be done with compact storage. In reality those are shortcomings of the CQL language. I say this because the language cannot easily accommodate the original schema system. Many applications that are already written and performing well do NOT fit well into the CQL model of non-compact storage (which does not have a name, by the way, probably because the opposite of compact is sparse, and how would SPARSE STORAGE sound?). Implying all the original stuff is legacy and you should probably avoid it is wrong. In many cases compact storage is the best way to store things, because it is the smallest.

On Mon, Aug 5, 2013 at 4:57 PM, Jonathan Haddad j...@jonhaddad.com wrote:

If you expected your CQL3 query to work, then I think you've missed the point of CQL completely. For many of us, adding a query layer which gives us predictable column names, but continues to allow us to utilize wide rows on disk, is a huge benefit. Feel free to continue to use Thrift's wide row structure, with ad hoc columns. No one is stopping you.

--
Jon Haddad
http://www.rustyrazorblade.com
Re: Issue with CQLsh
My understanding is that if you want to use CQL, you should create your tables via CQL. Mixing Thrift calls with CQL seems like it's just asking for problems like this.

On Sun, Aug 25, 2013 at 6:53 PM, Vivek Mishra mishra.v...@gmail.com wrote:

cassandra 1.2.4

On Mon, Aug 26, 2013 at 2:51 AM, Nate McCall n...@thelastpickle.com wrote:

What version of cassandra are you using?

On Sun, Aug 25, 2013 at 8:34 AM, Vivek Mishra mishra.v...@gmail.com wrote:

Hi, I have created a column family using cassandra-cli as:

create column family default;

and then inserted a record as:

set default[1]['type']='bytes';

Then I tried to alter the table via cqlsh as:

alter table default alter key type text;     // it works
alter table default alter column1 type text; // it goes for a toss

Surprisingly, any command after that simply hangs and I need to reset the connection. Any suggestions?

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: Cluster Management
An alternative to cssh is fabric. It's very flexible, in that you can automate almost any repetitive task that you'd send to machines in a cluster, and it's written in Python, meaning if you're in AWS you can mix it with boto to automate pretty much anything you want.

On Thu, Aug 29, 2013 at 4:25 PM, Anthony Grasso anthony.gra...@gmail.com wrote:

Hi Patricia, Thank you for the feedback. It has been helpful.

On Tue, Aug 27, 2013 at 12:02 AM, Patricia Gorla gorla.patri...@gmail.com wrote:

Anthony, We use a number of tools to manage our Cassandra cluster:

* Datastax OpsCenter [0] for at-a-glance information and trending statistics. You can also run operations through here, though I prefer to use nodetool for any mutative operation.
* nodetool for ad hoc status checks and day-to-day node management.
* puppet for setup and initialization.

"For example, if I want to make some changes to the configuration file that resides on each node, is there a tool that will propagate the change to each node?"

For this, we use puppet to manage any changes to the configurations (which are stored in git). We initially had Cassandra auto-restart when the configuration changed, but you might not want the node to automatically join a cluster, so we turned this off.

Puppet was the first thing that came to mind for us as well. In addition, we had the same thought about auto-restarting nodes when the configuration is changed. If a configuration on all the nodes is changed, we would want to restart one node at a time and wait for it to rejoin before restarting the next one. I am assuming in a case like this, you then manually perform the restart operation for each node?

"Another example is if I want to have a rolling repair (nodetool repair -pr) and clean up running on my cluster, is there a tool that will help manage/configure that?"

Multiple commands to the cluster are sent via clusterssh [1] (cssh for OS X). I can easily choose which nodes to control, and run those in sync. For any rolling procedures, we send commands one at a time, though we've considered sending some of these tasks to cron.

Thanks again for the tip! This is quite interesting; it may help to solve our immediate problem for now.

Regards,
Anthony

Hope this helps.

Cheers,
Patricia
[0] http://planetcassandra.org/Download/DataStaxCommunityEdition
[1] http://sourceforge.net/projects/clusterssh/

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
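The rolling one-node-at-a-time procedure described above can be sketched generically in Python; `run_cmd` and `is_up` are hypothetical stand-ins for something like fabric's run() and a nodetool-status health check:

```python
# Generic rolling-operation sketch: run a command on one node at a time,
# and wait for that node to report healthy before moving to the next.

def rolling(hosts, command, run_cmd, is_up):
    """Apply `command` host by host, gating on a health check."""
    done = []
    for host in hosts:
        run_cmd(host, command)
        for _ in range(60):            # poll until the node rejoins
            if is_up(host):
                break
        else:
            raise RuntimeError("%s did not come back up" % host)
        done.append(host)
    return done
```

In a real deployment the poll loop would sleep between checks; it is elided here to keep the sketch self-contained.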
Re: Low Row Cache Request
9/12 = 0.75. It's a rate, not a percentage.

On Sat, Aug 31, 2013 at 2:21 PM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote:

I'm running one Cassandra node (version 1.2.6) and I enabled the row cache with 1GB. But looking at the Cassandra metrics in JConsole, Row Cache Requests are very low after a high number of queries (about 12 requests). RowCache metrics:

Capacity: 1GB
Entries: 3
HitRate: 0.75
Hits: 9
Requests: 12
Size: 191630

Something wrong?

--
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
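The arithmetic behind the reply, using the JMX values quoted above:

```python
# HitRate as reported via JMX is hits divided by requests: a fraction
# in [0, 1], not a percentage.
hits, requests = 9, 12
hit_rate = hits / requests
print(hit_rate)  # 0.75
```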
Re: Why don't you start off with a “single small” Cassandra server as you usually do it with MySQL?
For future reference, a blog post on this topic: http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/

On Wed, Sep 18, 2013 at 6:38 AM, Michał Michalski mich...@opera.com wrote:

You might be interested in this: http://mail-archives.apache.org/mod_mbox/cassandra-user/201308.mbox/%3ccaeqobhpav25pcgjfwbkmd1rzxvrif94e6lpybpj3mu_bqn9...@mail.gmail.com%3E

M.

On 18.09.2013 15:34, Ertio Lew wrote:

For any website just starting out, the load is minimal initially and grows at a slow pace. People usually start their MySQL-based sites with a single server (and a VPS at that, not a dedicated server) running as both the app server and the DB server, and usually get quite far with this setup; only as they feel the need do they separate the DB from the app server, giving it a separate VPS. This is how a startup expects things to be while planning resource procurement. But so far, what I have seen with Cassandra is something very different. People usually recommend starting out with at least a 3 node cluster, on dedicated servers, with lots and lots of RAM: 4GB or 8GB is what they suggest to start with. So is it that Cassandra requires more hardware resources than MySQL for a website to deliver similar performance and serve a similar load/traffic and the same amount of data? I understand the higher storage requirements of Cassandra due to replication, but what about other hardware resources? Can't we start off with Cassandra-based apps just like MySQL, starting with 1 or 2 VPSes and adding more whenever there's a need? I don't want to compare apples with oranges; I just want to know how much more dangerous a situation I may be in when I start out with a single-node VPS-based Cassandra installation vs a single-node VPS-based MySQL installation, and the difference between these two situations. Are Cassandra servers more prone to being unavailable than MySQL servers? Is it bad if I put Tomcat alongside Cassandra, as people use the LAMP stack on a single server?

This question is also posted at StackOverflow (http://stackoverflow.com/questions/18462530/why-dont-you-start-off-with-a-single-small-cassandra-server-as-you-usually) and has an open bounty worth +50 rep.

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: Choosing python client lib for Cassandra
We're currently using the cql package, which is really a wrapper around Thrift. As for your concern about deadlines, I'm not sure how writing raw CQL is going to be any faster than using a mapper library for anything other than the most trivial of projects.

On Tue, Nov 26, 2013 at 11:09 AM, Kumar Ranjan winnerd...@gmail.com wrote:

Jon - Thanks. As I understand, cqlengine is an object mapper and must be using CQL prepared statements. What are you wrapping it with, as an alternative to python-driver?

— Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 1:19 PM, Jonathan Haddad j...@jonhaddad.com wrote:

So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the Thrift API to execute CQL until the native driver is out of beta. I'm a little biased in recommending it, since I'm one of the primary authors. If you've got cqlengine-specific questions, head to the mailing list: https://groups.google.com/forum/#!forum/cqlengine-users If you want to roll your own solution, it might make sense to take an approach like we did and throw a layer on top of Thrift, so you don't have to do a massive rewrite of your entire app once you want to go native.

Jon

On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com wrote:

I have worked with Pycassa before and wrote a wrapper to use batch mutation, connection pooling, etc. But http://wiki.apache.org/cassandra/ClientOptions now recommends using a CQL 3 based API, because the Thrift-based API (Pycassa) will be supported for backward compatibility only. The Apache site recommends the Python API written by DataStax, which is still in beta (as per their documentation). See the warning from their python-driver/README.rst file:

"Warning: This driver is currently under heavy development, so the API and layout of packages, modules, classes, and functions are subject to change. There may also be serious bugs, so usage in a production environment is *not* recommended at this time."

The DataStax site http://www.datastax.com/download/clientdrivers recommends using DB-API 2.0 plus legacy APIs. Is there more? Has anyone compared the CQL 3 based APIs? Which stands out on top? Answers based on facts will help the community, so please refrain from opinions. Please help??

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: Choosing python client lib for Cassandra
cqlengine supports batch queries; see the docs here: http://cqlengine.readthedocs.org/en/latest/topics/queryset.html#batch-queries

On Tue, Nov 26, 2013 at 11:53 AM, Kumar Ranjan winnerd...@gmail.com wrote:

Jon - Any comment on batching?

— Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 2:52 PM, Laing, Michael michael.la...@nytimes.com wrote:

That's not a problem we have faced yet.

On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote:

How do you insert huge amounts of data?

On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael michael.la...@nytimes.com wrote:

I think thread pooling is always in operation - and we haven't seen any problems in that regard going to the 6 local nodes each client connects to. We haven't tried batching yet.

On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan winnerd...@gmail.com wrote:

Michael - thanks. Have you tried batching and thread pooling in python-driver? For now, I would avoid the object mapper cqlengine, just because of my deadlines.

On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael michael.la...@nytimes.com wrote:

We use the python-driver and have contributed some to its development. I have been careful not to push too fast on features until we need them. For example, we have just started using prepared statements - working well, BTW. Next we will employ futures and start to exploit the async nature of the new interface to C*. We are very familiar with libev in both C and Python, and are happy to dig into the code to add features and fix bugs as needed, so the rewards of bypassing the old and focusing on the new seem worth the risks to us.

ml

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
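As a sketch of the batch support referenced above (the model name and columns are hypothetical, and this assumes a configured cqlengine connection, so it is illustrative only):

```python
# Hypothetical model; assumes cqlengine's connection setup has been done.
from cqlengine import BatchQuery

# Group several inserts into a single batch; the batch executes when
# the with-block exits.
with BatchQuery() as b:
    ExampleModel.batch(b).create(partition=1, cluster=1, value='a')
    ExampleModel.batch(b).create(partition=1, cluster=2, value='b')
```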
Re: Choosing python client lib for Cassandra
No, 2.7 only. On Tue, Nov 26, 2013 at 3:04 PM, Kumar Ranjan winnerd...@gmail.com wrote: Hi Jonathan - Does cqlengine have support for python 2.6 ? On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad j...@jonhaddad.comwrote: cqlengine supports batch queries, see the docs here: http://cqlengine.readthedocs.org/en/latest/topics/queryset.html#batch-queries On Tue, Nov 26, 2013 at 11:53 AM, Kumar Ranjan winnerd...@gmail.comwrote: Jon - Any comment on batching? — Sent from Mailbox https://www.dropbox.com/mailbox for iPhone On Tue, Nov 26, 2013 at 2:52 PM, Laing, Michael michael.la...@nytimes.com wrote: That's not a problem we have faced yet. On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.comwrote: How do you insert huge amount of data? — Sent from Mailbox https://www.dropbox.com/mailbox for iPhone On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael michael.la...@nytimes.com wrote: I think thread pooling is always in operation - and we haven't seen any problems in that regard going to the 6 local nodes each client connects to. We haven't tried batching yet. On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan winnerd...@gmail.comwrote: Michael - thanks. Have you tried batching and thread pooling in python-driver? For now, i would avoid object mapper cqlengine, just because of my deadlines. — Sent from Mailbox https://www.dropbox.com/mailbox for iPhone On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael michael.la...@nytimes.com wrote: We use the python-driver and have contributed some to its development. I have been careful to not push too fast on features until we need them. For example, we have just started using prepared statements - working well BTW. Next we will employ futures and start to exploit the async nature of new interface to C*. We are very familiar with libev in both C and python, and are happy to dig into the code to add features and fix bugs as needed, so the rewards of bypassing the old and focusing on the new seem worth the risks to us. 
ml On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad j...@jonhaddad.com wrote: So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the thrift api to execute CQL until the native driver is out of beta. I'm a little biased in recommending it, since I'm one of the primary authors. If you've got cqlengine specific questions, head to the mailing list: https://groups.google.com/forum/#!forum/cqlengine-users If you want to roll your own solution, it might make sense to take an approach like we did and throw a layer on top of thrift so you don't have to do a massive rewrite of your entire app once you want to go native. Jon On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com wrote: I have worked with Pycassa before and wrote a wrapper to use batch mutation connection pooling etc. But http://wiki.apache.org/cassandra/ClientOptions recommends now to use CQL 3 based api because Thrift based api (Pycassa) will be supported for backward compatibility only. Apache site recommends to use Python api written by DataStax which is still in Beta (As per their documentation). See warnings from their python-driver/README.rst file *Warning* This driver is currently under heavy development, so the API and layout of packages,modules, classes, and functions are subject to change. There may also be serious bugs, so usage in a production environment is *not* recommended at this time. DataStax site http://www.datastax.com/download/clientdrivers recommends using DB-API 2.0 plus legacy api's. Is there more? Has any one compared between CQL 3 based apis? Which stands out on top? Answers based on facts will help the community so please refrain from opinions. Please help ?? -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: cassandra performance problems
Do you mean high CPU usage or high load avg? (20 indicates load avg to me). High load avg means the CPU is waiting on something. Run iostat -dmx 1 100 to check your disk stats; you'll see the columns that indicate MB/s read/write as well as % utilization. Once you understand the bottleneck we can start to narrow down the cause. On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev shuty...@gmail.com wrote: Hi all, We have a 3 node cluster setup, single keyspace, about 500 tables. The hardware is 2 cores + 16 GB RAM (Cassandra chose to have 4GB). Cassandra version is 2.0.3. Our replication factor is 3, read/write consistency is QUORUM. We've plugged it into our production environment as a cache in front of postgres. Everything worked fine, we even stressed it by explicitly propagating about 30G (10G/node) of data from postgres to cassandra. Then the problems came. Our nodes began showing high cpu usage (around 20). The funny thing is that they were actually doing it one after another and there was always only one node with high cpu usage. Using OpsCenter we saw that when the CPU was beginning to go high the node in question was performing compaction. But even after the compaction was performed the cpu still remained high, and in some cases didn't go down for hours. Our jmx monitoring showed that it was presumably in constant garbage collection. During that time cluster read latency goes from 2ms to 200ms. What can be the reason? Can it be the high number of tables? Do we need to adjust some settings for this setup? Is it ok to have so many tables? Theoretically we could stick them all in 3-4 tables. Thanks in advance, Alexander -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
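The load-avg-vs-CPU distinction above is easy to check programmatically. A stdlib sketch (Linux/macOS only, since it relies on os.getloadavg; the threshold is a rough rule of thumb, not a hard rule) that flags when the 1-minute load exceeds the core count:

```python
# Load average counts runnable AND uninterruptible (I/O-waiting) tasks,
# so a load of 20 on a 2-core box usually means tasks queued waiting on
# something (often disk), not that the CPUs are 20x busy.
import os

one_min, five_min, fifteen_min = os.getloadavg()
cores = os.cpu_count() or 1

# 1-minute load well above the core count suggests a bottleneck;
# if CPU utilization is simultaneously low, suspect disk (check iostat).
saturated = one_min > cores
print("load per core: %.2f (saturated: %s)" % (one_min / cores, saturated))
```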
new project - Under Siege
I've recently pushed up a new project to github, which we've named Under Siege. It's a java agent for reporting Cassandra metrics to statsd. We're in the process of deploying it to our production clusters. Tested against Cassandra 1.2.11. The metrics library seems to change on every release of C*, so I'm not sure what'll happen if you deploy against a different version. You might need to mvn package against the same version of metrics. https://github.com/StartTheShift/UnderSiege I'm not much of a Java programmer, so there are probably about a hundred things I could have done better. Pull requests welcome. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
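For context on what "reporting metrics to statsd" involves: the statsd wire format is just "name:value|type" text over UDP. A minimal stdlib sketch of emitting metrics that way (the metric names, host, and port here are assumptions for illustration, not Under Siege's actual configuration):

```python
# Minimal statsd emitter: plain-text datagrams, fire-and-forget UDP.
import socket

def send_statsd(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send one statsd line, e.g. 'cassandra.reads:1|c'. Returns the payload."""
    payload = "%s:%s|%s" % (name, value, metric_type)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("ascii"), (host, port))
    sock.close()
    return payload

send_statsd("cassandra.read_latency_ms", 2.1, "g")  # gauge
send_statsd("cassandra.reads", 1, "c")              # counter
```

Because it's UDP, a down statsd daemon never blocks or breaks the sender, which is why the pattern is safe to run inside a database process.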
Re: cassandra backup
I believe SSTables are written to a temporary file then moved. If I remember correctly, tools like tablesnap listen for the inotify event IN_MOVED_TO. This should handle the "trying to back up an sstable mid-write" issue. On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux mthero...@yahoo.com wrote: Hi Marcelo, Cassandra provides an eventually consistent model for backups. You can do staggered backups of data, with the idea that if you restore a node, and then do a repair, your data will be once again consistent. Cassandra will not automatically copy the data to other nodes (other than via hinted handoff). You should manually run repair after restoring a node. You should take snapshots when doing a backup, as it keeps the data you are backing up relevant to a single point in time; otherwise compaction could add/delete files while you are mid-backup, or worse, I imagine, attempt to access an SSTable mid-write. Snapshots work by using links, and don't take additional storage to perform. In our process we create the snapshot, perform the backup, and then clear the snapshot. One thing to keep in mind in your S3 cost analysis is that, even though storage is cheap, reads/writes to S3 are not (especially writes). If you are using LeveledCompaction, or otherwise have a ton of SSTables, some people have encountered increased costs moving the data to S3. Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync data to. Thus far this has worked very well for us. -Mike On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hello everyone, I am trying to create backups of my data on AWS. My goal is to store the backups on S3 or glacier, as it's cheap to store this kind of data. So, if I have a cluster with N nodes, I would like to copy data from all N nodes to S3 and be able to restore later.
I know Priam does that (we were using it), but I am using the latest cassandra version and we plan to use DSE some time, so I am not sure Priam fits this case. I took a look at the docs: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html And I am trying to understand if it's really necessary to take a snapshot to create my backup. Suppose I do a flush and copy the sstables from each node to s3. Not all at the same time, but one by one. When I try to restore my backup, data from node 1 will be older than data from node 2. Will this cause problems? AFAIK, if I am using a replication factor of 2, for instance, and Cassandra sees data from node X only, it will automatically copy it to other nodes, right? Is there any chance of cassandra nodes becoming corrupt somehow if I do my backups this way? Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
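Mike's point that "snapshots work by using links, and don't take additional storage" can be demonstrated with plain hard links. This is a toy illustration of the mechanism (file names are made up), not Cassandra's actual snapshot code: because sstables are immutable, a hard link into a snapshot directory costs no data copy, and the snapshot keeps reading the same bytes even after compaction deletes the live file name.

```python
# Hard-link "snapshot" of an immutable file: same inode, no extra storage,
# survives deletion of the original directory entry.
import os, tempfile

datadir = tempfile.mkdtemp()
sstable = os.path.join(datadir, "users-ka-1-Data.db")
snapdir = os.path.join(datadir, "snapshots", "backup1")
os.makedirs(snapdir)

with open(sstable, "w") as f:
    f.write("immutable sstable contents")

os.link(sstable, os.path.join(snapdir, "users-ka-1-Data.db"))  # no data copy
os.remove(sstable)  # "compaction" removes the live file...

# ...but the snapshot's link still reads the same bytes:
with open(os.path.join(snapdir, "users-ka-1-Data.db")) as f:
    print(f.read())
```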
Re: Cassandra ring not behaving like a ring
Please include the output of nodetool ring, otherwise no one can help you. On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Any pointers? I am planning to do a rolling restart of the cluster nodes to see if it will help. On Jan 15, 2014 2:59 PM, Narendra Sharma narendra.sha...@gmail.com wrote: RF=3. On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote: what is the RF? What does nodetool ring show? On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Sorry for the odd subject but something is wrong with our cassandra ring. We have a 9 node ring as below. N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS. I added a new node with a token that is exactly in the middle of N6 and N7. So the ring displayed as follows: N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N6.5 - UP/JOINING N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to stream from (worst case) N5, N6, N7, N8. What could potentially cause the node to get confused about the ring? -- Narendra Sharma Software Engineer http://www.aeris.com http://narendrasharma.blogspot.com/ -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Cassandra ring not behaving like a ring
It depends on a lot of factors. If you've got all your machines in a single rack, probably not. But if you want to spread your data across multiple racks or availability zones in AWS, it makes a huge difference. On Thu, Jan 16, 2014 at 2:05 PM, Yogi Nerella ynerella...@gmail.com wrote: Hi, I am new to the Cassandra environment; does the order of the ring matter, as long as the member joins the group? Yogi On Thu, Jan 16, 2014 at 12:49 PM, Jonathan Haddad j...@jonhaddad.com wrote: Please include the output of nodetool ring, otherwise no one can help you. On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Any pointers? I am planning to do a rolling restart of the cluster nodes to see if it will help. On Jan 15, 2014 2:59 PM, Narendra Sharma narendra.sha...@gmail.com wrote: RF=3. On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote: what is the RF? What does nodetool ring show? On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Sorry for the odd subject but something is wrong with our cassandra ring. We have a 9 node ring as below. N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS. I added a new node with a token that is exactly in the middle of N6 and N7. So the ring displayed as follows: N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N6.5 - UP/JOINING N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to stream from (worst case) N5, N6, N7, N8. What could potentially cause the node to get confused about the ring?
-- Narendra Sharma Software Engineer *http://www.aeris.com http://www.aeris.com* *http://narendrasharma.blogspot.com/ http://narendrasharma.blogspot.com/* -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Recommended OS
I would just advise against it because it's going to be difficult to narrow down what's causing problems. For instance, if you have Node A which is performing GC, it will affect query times on Node B which is trying to satisfy a quorum read. Node B might actually have very low load, and it will be difficult to understand why its queries are responding slowly. Meanwhile, Node A, during the GC pause, will have no disk activity, and most of the CPUs will not be fully utilized. I'm not saying it's impossible to do this, but I will say you'd better have a really great understanding of every single OS in your cluster. It's generally hard to find people who are experts in Linux, Windows, and BeOS. Of course, if you want to ride that train, you'd probably have a great blog post. My guess is it'll end with our recommendation: 'don't do this'. On Wed, Feb 12, 2014 at 2:36 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Feb 12, 2014 at 1:25 PM, Ben Bromhead b...@instaclustr.com wrote: If you are super keen on running on something different from linux in production (after all the warnings), run most of your cluster on linux, then run a single node or a separate DC with SmartOS, Solaris, BeOS, OS/2, Minix, Windows 3.1 or whatever it is that you choose and let us know how it all goes! My understanding is that running a mixed OS cluster is not officially supported. I could be wrong, but don't think I am. :) =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
abusing cassandra's multi DC abilities
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes elastic search and redis. A message can be sent from one user to another, and the intent is that it would be seen in near-real-time. This means that 2 people may be using different data centers, and the messages need to propagate from one to the other. On the plus side, we know we get this with Cassandra (fist pump) but the other pieces, not so much. Even if they did work, there's all sorts of race conditions that could pop up from having different pieces of our architecture communicating over different channels. From this, we've arrived at the idea that since Cassandra is the authoritative data source, we might be able to trigger events in DC2 based on activity coming through either the commit log or some other means. One idea was to use a CF with a low gc time as a means of transporting messages between DCs, and watching the commit logs for deletes to that CF in order to know when we need to do things like reindex a document (or a new document), bust cache, etc. Facebook did something similar with their modifications to MySQL to include cache keys in the replication log. Assuming this is sane, I'd want to avoid having the same event register on 3 servers, thus registering 3 items in the queue when only one should be there. So, for any piece of data replicated from the other DC, I'd need a way to determine if it was supposed to actually trigger the event or not. (Maybe it looks at the token and determines if the current server falls in the token range?) Or is there a better way? So, my questions to all ye Cassandra users: 1. Is this is even sane? 2. Is anyone doing it? -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
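The "look at the token and decide whether this server fires the event" idea above could look something like the following sketch. The ring, tokens, and hash are all made up for illustration (Cassandra uses much larger token spaces), but the property we want holds: exactly one node in the ring claims any given key, so a replicated delete triggers one queue item instead of RF items.

```python
# Toy token ring: each node owns the range ending at its token; only the
# owner of a key's token fires the reindex/cache-bust event.
import bisect, hashlib

ring = sorted([0, 42, 85, 128, 170, 213])  # one token per node (toy values)

def token_for(key):
    # Stand-in for the partitioner's hash; 256-slot token space for the demo.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 256

def owner(token, ring):
    """First node whose token is >= the row token, wrapping around."""
    i = bisect.bisect_left(ring, token)
    return ring[i % len(ring)]

def should_fire(my_token, key):
    return owner(token_for(key), ring) == my_token

# Exactly one node claims each key:
key = "message-123"
claimants = [t for t in ring if should_fire(t, key)]
assert len(claimants) == 1
```

One caveat worth noting: with this scheme the event only fires if the owning node is up when the data arrives, so some fallback (e.g. next node in the ring) would be needed for real fault tolerance.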
Re: abusing cassandra's multi DC abilities
Thanks for the input Todd. I've considered a few of the options you've listed. I've ruled out redis because it's not really built for multi DC. I've got nothing against XMPP, or SQS. However, they introduce race conditions as well as all sorts of edge cases (missed messages, for instance). Since Cassandra is the source of truth, why not piggyback a useful message within the true source of data itself? On Mon, Feb 24, 2014 at 8:49 PM, Todd Fast t...@digitalexistence.comwrote: Hi Jonathan-- First, best wishes for success with your platform. Frankly, I think the architecture you described is only going to cause you major trouble. I'm left wondering why you don't either use something like XMPP (of which several implementations can handle this kind of federated scenario) or simply have internal (REST) APIs to send a message from the backend in one DC to the backend in another DC. There are a bunch of ways to approach this problem: You could also use Redis pubsub (though a bit brittle), SQS, or any number of other approaches that would be simpler and more robust than what you described. I'd urge you to really consider another approach. Best, Todd On Saturday, February 22, 2014, Jonathan Haddad j...@jonhaddad.com wrote: Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes elastic search and redis. A message can be sent from one user to another, and the intent is that it would be seen in near-real-time. This means that 2 people may be using different data centers, and the messages need to propagate from one to the other. On the plus side, we know we get this with Cassandra (fist pump) but the other pieces, not so much. 
Even if they did work, there's all sorts of race conditions that could pop up from having different pieces of our architecture communicating over different channels. From this, we've arrived at the idea that since Cassandra is the authoritative data source, we might be able to trigger events in DC2 based on activity coming through either the commit log or some other means. One idea was to use a CF with a low gc time as a means of transporting messages between DCs, and watching the commit logs for deletes to that CF in order to know when we need to do things like reindex a document (or a new document), bust cache, etc. Facebook did something similar with their modifications to MySQL to include cache keys in the replication log. Assuming this is sane, I'd want to avoid having the same event register on 3 servers, thus registering 3 items in the queue when only one should be there. So, for any piece of data replicated from the other DC, I'd need a way to determine if it was supposed to actually trigger the event or not. (Maybe it looks at the token and determines if the current server falls in the token range?) Or is there a better way? So, my questions to all ye Cassandra users: 1. Is this is even sane? 2. Is anyone doing it? -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Cassandra Snapshots giving me corrupted SSTables in the logs
I have a nagging memory of reading about issues with virtualization and not actually having durable versions of your data even after an fsync (within the VM). Googling around led me to this post: http://petercai.com/virtualization-is-bad-for-database-integrity/ It's possible you're hitting this issue, either with the virtualization layer or with EBS itself. Just a shot in the dark though; other people would likely know much more than I. On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie ussray...@yahoo.com wrote: Robert, That is what I thought as well. But apparently something is happening. The only way I can get away with doing this is adding a sleep 60 right after the nodetool snapshot is executed. I can reproduce this 100% of the time by not issuing a sleep after nodetool snapshot. This is the error. ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main] org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:108) at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63) at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157) at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by:
java.io.EOFException at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) at java.io.DataInputStream.readUTF(DataInputStream.java:589) at java.io.DataInputStream.readUTF(DataInputStream.java:564) at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:83) ... 11 more On Friday, March 28, 2014 2:38 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.comwrote: Thank you for your quick response. Is there a way to tell when a snapshot is completely done? IIRC, the JMX call blocks until the snapshot completes. It should be done when nodetool returns. =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
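The EOFException in this thread is what reading a half-finished file tends to look like. The standard way writers avoid exposing partial files (the temp-file-then-move behavior that backup tools key off) is to write under a temporary name and rename into place: a same-filesystem rename is atomic, so readers see either nothing or the complete file. A stdlib sketch of that pattern, with hypothetical file names:

```python
# Atomic publish: write to a temp name, fsync, then rename into place.
# Readers watching for the file (or a "moved into place" event) never
# observe a partial write.
import os, tempfile

def atomic_write(path, data):
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # durable before it becomes visible
    os.rename(tmp, path)       # atomic on the same filesystem

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "example-Data.db")
atomic_write(target, "complete sstable contents")
```

Note this protects against readers seeing partial files on the live filesystem; it doesn't address the separate EBS/VM durability question raised above.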
Re: Cassandra Snapshots giving me corrupted SSTables in the logs
I will +1 the recommendation on using tablesnap over EBS. S3 is at least predictable. Additionally, from a practical standpoint, you may want to back up your sstables somewhere. If you use S3, it's easy to pull just the new tables out via aws-cli tools (s3 sync), to your remote, non-aws server, and not incur the overhead of routinely backing up the entire dataset. For a non trivial database, this matters quite a bit. On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael michael.la...@nytimes.comwrote: As I tried to say, EBS snapshots require much care or you get corruption such as you have encountered. Does Cassandra quiesce the file system after a snapshot using fsfreeze or xfs_freeze? Somehow I doubt it... On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad j...@jonhaddad.comwrote: I have a nagging memory of reading about issues with virtualization and not actually having durable versions of your data even after an fsync (within the VM). Googling around lead me to this post: http://petercai.com/virtualization-is-bad-for-database-integrity/ It's possible you're hitting this issue, with with the virtualization layer, or with EBS itself. Just a shot in the dark though, other people would likely know much more than I. On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie ussray...@yahoo.comwrote: Robert, That is what I thought as well. But apparently something is happening. The only way I can get away with doing this is adding a sleep 60 right after the nodetool snapshot is executed. I can reproduce this 100% of the time by not issuing a sleep after nodetool snapshot. This is the error. 
ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main] org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:108) at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63) at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157) at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.EOFException at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) at java.io.DataInputStream.readUTF(DataInputStream.java:589) at java.io.DataInputStream.readUTF(DataInputStream.java:564) at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:83) ... 11 more On Friday, March 28, 2014 2:38 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.comwrote: Thank you for your quick response. Is there a way to tell when a snapshot is completely done? IIRC, the JMX call blocks until the snapshot completes. It should be done when nodetool returns. =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Cassandra Snapshots giving me corrupted SSTables in the logs
Another thing to keep in mind is that if you are hitting the issue I described, waiting 60 seconds will not absolutely solve your problem, it will only make it less likely to occur. If a memtable has been partially flushed at the 60 second mark you will end up with the same corrupt sstable. On Fri, Mar 28, 2014 at 1:32 PM, Laing, Michael michael.la...@nytimes.comwrote: +1 for tablesnap On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad j...@jonhaddad.comwrote: I will +1 the recommendation on using tablesnap over EBS. S3 is at least predictable. Additionally, from a practical standpoint, you may want to back up your sstables somewhere. If you use S3, it's easy to pull just the new tables out via aws-cli tools (s3 sync), to your remote, non-aws server, and not incur the overhead of routinely backing up the entire dataset. For a non trivial database, this matters quite a bit. On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael michael.la...@nytimes.com wrote: As I tried to say, EBS snapshots require much care or you get corruption such as you have encountered. Does Cassandra quiesce the file system after a snapshot using fsfreeze or xfs_freeze? Somehow I doubt it... On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad j...@jonhaddad.comwrote: I have a nagging memory of reading about issues with virtualization and not actually having durable versions of your data even after an fsync (within the VM). Googling around lead me to this post: http://petercai.com/virtualization-is-bad-for-database-integrity/ It's possible you're hitting this issue, with with the virtualization layer, or with EBS itself. Just a shot in the dark though, other people would likely know much more than I. On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie ussray...@yahoo.comwrote: Robert, That is what I thought as well. But apparently something is happening. The only way I can get away with doing this is adding a sleep 60 right after the nodetool snapshot is executed. 
I can reproduce this 100% of the time by not issuing a sleep after nodetool snapshot. This is the error. ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main] org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:108) at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63) at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157) at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.EOFException at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) at java.io.DataInputStream.readUTF(DataInputStream.java:589) at java.io.DataInputStream.readUTF(DataInputStream.java:564) at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:83) ... 11 more On Friday, March 28, 2014 2:38 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.comwrote: Thank you for your quick response. Is there a way to tell when a snapshot is completely done? IIRC, the JMX call blocks until the snapshot completes. It should be done when nodetool returns. 
=Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Tune cache MB settings per table.
I think of all the areas you could spend your time, this will have the least returns. The OS will keep the most frequently used data in memory. There's no reason to require Cassandra to do it. If you're curious as to what's been loaded into ram, try Al Tobey's pcstat utility. https://github.com/tobert/pcstat On Sun, Jun 1, 2014 at 4:30 PM, Colin colpcl...@gmail.com wrote: Have you been unable to achieve your SLAs using Cassandra out of the box so far? Based upon my experience, by trying to tune Cassandra before the app is done and without simulating real-world load patterns, you might actually be doing yourself a disservice. -- Colin 320-221-9531 On Jun 1, 2014, at 6:08 PM, Kevin Burton bur...@spinn3r.com wrote: Not in our experience… We've been using fadvise DONTNEED to purge pages that aren't necessary any longer. Of course YMMV based on your usage. I tend to like to control everything explicitly instead of having magic. That's worked out very well for us in the past so it would be nice to still have this on cassandra. On Sun, Jun 1, 2014 at 12:53 PM, Colin co...@clark.ws wrote: The OS should handle this really well as long as you're on a v3 Linux kernel -- *Colin Clark* +1-320-221-9531 On Jun 1, 2014, at 2:49 PM, Kevin Burton bur...@spinn3r.com wrote: It's possible to set caching to: all, keys_only, rows_only, or none .. for a given table. But we have one table which is MASSIVE and we only need the most recent 4-8 hours in memory. Anything older than that can go to disk as the queries there are very rare. … but I don't think cassandra can do this (which is a shame). Another option is to partition our tables per hour… then tell the older tables to cache 'none'… I hate this option though. A smarter mechanism would be to have a compaction strategy that created an SSTable for every hour and then had custom caching settings for that table. The additional upside for this is that TTLs would just drop the older data in the compactor..
-- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Customized Compaction Strategy: Dev Questions
I'd suggest creating 1 table per day, and dropping the tables you don't need once you're done. On Wed, Jun 4, 2014 at 10:44 AM, Redmumba redmu...@gmail.com wrote: Sorry, yes, that is what I was looking to do--i.e., create a TopologicalCompactionStrategy or similar. On Wed, Jun 4, 2014 at 10:40 AM, Russell Bradberry rbradbe...@gmail.com wrote: Maybe I’m misunderstanding something, but what makes you think that running a major compaction every day will cause the data from January 1st to exist in only one SSTable and not have data from other days in the SSTable as well? Are you talking about making a new compaction strategy that creates SSTables by day? On June 4, 2014 at 1:36:10 PM, Redmumba (redmu...@gmail.com) wrote: Let's say I run a major compaction every day, so that the oldest sstable contains only the data for January 1st. Assuming all the nodes are in-sync and have had at least one repair run before the table is dropped (so that all information for that time period is the same), wouldn't it be safe to assume that the same data would be dropped on all nodes? There might be a period when the compaction is running where different nodes might have an inconsistent view of just that day's data (in that some would have it and others would not), but the cluster would still function and become eventually consistent, correct? Also, if the entirety of the sstable is being dropped, wouldn't the tombstones be removed with it? I wouldn't be concerned with individual rows and columns, and this is a write-only table, more or less--the only deletes that occur in the current system are to delete the old data. On Wed, Jun 4, 2014 at 10:24 AM, Russell Bradberry rbradbe...@gmail.com wrote: I’m not sure what you want to do is feasible. At a high level I can see you running into issues with RF etc.
The SSTables node to node are not identical, so if you drop a full SSTable on one node there is no corresponding SSTable on the adjacent nodes to drop. You would need to choose data to compact out, and ensure it is removed on all replicas as well. But if your problem is that you’re low on disk space then you probably won’t be able to write out a new SSTable with the older information compacted out. Also, there is more to an SSTable than just data; the SSTable could have tombstones and other relics that haven’t been cleaned up from nodes coming or going. On June 4, 2014 at 1:10:58 PM, Redmumba (redmu...@gmail.com) wrote: Thanks, Russell--yes, a similar concept, just applied to sstables. I'm assuming this would require changes to both major compactions, and probably GC (to remove the old tables), but since I'm not super-familiar with the C* internals, I wanted to make sure it was feasible with the current toolset before I actually dived in and started tinkering. Andrew On Wed, Jun 4, 2014 at 10:04 AM, Russell Bradberry rbradbe...@gmail.com wrote: hmm, I see. So something similar to Capped Collections in MongoDB. On June 4, 2014 at 1:03:46 PM, Redmumba (redmu...@gmail.com) wrote: Not quite; if I'm at say 90% disk usage, I'd like to drop the oldest sstable rather than simply run out of space. The problem with using TTLs is that I have to try and guess how much data is being put in--since this is auditing data, the usage can vary wildly depending on time of year, verbosity of auditing, etc.. I'd like to maximize the disk space--not optimize the cleanup process. Andrew On Wed, Jun 4, 2014 at 9:47 AM, Russell Bradberry rbradbe...@gmail.com wrote: You mean this: https://issues.apache.org/jira/browse/CASSANDRA-5228 ? On June 4, 2014 at 12:42:33 PM, Redmumba (redmu...@gmail.com) wrote: Good morning! I've asked (and seen other people ask) about the ability to drop old sstables, basically creating a FIFO-like clean-up process.
Since we're using Cassandra as an auditing system, this is particularly appealing to us because it means we can maximize the amount of auditing data we can keep while still allowing Cassandra to clear old data automatically. My idea is this: perform compaction based on the range of dates available in the sstable (or just metadata about when it was created). For example, a major compaction could create a combined sstable per day--so that, say, 60 days of data after a major compaction would contain 60 sstables. My question then is, will this be possible by simply implementing a separate AbstractCompactionStrategy? Does this sound feasible at all? Based on the implementation of Size and Leveled strategies, it looks like I would have the ability to control what and how things get compacted, but I wanted to verify before putting time into it. Thank you so much for your time! Andrew -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
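Jon's table-per-day suggestion can be sketched as a small rotation helper that names tables by date and emits DROP statements for tables past the retention window. A hypothetical sketch (the `audit_` naming scheme and retention policy are assumptions, not from the thread); the returned CQL would be run through whatever driver you use:

```python
from datetime import date, timedelta

def table_for(day):
    # one table per day, e.g. audit_20140604
    return "audit_%s" % day.strftime("%Y%m%d")

def drop_statements(existing_tables, today, keep_days):
    """Return DROP TABLE statements for day-tables older than the retention window.

    Date-stamped names sort lexicographically in date order, so a simple
    string comparison against the oldest table we want to keep suffices.
    """
    oldest_kept = table_for(today - timedelta(days=keep_days - 1))
    return ["DROP TABLE %s;" % t
            for t in sorted(existing_tables) if t < oldest_kept]
```

Dropping a whole table removes its SSTables on every replica at once, which sidesteps the per-SSTable consistency questions discussed above.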
Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)
You should read through the token docs, it has examples and specifications: http://cassandra.apache.org/doc/cql3/CQL.html#tokenFun On Thu, Jun 5, 2014 at 10:22 PM, Kevin Burton bur...@spinn3r.com wrote: I'm building a new schema which I need to read externally by paging through the result set. My understanding from reading the documentation, and this list, is that I can do that but I need to use the token() function. Only it doesn't work. Here's a reduction: create table test_paging ( id int, primary key(id) ); insert into test_paging (id) values (1); insert into test_paging (id) values (2); insert into test_paging (id) values (3); insert into test_paging (id) values (4); insert into test_paging (id) values (5); select * from test_paging where id > token(0); … but it gives me: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int) … What's that about? I can't find any documentation for this and there aren't any concise examples. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)
Sorry, the datastax docs are actually a bit better: http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html Jon On Thu, Jun 5, 2014 at 10:46 PM, Jonathan Haddad j...@jonhaddad.com wrote: You should read through the token docs, it has examples and specifications: http://cassandra.apache.org/doc/cql3/CQL.html#tokenFun On Thu, Jun 5, 2014 at 10:22 PM, Kevin Burton bur...@spinn3r.com wrote: I'm building a new schema which I need to read externally by paging through the result set. My understanding from reading the documentation, and this list, is that I can do that but I need to use the token() function. Only it doesn't work. Here's a reduction: create table test_paging ( id int, primary key(id) ); insert into test_paging (id) values (1); insert into test_paging (id) values (2); insert into test_paging (id) values (3); insert into test_paging (id) values (4); insert into test_paging (id) values (5); select * from test_paging where id > token(0); … but it gives me: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int) … What's that about? I can't find any documentation for this and there aren't any concise examples. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
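For reference, the query fails because it compares the raw `id` column (int) against `token(0)` (bigint); the documented form applies token() to both sides, e.g. `SELECT * FROM test_paging WHERE token(id) > token(1) LIMIT 2;`. The paging loop that query enables can be simulated without a cluster. A sketch, with plain sorted integers standing in for partitioner tokens (assumption: real code would re-issue the CQL above with the last-seen key instead of filtering a list):

```python
def page_by_token(tokens_in_ring, page_size):
    """Walk rows in token order, page_size at a time, mimicking how
    'WHERE token(id) > token(last_seen) LIMIT page_size' pages a table."""
    ordered = sorted(tokens_in_ring)  # the partitioner defines this order
    last = None
    while True:
        if last is None:
            page = ordered[:page_size]          # first page: no lower bound
        else:
            page = [t for t in ordered if t > last][:page_size]
        if not page:
            return
        yield page
        last = page[-1]                          # resume strictly after this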
Re: VPC AWS
This may not help you with the migration, but it may with maintenance management. I just put up a blog post on managing VPC security groups with a tool I open sourced at my previous company. If you're going to have different VPCs (staging / prod), it might help with managing security groups. http://rustyrazorblade.com/2014/06/an-introduction-to-roadhouse/ Semi shameless plug... but relevant. On Thu, Jun 5, 2014 at 12:01 PM, Aiman Parvaiz ai...@shift.com wrote: Cool, thanks again for this. On Thu, Jun 5, 2014 at 11:51 AM, Michael Theroux mthero...@yahoo.com wrote: You can have a ring spread across EC2 and the public subnet of a VPC. That is how we did our migration. In our case, we simply replaced the existing EC2 node with a new instance in the public VPC, restored from a backup taken right before the switch. -Mike -- *From:* Aiman Parvaiz ai...@shift.com *To:* Michael Theroux mthero...@yahoo.com *Cc:* user@cassandra.apache.org user@cassandra.apache.org *Sent:* Thursday, June 5, 2014 2:39 PM *Subject:* Re: VPC AWS Thanks for this info Michael. As far as restoring node in public VPC is concerned I was thinking ( and I might be wrong here) if we can have a ring spread across EC2 and public subnet of a VPC, this way I can simply decommission nodes in Ec2 as I gradually introduce new nodes in public subnet of VPC and I will end up with a ring in public subnet and then migrate them from public to private in a similar way may be. If anyone has any experience/ suggestions with this please share, would really appreciate it. Aiman On Thu, Jun 5, 2014 at 10:37 AM, Michael Theroux mthero...@yahoo.com wrote: The implementation of moving from EC2 to a VPC was a bit of a juggling act. 
Our motivation was twofold: 1) We were running out of static IP addresses, and it was becoming increasingly difficult in EC2 to design around limiting the number of static IP addresses to the number of public IP addresses EC2 allowed 2) VPC affords us an additional level of security that was desirable. However, we needed to consider the following limitations: 1) By default, you have a limited number of available public IPs for both EC2 and VPC. 2) AWS security groups need to be configured to allow traffic for Cassandra to/from instances in EC2 and the VPC. You are correct at the high level that the migration goes from EC2 → public VPC (VPC with an Internet Gateway) → private VPC (VPC with a NAT). The first phase was moving instances to the public VPC, setting broadcast and seeds to the public IPs we had available. Basically: 1) Take down a node, taking a snapshot for a backup 2) Restore the node on the public VPC, assigning it to the correct security group, manually setting the seeds to other available nodes 3) Verify the cluster can communicate 4) Repeat Realize the NAT instance on the private subnet will also require a public IP. What got really interesting is that near the end of the process we ran out of available IPs, requiring us to switch the final node that was on EC2 directly to the private VPC (and taking down two nodes at once, which our setup allowed given we had 6 nodes with an RF of 3). What we did, and highly suggest for the switch, is to write down every step that has to happen on every node during the switch. In our case, many of the moved nodes required slightly different configurations for items like the seeds.
It's been a couple of years, so my memory on this may be a little fuzzy :) -Mike -- *From:* Aiman Parvaiz ai...@shift.com *To:* user@cassandra.apache.org; Michael Theroux mthero...@yahoo.com *Sent:* Thursday, June 5, 2014 12:55 PM *Subject:* Re: VPC AWS Michael, Thanks for the response, I am about to head into something very similar if not exactly the same. I envision things happening on the same lines as you mentioned. I would be grateful if you could please throw some more light on how you went about switching cassandra nodes from the public subnet to private without any downtime. I have not started on this project yet, still in my research phase. I plan to have an ec2+public VPC cluster and then decommission ec2 nodes to have everything in the public subnet, next would be to move it to the private subnet. Thanks On Thu, Jun 5, 2014 at 8:14 AM, Michael Theroux mthero...@yahoo.com wrote: We personally use the EC2Snitch, however, we don't have the multi-region requirements you do. -Mike -- *From:* Alain RODRIGUEZ arodr...@gmail.com *To:* user@cassandra.apache.org *Sent:* Thursday, June 5, 2014 9:14 AM *Subject:* Re: VPC AWS I think you can define a VPC subnet to be public (to have public + private IPs) or private only. Any insight regarding snitches? What snitch do you guys use? 2014-06-05 15:06 GMT+02:00 William Oberman ober...@civicscience.com: I don't think traffic will flow between classic ec2 and vpc
Re: Best way to do a multi_get using CQL
Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is SELECT ... IN or index lookups¶ SELECT ... IN and index lookups (formerly secondary indexes) should be avoided except for specific scenarios. See When not to use IN in SELECT and When not to use an index in Indexing in CQL for Cassandra 2.0 And Looking at the SELECT doc, I saw: When not to use IN¶ The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried. For example, in a single, local data center cluster having 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried are most likely even higher, up to 20 nodes depending on where the keys fall in the token range. In my system, I have a column family called entity_lookup: CREATE KEYSPACE IF NOT EXISTS Identification1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 }; USE Identification1; CREATE TABLE IF NOT EXISTS entity_lookup ( name varchar, value varchar, entity_id uuid, PRIMARY KEY ((name, value), entity_id)); And I use the following select to query it: SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s) Is this an anti-pattern? If not using SELECT IN, which other way would you recommend for lookups like that? I have several values I would like to search in cassandra and they might not be in the same partition, as above. Is Cassandra the wrong tool for lookups like that?
Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Best way to do a multi_get using CQL
If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every server anyways. It's the difference between you querying for the data directly and using a coordinator as a proxy. It's faster to just ask the node with the data. On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: But using async queries wouldn't be even worse than using SELECT IN? The justification in the docs is I could query many nodes, but I would still do it. Today, I use both async queries AND SELECT IN: SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)" for name, values in identifiers.items(): query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s']*len(values))) args = [name] + values query_msg = query % tuple(args) futures.append((query_msg, self.session.execute_async(query, args))) for query_msg, future in futures: try: rows = future.result(timeout=10) for row in rows: entity_ids.add(row.entity_id) except: logging.error("Query '%s' returned ERROR" % (query_msg)) raise Using async just with select = would mean instead of 1 async query (example: in (0, 1, 2)), I would do several, one for each value of values array above. In my head, this would mean more connections to Cassandra and the same amount of work, right? What would be the advantage? []s 2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is SELECT ... IN or index lookups¶ SELECT ...
IN and index lookups (formerly secondary indexes) should be avoided except for specific scenarios. See When not to use IN in SELECT and When not to use an index in Indexing in CQL for Cassandra 2.0 And Looking at the SELECT doc, I saw: When not to use IN¶ The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried. For example, in a single, local data center cluster having 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried are most likely even higher, up to 20 nodes depending on where the keys fall in the token range. In my system, I have a column family called entity_lookup: CREATE KEYSPACE IF NOT EXISTS Identification1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 }; USE Identification1; CREATE TABLE IF NOT EXISTS entity_lookup ( name varchar, value varchar, entity_id uuid, PRIMARY KEY ((name, value), entity_id)); And I use the following select to query it: SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s) Is this an anti-pattern? If not using SELECT IN, which other way would you recommend for lookups like that? I have several values I would like to search in cassandra and they might not be in the same partition, as above. Is Cassandra the wrong tool for lookups like that? Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
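The async fan-out Jon describes can be illustrated without a cluster by swapping the driver's `execute_async` for a thread pool; the shape of the code (submit one query per key, then gather the futures) is the same as in Marcelo's snippet. A sketch with hypothetical names: `lookup` and `FAKE_TABLE` are stand-ins for `session.execute_async` against the entity_lookup table, not real driver API.

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical in-memory stand-in for the entity_lookup table
FAKE_TABLE = {
    ("email", "a@example.com"): [1],
    ("email", "b@example.com"): [2, 3],
}

def lookup(name, value):
    # stand-in for session.execute_async(SELECT_ENTITY_LOOKUP, [name, value])
    return FAKE_TABLE.get((name, value), [])

def multi_get(pairs):
    """Fire one lookup per (name, value) pair, then gather the results."""
    entity_ids = set()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # fan out: one future per key, like one execute_async per key
        futures = [pool.submit(lookup, n, v) for n, v in pairs]
        # gather: same pattern as future.result(timeout=10) in the thread
        for f in futures:
            entity_ids.update(f.result(timeout=10))
    return entity_ids
```

With a token-aware driver each per-key query goes straight to a replica, which is the advantage over one IN query funneled through a single coordinator.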
Re: Best way to do a multi_get using CQL
The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go async. The native driver reuses connections and intelligently manages the pool for you. It can also multiplex queries over a single connection. I am assuming you're using one of the datastax drivers for CQL, btw. Jon On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: This is interesting, I didn't know that! It might make sense then to use select = + async + token aware, I will try to change my code. But would it be a recommended solution for these cases? Any other options? I still wonder if this is the right use case for Cassandra, to look for random keys in a huge cluster. After all, the amount of connections to Cassandra will still be huge, right... Wouldn't it be a problem? Or when you use async does the driver reuse the connection? []s 2014-06-19 22:16 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every server anyways. It's the difference between you querying for the data directly and using a coordinator as a proxy. It's faster to just ask the node with the data. On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: But using async queries wouldn't be even worse than using SELECT IN? The justification in the docs is I could query many nodes, but I would still do it.
Today, I use both async queries AND SELECT IN: SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)" for name, values in identifiers.items(): query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s']*len(values))) args = [name] + values query_msg = query % tuple(args) futures.append((query_msg, self.session.execute_async(query, args))) for query_msg, future in futures: try: rows = future.result(timeout=10) for row in rows: entity_ids.add(row.entity_id) except: logging.error("Query '%s' returned ERROR" % (query_msg)) raise Using async just with select = would mean instead of 1 async query (example: in (0, 1, 2)), I would do several, one for each value of values array above. In my head, this would mean more connections to Cassandra and the same amount of work, right? What would be the advantage? []s 2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is SELECT ... IN or index lookups¶ SELECT ... IN and index lookups (formerly secondary indexes) should be avoided except for specific scenarios. See When not to use IN in SELECT and When not to use an index in Indexing in CQL for Cassandra 2.0 And Looking at the SELECT doc, I saw: When not to use IN¶ The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried.
For example, in a single, local data center cluster having 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried are most likely even higher, up to 20 nodes depending on where the keys fall in the token range. In my system, I have a column family called entity_lookup: CREATE KEYSPACE IF NOT EXISTS Identification1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 }; USE Identification1; CREATE TABLE IF NOT EXISTS entity_lookup ( name varchar, value varchar, entity_id uuid, PRIMARY KEY ((name, value), entity_id)); And I use the following select to query it: SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s) Is this an anti-pattern? If not using SELECT IN, which other way would you recommend for lookups like that? I have several values I would like to search in cassandra and they might not be in the same partition, as above. Is Cassandra the wrong tool for lookups like that? Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype
Re: Best way to do a multi_get using CQL
, RF-3 cluster in AWS. Also why do the work the coordinator will do for you: send all the queries, wait for everything to come back in whatever order, and sort the result. I would rather keep my app code simple. But the real point is that you should benchmark in your own environment. ml On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Yes, I am using the CQL datastax drivers. It was a good advice, thanks a lot Jonathan. []s 2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go async. The native driver reuses connections and intelligently manages the pool for you. It can also multiplex queries over a single connection. I am assuming you're using one of the datastax drivers for CQL, btw. Jon On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: This is interesting, I didn't know that! It might make sense then to use select = + async + token aware, I will try to change my code. But would it be a recommended solution for these cases? Any other options? I still wonder if this is the right use case for Cassandra, to look for random keys in a huge cluster. After all, the amount of connections to Cassandra will still be huge, right... Wouldn't it be a problem? Or when you use async does the driver reuse the connection? []s 2014-06-19 22:16 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every server anyways. It's the difference between you querying for the data directly and using a coordinator as a proxy. It's faster to just ask the node with the data.
On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: But using async queries wouldn't be even worse than using SELECT IN? The justification in the docs is I could query many nodes, but I would still do it. Today, I use both async queries AND SELECT IN: SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)" for name, values in identifiers.items(): query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s']*len(values))) args = [name] + values query_msg = query % tuple(args) futures.append((query_msg, self.session.execute_async(query, args))) for query_msg, future in futures: try: rows = future.result(timeout=10) for row in rows: entity_ids.add(row.entity_id) except: logging.error("Query '%s' returned ERROR" % (query_msg)) raise Using async just with select = would mean instead of 1 async query (example: in (0, 1, 2)), I would do several, one for each value of values array above. In my head, this would mean more connections to Cassandra and the same amount of work, right? What would be the advantage? []s 2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is SELECT ... IN or index lookups¶ SELECT ... IN and index lookups (formerly secondary indexes) should be avoided except for specific scenarios. See When not to use IN in SELECT and When not to use an index in Indexing in CQL for Cassandra 2.0 And Looking at the SELECT doc, I saw: When not to use IN¶ The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended.
Using IN can degrade performance because usually many nodes must be queried. For example, in a single, local data center cluster having 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried are most likely even higher, up to 20 nodes depending on where the keys fall in the token range. In my system, I have a column family called entity_lookup: CREATE KEYSPACE IF NOT EXISTS Identification1 WITH REPLICATION = { 'class
Re: Adding large text blob causes read timeout...
Can you do your query in the CLI after setting tracing on? On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan doanduy...@gmail.com wrote: Yes but adding the extra one ends up by * 1000. The limit in CQL3 specifies the number of logical rows, not the number of physical columns in the storage engine On June 24, 2014 at 08:30, Kevin Burton bur...@spinn3r.com wrote: oh.. the difference between the ONE field and the remaining 29 is massive. It's like 200ms for just the 29 columns.. adding the extra one causes it to time out .. 5000ms... On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan doanduy...@gmail.com wrote: Don't forget that when you do the Select with limit set to 1000, Cassandra is actually fetching 1000 * 29 physical columns (29 fields per logical row). Adding one extra big html column may be too much and cause a timeout. Try to: 1. Select only the big html only 2. Or reduce the limit incrementally until no timeout On June 24, 2014 at 06:22, Kevin Burton bur...@spinn3r.com wrote: I have a table with a schema mostly of small fields. About 30 of them. The primary key is: primary key( bucket, sequence ) … I have 100 buckets and the idea is that sequence is ever increasing. This way I can read from bucket zero, and everything after sequence N and get all the writes ordered by time. I'm running SELECT ... FROM content WHERE bucket=0 AND sequence > 0 ORDER BY sequence ASC LIMIT 1000; … using the Java driver. If I add ALL the fields, except one, so 29 fields, the query is fast. Only 129ms…. However, if I add the 'html' field, which is a snapshot of HTML obviously, the query times out… I'm going to add tracing and try to track it down further, but I suspect I'm doing something stupid. Is it going to burn me that the data is UTF8 encoded? I can't imagine decoding UTF8 is going to be THAT slow but perhaps cassandra is doing something silly under the covers?
cqlsh doesn't time out … it actually works fine, but it uses 100% CPU while writing out the data, so it's not a good comparison unfortunately. Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ...:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172) at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92) at com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace.
Freedom is slavery. Ignorance is strength. Corporations are people. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
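DuyHai's second suggestion ("reduce the limit incrementally until no timeout") can be sketched as a simple backoff loop. `run_query` below is a stub standing in for a real driver call with a smaller LIMIT or fetch size; the 250-row cutoff is invented for illustration so the loop is runnable without a cluster.

```python
class ReadTimeout(Exception):
    pass

def run_query(limit):
    # Stub: pretend the node can serve at most 250 wide rows containing
    # the big html column before hitting the read timeout.
    if limit > 250:
        raise ReadTimeout(f"timeout at LIMIT {limit}")
    return [f"row-{i}" for i in range(limit)]

def fetch_with_backoff(limit=1000, floor=10):
    # Halve the page size on each timeout until the query succeeds.
    while limit >= floor:
        try:
            return limit, run_query(limit)
        except ReadTimeout:
            limit //= 2
    raise RuntimeError("even the smallest page timed out")

limit, rows = fetch_with_backoff()
```

With a real driver, the equivalent knob is a smaller fetch size on the statement, so the server materializes fewer physical columns per round trip.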
Re: Triggers and their use in data indexing
Triggers only execute on the local coordinator. I would also not recommend using them. On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra berto.d.s...@gmail.com wrote: Now the question: is there any way to use triggers so that they will locally index data from remote DCs when it comes in? As I understand it, you probably should not use triggers in production in their current form. =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Triggers and their use in data indexing
This is one of the trickier areas of doing multi dc. The current recommendation is to use a separate message queue. If you'd like to see remote triggers, you could file a JIRA. Get back to the list w/ the ticket #, I'm sure there are others who have similar needs. On Thu, Jul 3, 2014 at 10:04 AM, Jonathan Haddad j...@jonhaddad.com wrote: Triggers only execute on the local coordinator. I would also not recommend using them. On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra berto.d.s...@gmail.com wrote: Now the question: is there any way to use triggers so that they will locally index data from remote DCs when it comes in? As I understand it, you probably should not use triggers in production in their current form. =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Write Inconsistency to update a row
Did you make sure all the nodes are on the same time? If they're not, you'll get some weird results. On Thu, Jul 3, 2014 at 10:30 AM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: Are you sure all the nodes are working at that time? Yes. They are working. I would suggest increasing the replication factor (for example 3) and use CL=ALL or QUORUM to find out what is going wrong. I did! I still have the same problem. 2014-07-03 13:40 GMT-03:00 Panagiotis Garefalakis panga...@gmail.com: This seems like a hinted handoff issue but since you use CL = ONE it should happen. Are you sure all the nodes are working at that time? You could use nodetool status to check that. I would suggest increasing the replication factor (for example 3) and use CL=ALL or QUORUM to find out what is going wrong. Regards, Panagiotis On Thu, Jul 3, 2014 at 5:11 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: replication_factor=1 CL=ONE Does the data show up eventually? Yes. Could it be the clocks? 2014-07-03 10:47 GMT-03:00 graham sanderson gra...@vast.com: What is your keyspace replication_factor? What consistency level are you reading/writing with? Does the data show up eventually? I’m assuming you don’t have any errors (timeouts etc) on the write side On Jul 3, 2014, at 7:55 AM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: I have two Cassandra 2.0.5 servers running with some data inserted, where each row has one empty column. When the client sends a lot of update commands to fill this column in each row, some lines update their content, but some lines remain with the empty column. Using one server, this never happens! Any suggestions? Tks. -- Atenciosamente, Sávio S.
Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles Mestrando em Ciências da Computação - UFG Arquiteto de Software CUIA Internet Brasil -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Write Inconsistency to update a row
Make sure you've got ntpd running, otherwise this will be an ongoing nightmare. On Thu, Jul 3, 2014 at 5:00 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: I have synchronized the clocks and it works! 2014-07-03 20:58 GMT-03:00 Sávio S. Teles de Oliveira savio.te...@cuia.com.br: Did you make sure all the nodes are on the same time? If they're not, you'll get some weird results. They were not on the same time. I've synchronized the time and it works! Tks 2014-07-03 16:58 GMT-03:00 Jack Krupansky j...@basetechnology.com: You said that the updates do show up eventually – how long does it take? -- Jack Krupansky From: Sávio S. Teles de Oliveira Sent: Thursday, July 3, 2014 1:30 PM To: user@cassandra.apache.org Subject: Re: Write Inconsistency to update a row Are you sure all the nodes are working at that time? Yes. They are working. I would suggest increasing the replication factor (for example 3) and use CL=ALL or QUORUM to find out what is going wrong. I did! I still have the same problem. 2014-07-03 13:40 GMT-03:00 Panagiotis Garefalakis panga...@gmail.com: This seems like a hinted handoff issue but since you use CL = ONE it should happen. Are you sure all the nodes are working at that time? You could use nodetool status to check that. I would suggest increasing the replication factor (for example 3) and use CL=ALL or QUORUM to find out what is going wrong. Regards, Panagiotis On Thu, Jul 3, 2014 at 5:11 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: replication_factor=1 CL=ONE Does the data show up eventually? Yes. Could it be the clocks? 2014-07-03 10:47 GMT-03:00 graham sanderson gra...@vast.com: What is your keyspace replication_factor? What consistency level are you reading/writing with? Does the data show up eventually? I’m assuming you don’t have any errors (timeouts etc) on the write side On Jul 3, 2014, at 7:55 AM, Sávio S.
Teles de Oliveira savio.te...@cuia.com.br wrote: I have two Cassandra 2.0.5 servers running with some data inserted, where each row has one empty column. When the client sends a lot of update commands to fill this column in each row, some lines update their content, but some lines remain with the empty column. Using one server, this never happens! Any suggestions? Tks. -- Atenciosamente, Sávio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles Mestrando em Ciências da Computação - UFG Arquiteto de Software CUIA Internet Brasil -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
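To see why clock skew produces these "lost" updates: Cassandra reconciles conflicting cells by write timestamp, last write wins. If the coordinator taking the UPDATE has a clock running behind the one that took the original INSERT, the update carries an older timestamp and silently loses. A minimal sketch (the timestamps are invented for illustration):

```python
def resolve(cells):
    # cells: list of (timestamp_micros, value); the highest timestamp wins,
    # mirroring Cassandra's last-write-wins conflict resolution.
    return max(cells, key=lambda c: c[0])[1]

insert_ts = 1_404_000_000_000_000        # INSERT, stamped by coordinator A's clock
update_ts = insert_ts - 5_000_000        # UPDATE issued later in real time, but
                                         # coordinator B's clock is 5 seconds behind
winner = resolve([(insert_ts, "old"), (update_ts, "new")])  # the UPDATE loses
```

This is exactly the failure mode ntpd prevents: once clocks agree, the later write also carries the later timestamp.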
Re: Cassandra use cases/Strengths/Weakness
I've used various databases in production for over 10 years. Each has strengths and weaknesses. I ran Cassandra for just shy of 2 years in production as part of both development teams and operations, and I only hit 1 serious problem that Rob mentioned. Ideally C* would have guarded against it, but it did not. I did not have any downtime as a result, however. For those curious, I tried to add 1.2 nodes to a 1.1 cluster. Aside from that, I actually did find Cassandra simple to operate and manage. I used Cassandra as more of a general purpose database. I was willing to give up some query flexibility in favor of high availability and multi dc support. There were times we needed to add more servers to deal with additional load, and it handled it perfectly. For me it wasn't such a big problem; there are always optimizations that need to be made no matter what DB you use. Disclaimer: I now work for Datastax. On Tue, Jul 8, 2014 at 5:51 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jul 4, 2014 at 2:10 PM, DuyHai Doan doanduy...@gmail.com wrote: c. operational simplicity due to master-less architecture. This feature, although quite transparent for developers, is a key selling point. Having suffered when manually installing a Hadoop cluster, I happen to love the deployment simplicity of C*, only one process per node, no moving parts. Asserting that Cassandra, as a fully functioning production system, is currently easier to operate than RDBMS is just false. It is still false even if we ignore the availability of experienced RDBMS operators and decades of RDBMS operational best practice. The quality of software engineering practice in RDBMS land also most assuredly results in a more easily operable system in many, many use cases. Yes, Cassandra is more tolerant to individual node failures. This turns out to not matter as much in terms of operability as non-operators appear to think it does.
Very trivial operational activities (create a new columnfamily or replace a failed node) are subject to failure mode edge cases which often are not resolvable without brute force methods. I am unable to get my head around the oft-heard marketing assertion that a data-store in which such common activities are not bulletproof is capable of being better to operate than the RDBMS status quo. The production operators I know also do not agree that Cassandra is simple to operate. All the above aside, I continue to maintain that Cassandra is the best at being the type of thing that it is. If you have a need to horizontally scale a use case that is well suited for its strengths and poorly suited for RDBMS, you should use it. Far fewer people actually have this sort of case than think they do. =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: horizontal query scaling issues follow on
The problem with starting without vnodes is that moving to them is a bit hairy. In particular, nodetool shuffle has been reported to take an extremely long time (days, weeks). I would start with vnodes if you have any intent on using them. On Thu, Jul 17, 2014 at 6:03 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 17, 2014 at 5:16 PM, Diane Griffith dfgriff...@gmail.com wrote: I did tests comparing 1, 2, 10, 20, 50, 100 clients spawned all querying. Performance on 2 nodes starts to degrade from 10 clients on. I saw similar behavior on 4 nodes but haven't done the official runs on that yet. Ok, if you've multi-threaded your client, then you aren't starving for client thread parallelism, and that rules out another scalability bottleneck. As a brief aside, you only lose from vnodes until your cluster is larger than a certain size, and then only when adding or removing nodes from a cluster. Perhaps if you are ramping up and scientifically testing smaller cluster sizes, you should start at first with a token per range, i.e. pre-vnodes operation? I basically did the command and it was outputting 256 tokens on each node, comma separated. So I tried taking that string and setting it as the value of initial_token, but the node wouldn't start up. Not sure if I maybe had a carriage return in there and that was the problem. It should take a comma delimited list of tokens, did the failed node startup log any error? And if I do that do I need to do more than comment out num_tokens? No, though you probably should anyway in order to be unambiguous. =Rob -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
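For the pre-vnodes setup Rob describes (one token per node), the initial tokens are usually generated by dividing the partitioner's token range evenly. A sketch assuming Murmur3Partitioner, whose range is -2^63 to 2^63 - 1; each node's value would go in its cassandra.yaml `initial_token` (one token, no commas, when num_tokens is not used):

```python
def initial_tokens(node_count):
    # Evenly space node_count tokens across the Murmur3 token range.
    ring = 2 ** 64
    return [-(2 ** 63) + i * ring // node_count for i in range(node_count)]

tokens = initial_tokens(4)
```

The comma-delimited form Rob mentions only applies when assigning multiple tokens to a single node; stray whitespace or a carriage return in that list is a plausible cause of the startup failure Diane saw.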
Re: map reduce for Cassandra
Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're mentioning. Al Tobey did a walkthrough of how to set up the OSS side of things here: http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html It'll be less work than writing a M/R framework from scratch :) Jon On Mon, Jul 21, 2014 at 8:24 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hi, I need to execute a map/reduce job to identify data stored in Cassandra before indexing this data to Elastic Search. I have already used ColumnFamilyInputFormat (before I started using CQL) to write hadoop jobs to do that, but I used to have a lot of trouble performing tuning, as hadoop depends on how map tasks are split in order to successfully execute things in parallel, for IO-bound processes. First question is: Am I the only one having problems with that? Is anyone else using hadoop jobs that read from Cassandra in production? Second question is about the alternatives. I saw the new version of Spark will have Cassandra support, but using CqlPagingInputFormat, from hadoop. I tried to use HIVE with Cassandra community, but it seems it only works with Cassandra Enterprise and doesn't do more than FB presto (http://prestodb.io/), which we have been using to read from Cassandra, and so far it has been great for SQL-like queries. For custom map reduce jobs, however, it is not enough. Does anyone know some other tool that performs MR on Cassandra? My impression is most tools were created to work on top of HDFS, and reading from a nosql db is some kind of workaround. Third question is about how these tools work. Most of them write mapped data to intermediate storage, then data is shuffled and sorted, then it is reduced. Even when using CqlPagingInputFormat, if you are using hadoop it will write files to HDFS after the mapping phase, shuffle and sort this data, and then reduce it. I wonder if a tool supporting Cassandra out of the box wouldn't be smarter.
Is it faster to write all your data to a file and then sort it, or to batch insert data and index it right away, as happens when you store data in a Cassandra CF? I didn't do the calculations to check the complexity of each one, which should consider that an index in Cassandra could be really large, as the maximum index size will always depend on the maximum capacity of a single host, but my guess is that a map/reduce tool written specifically for Cassandra, from the beginning, could perform much better than a tool written for HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than M/R in HDFS. Does it really make sense? Should we expect a result like this? Final question: Do you think writing a new M/R tool as described would be reinventing the wheel? Or does it make sense? Thanks in advance. Any opinions about this subject will be very appreciated. Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
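The map/shuffle-sort/reduce pipeline Marcelo describes can be illustrated with a toy in-memory word count. In real Hadoop the intermediate (key, value) pairs are spilled to disk and shuffled across the network between phases, which is exactly the overhead he is asking about:

```python
from collections import defaultdict

def map_phase(splits):
    # Map: each input split emits (key, value) pairs.
    for split in splits:
        for word in split.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group values by key, then sort by key as Hadoop does
    # before handing groups to reducers.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return sorted(groups.items())

def reduce_phase(grouped):
    # Reduce: fold each key's value list down to a single result.
    return {k: sum(vs) for k, vs in grouped}

counts = reduce_phase(shuffle_phase(map_phase(["a b a", "b c"])))
```

Spark keeps these intermediate stages in memory where possible, which is a large part of why Jon's suggestion sidesteps the HDFS-spill cost.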
Re: map reduce for Cassandra
I haven't tried pyspark yet, but it's part of the distribution. My main language is Python too, so I intend on getting deep into it. On Mon, Jul 21, 2014 at 9:38 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hi Jonathan, Do you know if this RDD can be used with Python? AFAIK, python + Cassandra will be supported just in the next version, but I would like to be wrong... Best regards, Marcelo Valle. 2014-07-21 13:06 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're mentioning. Al Tobey did a walkthrough of how to set up the OSS side of things here: http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html It'll be less work than writing a M/R framework from scratch :) Jon On Mon, Jul 21, 2014 at 8:24 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hi, I have the need to executing a map/reduce job to identity data stored in Cassandra before indexing this data to Elastic Search. I have already used ColumnFamilyInputFormat (before start using CQL) to write hadoop jobs to do that, but I use to have a lot of troubles to perform tunning, as hadoop depends on how map tasks are split in order to successfull execute things in parallel, for IO/bound processes. First question is: Am I the only one having problems with that? Is anyone else using hadoop jobs that reads from Cassandra in production? Second question is about the alternatives. I saw new version spark will have Cassandra support, but using CqlPagingInputFormat, from hadoop. I tried to use HIVE with Cassandra community, but it seems it only works with Cassandra Enterprise and doesn't do more than FB presto (http://prestodb.io/), which we have been using reading from Cassandra and so far it has been great for SQL-like queries. For custom map reduce jobs, however, it is not enough. Does anyone know some other tool that performs MR on Cassandra? 
My impression is most tools were created to work on top of HDFS and reading from a nosql db is some kind of workaround. Third question is about how these tools work. Most of them writtes mapped data on a intermediate storage, then data is shuffled and sorted, then it is reduced. Even when using CqlPagingInputFormat, if you are using hadoop it will write files to HDFS after the mapping phase, shuffle and sort this data, and then reduce it. I wonder if a tool supporting Cassandra out of the box wouldn't be smarter. Is it faster to write all your data to a file and then sorting it, or batch inserting data and already indexing it, as it happens when you store data in a Cassandra CF? I didn't do the calculations to check the complexity of each one, what should consider no index in Cassandra would be really large, as the maximum index size will always depend on the maximum capacity of a single host, but my guess is that a map / reduce tool written specifically to Cassandra, from the beggining, could perform much better than a tool written to HDFS and adapted. I hear people saying Map/Reduce on Cassandra/HBase is usually 30% slower than M/R in HDFS. Does it really make sense? Should we expect a result like this? Final question: Do you think writting a new M/R tool like described would be reinventing the wheel? Or it makes sense? Thanks in advance. Any opinions about this subject will be very appreciated. Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: cluster rebalancing…
You don't need to specify tokens. The new node gets them automatically. On Jul 22, 2014, at 7:03 PM, Kevin Burton bur...@spinn3r.com wrote: So , shouldn't it be easy to rebalance a cluster? I'm not super excited to type out 200 commands to move around individual tokens. I realize that this isn't a super easy solution, and that there are probably 2-3 different algorithms to pick here… but having this be the only option doesn't seem scalable. -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
Re: vnode and NetworkTopologyStrategy: not playing well together ?
This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF = # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = requires carefully chosen tokens for all nodes in order not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: vnode and NetworkTopologyStrategy: not playing well together ?
* When I say wild imbalance, I do not mean all tokens on 1 node in the cluster, I really should have said slightly imbalanced On Tue, Aug 5, 2014 at 8:43 AM, Jonathan Haddad j...@jonhaddad.com wrote: This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: vnode and NetworkTopologyStrategy: not playing well together ?
Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 with a totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced. On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: First, thanks for your answer. This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. IMHO, it's not a good enough condition. Let's use an example with RF=2 N1/rack_1 N2/rack_1 N3/rack_1 N4/rack_2 Here, you have RF = # of racks And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens when using nodes *or* vnodes. As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring *and* as tokens are randomly-generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy it's better to define a single (logical) rack, due to the clash between carefully chosen tokens and randomly generated ones. I don't see other options left. Do you see other ones ? Regards, Dominique -----Original Message----- From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad Sent: Tuesday, August 5, 2014 17:43 To: user@cassandra.apache.org Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ? This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
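Dominique's RF=2 example can be checked with a small simulation of rack-aware placement. This is a simplification of what NetworkTopologyStrategy actually does (walk the ring from each token, preferring racks not yet used), but it reproduces his point: the lone rack_2 node ends up owning a replica of every partition.

```python
from collections import Counter

RACK = {"N1": "rack_1", "N2": "rack_1", "N3": "rack_1", "N4": "rack_2"}
RING = ["N1", "N2", "N3", "N4"]   # nodes in token order
RF = 2

def replicas_for(token_index):
    # Walk the ring clockwise, taking one node per rack not yet used.
    chosen, racks_used = [], set()
    for step in range(len(RING)):
        node = RING[(token_index + step) % len(RING)]
        if RACK[node] not in racks_used:
            chosen.append(node)
            racks_used.add(RACK[node])
        if len(chosen) == RF:
            break
    return chosen

# Count how many token ranges each node holds a replica for.
ownership = Counter(n for t in range(len(RING)) for n in replicas_for(t))
```

With RF equal to the number of racks, the second replica of every range must land in rack_2, so N4 carries 100% of the data, which is Dominique's objection in a nutshell.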
Re: too many open files
It really doesn't need to be this complicated. You only need 1 session per application. It's thread safe and manages the connection pool for you. http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html On Sat, Aug 9, 2014 at 1:29 PM, Kevin Burton bur...@spinn3r.com wrote: Another idea to detect this is when the number of open sessions exceeds the number of threads. On Aug 9, 2014 10:59 AM, Andrew redmu...@gmail.com wrote: I just had a generator that (in the incorrect way) had a cluster as a member variable, and would call .connect() repeatedly. I _thought_, incorrectly, that the Session was thread unsafe, and so I should request a separate Session each time—obviously wrong in hindsight. There was no special logic; I had a restriction of about 128 connections per host, but the connections were in the 100s of thousands, like the OP mentioned. Again, I’ll see about reproducing it on Monday, but just wanted the repro steps (overall) to live somewhere in case I can’t. :) Andrew On August 8, 2014 at 4:08:50 PM, Tyler Hobbs (ty...@datastax.com) wrote: On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all) -- Tyler Hobbs DataStax -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
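The "one Session per application" advice can be sketched as a lazily initialized, lock-guarded holder that all threads share. `connect` below is a stand-in for `cluster.connect()` in a real driver, stubbed out so the example runs without a cluster:

```python
import threading

class SessionHolder:
    # Shared, lazily created session: connect() runs at most once,
    # no matter how many threads call get().
    def __init__(self, connect):
        self._connect = connect
        self._session = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self._session is None:
                self._session = self._connect()
            return self._session

calls = []
holder = SessionHolder(lambda: calls.append("connect") or object())
s1, s2 = holder.get(), holder.get()   # same session both times
```

This is the opposite of Andrew's bug: his code called connect() per request, leaking a new connection pool each time until the process hit the file-descriptor limit.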
Re: Table not being created but no error.
Can you provide the code that you use to create the table? This feels like a code error rather than a database bug. On Wed, Aug 13, 2014 at 1:26 PM, Kevin Burton bur...@spinn3r.com wrote: 2.0.5… I'm upgrading to 2.0.9 now just to rule this out…. I can give you the full CQL for the table, but I can't seem to reproduce it without my entire app being included. If I execute the CQL manually, it works… which is what makes this so weird. On Wed, Aug 13, 2014 at 1:11 PM, DuyHai Doan doanduy...@gmail.com wrote: Can you just give the C* version and the complete DDL script to reproduce the issue ? On Wed, Aug 13, 2014 at 10:08 PM, Kevin Burton bur...@spinn3r.com wrote: I'm tracking down a weird bug and was wondering if you guys had any feedback. I'm trying to create ten tables programmatically… The first one I create, for some reason, isn't created. The other 9 are created without a problem. I'm doing this with the datastax driver's session.execute(). No exceptions are thrown. I read the tables back out, and I have 9 of them, but not the first one. I can confirm that the table isn't there because I'm doing a select * from foo0 limit 1 and it gives me an unconfigured column family exception. So it looks like cassandra is just silently not creating the table. This is just in my junit harness for now. So it's one cassandra node, so there shouldn't be an issue with schema disagreement. Kind of stumped here so any suggestion would help. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re:
It sounds like your clocks are out of sync. Run ntpdate to fix your clock, then make sure you're running ntpd on every machine. On Mon, Aug 25, 2014 at 1:25 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: We're using cassandra 2.0.9 with datastax java cassandra driver 2.0.0 in a cluster of eight nodes. We're doing an insert and then a delete like: delete from column_family_name where id = value Immediately after, we select to check whether the DELETE was successful. Sometimes the value is still there! Any suggestions? -- Atenciosamente, Sávio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles Mestrando em Ciências da Computação - UFG Arquiteto de Software CUIA Internet Brasil -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re:
This is actually a more correct response than mine, I made a few assumptions that may or may not be true. On Mon, Aug 25, 2014 at 1:31 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Aug 25, 2014 at 1:25 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: We're using cassandra 2.0.9 with datastax java cassandra driver 2.0.0 in a cluster of eight nodes. We're doing an insert and after a delete like: delete from column_family_name where id = value Immediatly select to check whether the DELETE was successful. Sometimes the value still there!! What are Replication Factor (RF) and Consistency Level (CL)? =Rob -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Failed to enable shuffling error
I believe shuffle has been removed recently. I do not recommend using it for any reason. If you really want to go vnodes, your only sane option is to add a new DC that uses vnodes and switch to it. The downside in the 2.0.x branch to using vnodes is that repairs take N times as long, where N is the number of tokens you put on each node. I can't think of any other reasons why you wouldn't want to use vnodes (but this may be significant enough for you by itself) 2.1 should address the repair issue for most use cases. Jon On Mon, Sep 8, 2014 at 1:28 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Sep 8, 2014 at 1:21 PM, Tim Heckman t...@pagerduty.com wrote: We're still at the exploratory stage on systems that are not production-facing but contain production-like data. Based on our placement strategy we have some concerns that the new datacenter approach may be riskier or more difficult. We're just trying to gauge both paths and see what works best for us. Your case of RF=N is probably the best possible case for shuffle, but general statements about how much this code has been exercised remain. :) The cluster I'm testing this on is a 5 node cluster with a placement strategy such that all nodes contain 100% of the data. In practice we have six clusters of similar size that are used for different services. These different clusters may need additional capacity at different times, so it's hard to answer the maximum size question. For now let's just assume that the clusters may never see an 11th member... but no guarantees. With RF of 3, cluster sizes of under approximately 10 tend to net lose from vnodes. If these clusters are not very likely to ever have more than 10 nodes, consider not using Vnodes. We're looking to use vnodes to help with easing the administrative work of scaling out the cluster. The improvements of streaming data during repairs amongst others. 
Most of these wins don't occur until you have a lot of nodes, but the fixed costs of having many ranges are paid all the time. For shuffle, it looks like it may be easier than adding a new datacenter and then having to adjust the schema for a new datacenter to come to life. And we weren't sure whether the same pitfalls of shuffle would affect us while having all data on all nodes. Let us know! Good luck! =Rob -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Failed to enable shuffling error
Thrift is still present in the 2.0 branch as well as 2.1. Where did you see that it's deprecated? Let me elaborate my earlier advice. Shuffle was removed because it doesn't work for anything beyond a trivial dataset. It is definitely more risky than adding a new vnode enabled DC, as it does not work at all. On Mon, Sep 8, 2014 at 2:01 PM, Tim Heckman t...@pagerduty.com wrote: On Mon, Sep 8, 2014 at 1:45 PM, Jonathan Haddad j...@jonhaddad.com wrote: I believe shuffle has been removed recently. I do not recommend using it for any reason. We're still using the 1.2.x branch of Cassandra, and will be for some time due to the thrift deprecation. Has it only been removed from the 2.x line? If you really want to go vnodes, your only sane option is to add a new DC that uses vnodes and switch to it. We use the NetworkTopologyStrategy across three geographically separated regions. Doing it this way feels a bit more risky based on our replication strategy. Also, I'm not sure where all we have our current datacenter names defined across our different internal repositories. So there could be quite a large number of changes going this route. The downside in the 2.0.x branch to using vnodes is that repairs take N times as long, where N is the number of tokens you put on each node. I can't think of any other reasons why you wouldn't want to use vnodes (but this may be significant enough for you by itself) 2.1 should address the repair issue for most use cases. Jon Thank you for the notes on the behaviors in the 2.x branch. If we do move to the 2.x version that's something we'll be keeping in mind. Cheers! -Tim On Mon, Sep 8, 2014 at 1:28 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Sep 8, 2014 at 1:21 PM, Tim Heckman t...@pagerduty.com wrote: We're still at the exploratory stage on systems that are not production-facing but contain production-like data. Based on our placement strategy we have some concerns that the new datacenter approach may be riskier or more difficult. 
We're just trying to gauge both paths and see what works best for us. Your case of RF=N is probably the best possible case for shuffle, but general statements about how much this code has been exercised remain. :) The cluster I'm testing this on is a 5 node cluster with a placement strategy such that all nodes contain 100% of the data. In practice we have six clusters of similar size that are used for different services. These different clusters may need additional capacity at different times, so it's hard to answer the maximum size question. For now let's just assume that the clusters may never see an 11th member... but no guarantees. With RF of 3, cluster sizes of under approximately 10 tend to net lose from vnodes. If these clusters are not very likely to ever have more than 10 nodes, consider not using vnodes. We're looking to use vnodes to help with easing the administrative work of scaling out the cluster. The improvements of streaming data during repairs amongst others. Most of these wins don't occur until you have a lot of nodes, but the fixed costs of having many ranges are paid all the time. For shuffle, it looks like it may be easier than adding a new datacenter and then having to adjust the schema for a new datacenter to come to life. And we weren't sure whether the same pitfalls of shuffle would affect us while having all data on all nodes. Let us know! Good luck! =Rob -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: multi datacenter replication
Multi-dc is available in every version of Cassandra. On Wed, Sep 10, 2014 at 9:21 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Thank you very much for the links. Just to be sure: is this capability available in the COMMUNITY EDITION? Thanks Oleg. On Wed, Sep 10, 2014 at 11:49 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi Oleg, Yes, cross-DC replication is something that has been available for a long time already, so it is assumed to be stable. As discussed in this thread, the Cassandra documentation is often outdated or nonexistent; the alternative is the DataStax documentation. http://www.datastax.com/documentation/cassandra/2.0/cassandra/initialize/initializeMultipleDS.html http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html Hope you'll find everything you need. If some info is missing, come back and ask. Alain 2014-09-10 16:58 GMT+02:00 Oleg Ruchovets oruchov...@gmail.com: Hi All. Is the multi-datacenter replication capability available in the community edition? If yes, can someone share their experience with how stable it is, and where can I read about best practices for it? Thanks Oleg. -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Concurrents deletes and updates
Make sure your clocks are synced. If they aren't, the writetime that determines the most recent value will be incorrect. On Wed, Sep 17, 2014 at 11:58 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Sep 17, 2014 at 11:55 AM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: I'm using Cassandra 2.0.9 with the JAVA datastax driver. I'm running the tests in a cluster with 3 nodes, RF=3 and CL=ALL for each operation. I have a column family filled with some keys (for example 'a' and 'b'). When these keys are deleted and then re-inserted, they sporadically disappear. Is it a bug in Cassandra or in the Datastax driver? Any suggestions? I would file a Cassandra JIRA with reproduction steps. http://issues.apache.org =Rob http://twitter.com/rcolidba -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
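To see why clock sync matters here, a toy model of last-write-wins reconciliation (plain Python standing in for Cassandra's cell merge; the timestamps are invented for illustration):

```python
# Among conflicting versions of a cell, the highest write timestamp wins.
# With skewed clocks, a later delete can carry an *earlier* timestamp and
# silently lose to the insert it was meant to remove.

def reconcile(cells):
    """cells: list of (write_timestamp_micros, value); None = tombstone."""
    return max(cells, key=lambda c: c[0])[1]

insert = (1_400_000_000_000_000, "a")
late_delete = (1_399_999_999_000_000, None)  # issued later, but its clock was behind
assert reconcile([insert, late_delete]) == "a"  # the delete is shadowed
```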
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Keep in mind secondary indexes in cassandra are not there to improve performance, or even really be used in a serious user facing manner. Build and maintain your own view of the data, it'll be much faster. On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel pateljay3...@gmail.com wrote: Hi there, We are seeing extreme slow down (500ms to 1s) in query on secondary index with vnode. I'm seeing multiple secondary index scans on a given node in trace output when vnode is enabled. Without vnode, everything is good. Cluster size: 6 nodes Replication factor: 3 Consistency level: local_quorum. Same behavior happens with consistency level of ONE. Snippet from the trace output. Pls see the attached output1.txt for the full log. Are we hitting any bug? Do not understand why coordinator sends requests multiple times to the same node (e.g. 192.168.51.22 in below output) for different token ranges. Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 | Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 | Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 | Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 | Executing indexed scan for (max(-8929774302283364912), max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 | Executing indexed scan for (max(-8854653908608918942), max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 | Executing indexed scan for (max(-8762620856967633953), max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 | Executing indexed scan for (max(-8668275030769104047), max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 | Executing indexed scan for (max(-8659066486210615614), max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 | Executing indexed scan for (max(-8419137646248370231), 
max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25 | Executing indexed scan for (max(-8416786876632807845), max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22 | Executing indexed scan for (max(-8315889933848495185), max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25 | Executing indexed scan for (max(-8270922890152952193), max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22 | Executing indexed scan for (max(-8260813759533312175), max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25 | Executing indexed scan for (max(-8234845345932129353), max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22 | Thanks, Jay -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
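The "build and maintain your own view" advice above can be sketched with two in-memory dicts standing in for two Cassandra tables (names are illustrative): instead of a secondary index that fans out across token ranges, you write a second table partitioned by the value you want to query.

```python
# Base table plus a hand-maintained lookup table, written together on insert.
# In Cassandra these would be two INSERTs (ideally in a logged batch); the
# lookup table is partitioned by email, so reads hit a single partition
# instead of scanning index shards across the cluster.

users_by_id = {}        # stand-in for table users (partition key: user_id)
user_id_by_email = {}   # stand-in for table users_by_email (partition key: email)

def insert_user(user_id, email, name):
    users_by_id[user_id] = {"email": email, "name": name}
    user_id_by_email[email] = user_id  # maintain the "view" ourselves

def find_by_email(email):
    user_id = user_id_by_email.get(email)
    return users_by_id.get(user_id)

insert_user(1, "jay@example.com", "Jay")
assert find_by_email("jay@example.com")["name"] == "Jay"
```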
Re: Blocking while a node finishes joining the cluster after restart.
Depending on how you query (one or quorum) you might be able to do 1 rack at a time (or az or whatever you've got) assuming your snitch is set up right On Sep 19, 2014, at 11:30 AM, Kevin Burton bur...@spinn3r.com wrote: This is great feedback… I think it could actually be even easier than this… You could have an ansible (or whatever cluster management system you’re using) role for just seeds. Then you would serially restart all seeds one at a time. You would need to run ‘nodetool status’ and make sure the node is ‘U’ (up) I think.. but you might want to make sure the majority of other nodes have agreed that this node is up and available. I think you can ONLY do this serially.. .for a LARGE number of hosts, this might take a while unless you can compute nodes which have mutually exclusive key ranges. The serial approach would take a LONG time for large clusters. If you have sixty nodes, it could take an hour to do a rolling restart. Kevin On Tue, Sep 16, 2014 at 12:21 PM, James Briggs james.bri...@yahoo.com wrote: FYI: OpsCenter has a default of sleep 60 seconds after each node restart, and an option of drain before stopping. I haven't noticed if they do anything special with seeds. (At least one seed needs to be running before you restart other nodes.) I wondered the same thing as Kevin and came to these conclusions. Fixing the startup script is non-trivial as far as startup scripts go. For start, it would have to: - parse cassandra.yaml for seeds - if itself is not a seed, wait for a seed to start first. (could take minutes or never.) - continue start. For a no-downtime cluster restart script, it would have to: - verify cluster health (ie. quorum/CL is met or you lose writes) - parse cassandra.yaml for seeds and see if a seed is up - stop gossip and thrift - maybe do compaction before drain - drain node - stop/start or restart cassandra process. http://comments.gmane.org/gmane.comp.db.cassandra.user/20144 Both of those scripts would be nice to have. 
:) OpsCenter is flaky at doing rolling restart in my test cluster, so an alternative is needed. Also, the free OpsCenter doesn't have rolling repair option enabled. ccm has the options to do drain, stop and start, but a bash script would be needed to make it rolling. https://github.com/pcmanus/ccm Thanks, James. -- Cassandra/MySQL DBA. Available in San Jose area or remote. From: Duncan Sands duncan.sa...@gmail.com To: user@cassandra.apache.org Sent: Tuesday, September 16, 2014 11:09 AM Subject: Re: Blocking while a node finishes joining the cluster after restart. Hi Kevin, if you are using the latest version of opscenter, then even the community (= free) edition can do a rolling restart of your cluster. It's pretty convenient. Ciao, Duncan. On 16/09/14 19:44, Kevin Burton wrote: Say I want to do a rolling restart of Cassandra… I can’t just restart all of them because they need some time to gossip and for that gossip to get to all nodes. What is the best strategy for this. It would be something like: /etc/init.d/cassandra restart wait-for-cassandra.sh … or something along those lines. -- Founder/CEO Spinn3r.com http://Spinn3r.com Location: *San Francisco, CA* blog:**http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile
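The serial restart loop the thread converges on can be sketched as a dry run (hypothetical: `run` stands in for ssh or your config-management tool, and the final step stands in for polling `nodetool status` until the node reports Up/Normal):

```python
# Serial rolling restart: drain, restart, wait for Up/Normal, then move on.
# Commands are illustrative; substitute your init system and a real poll loop.

def rolling_restart(nodes, run):
    for node in nodes:
        run(node, "nodetool drain")              # flush memtables, stop accepting writes
        run(node, "service cassandra restart")
        run(node, "wait-for-up")                 # e.g. poll `nodetool status` for 'UN'

executed = []
rolling_restart(["node1", "node2"], lambda n, c: executed.append((n, c)))
assert executed[0] == ("node1", "nodetool drain")
assert executed[-1] == ("node2", "wait-for-up")
```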
Re: Difference in retrieving data from cassandra
You'll need to provide a bit of information. To start, a query trace would be helpful. http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html (self promo) You may want to read over my blog post regarding diagnosing problems in production. I've covered diagnosing slow queries: http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ On Thu, Sep 25, 2014 at 4:21 AM, Umang Shah shahuma...@gmail.com wrote: Hi All, I am using Cassandra with Pentaho PDI Kettle. I have installed Cassandra in an Amazon EC2 instance and on my local machine. When I retrieve data from the local machine using Pentaho PDI it takes a few seconds (not more than 20 seconds), and if I do the same against the production database it takes almost 3 minutes for the same amount of data, which is a huge difference. Can anybody give me some suggestions on what I need to check, or how I can narrow down this difference? The local machine and the production server have the same RAM. The local machine is a Windows environment and production is Linux. -- Regards, Umang V.Shah BI-ETL Developer -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Repair taking long time
Are you using Cassandra 2.0 vnodes? If so, repair takes forever. This problem is addressed in 2.1. On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another. Running a repair on a large column family seems to be moving much slower than I expect. Looking at nodetool compaction stats it indicates the Validation phase is running that the total bytes is 4.5T (4505336278756). This is a very large CF. The process has been running for 2.5 hours and has processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate it will take 158 hours, just shy of 1 week. Is this reasonable? This is my first large repair and I am wondering if this is normal for a CF of this size. Seems like a long time to me. Is it possible to tune this process to speed it up? Is there something in my configuration that could be causing this slow performance? I am running HDDs, not SSDs in a JBOD configuration. Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Repair taking long time
If you're using DSE you might want to contact Datastax support, rather than the ML. On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am on DSE 4.0.3 which is 2.0.7. If 4.5.1 is NOT 2.1. I guess an upgrade will not buy me much….. The bad thing is that table is not our largest….. :( Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 From: Brice Dutheil [mailto:brice.duth...@gmail.com] Sent: Friday, September 26, 2014 12:47 PM To: user@cassandra.apache.org Subject: Re: Repair taking long time Unfortunately DSE 4.5.0 is still on 2.0.x -- Brice On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using Cassandra 2.0 vnodes? If so, repair takes forever. This problem is addressed in 2.1. On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another. Running a repair on a large column family seems to be moving much slower than I expect. Looking at nodetool compaction stats it indicates the Validation phase is running that the total bytes is 4.5T (4505336278756). This is a very large CF. The process has been running for 2.5 hours and has processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate it will take 158 hours, just shy of 1 week. Is this reasonable? This is my first large repair and I am wondering if this is normal for a CF of this size. Seems like a long time to me. Is it possible to tune this process to speed it up? Is there something in my configuration that could be causing this slow performance? I am running HDDs, not SSDs in a JBOD configuration. 
Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Repair taking long time
Well, in that case, you may want to roll your own script for doing constant repairs of your cluster, and extend your gc grace seconds so you can repair the whole cluster before the tombstones are cleared. On Fri, Sep 26, 2014 at 11:15 AM, Gene Robichaux gene.robich...@match.com wrote: Using their community edition..no support (yet!) :( Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 -Original Message- From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of Jonathan Haddad Sent: Friday, September 26, 2014 12:58 PM To: user@cassandra.apache.org Subject: Re: Repair taking long time If you're using DSE you might want to contact Datastax support, rather than the ML. On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am on DSE 4.0.3 which is 2.0.7. If 4.5.1 is NOT 2.1. I guess an upgrade will not buy me much….. The bad thing is that table is not our largest….. :( Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 From: Brice Dutheil [mailto:brice.duth...@gmail.com] Sent: Friday, September 26, 2014 12:47 PM To: user@cassandra.apache.org Subject: Re: Repair taking long time Unfortunately DSE 4.5.0 is still on 2.0.x -- Brice On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using Cassandra 2.0 vnodes? If so, repair takes forever. This problem is addressed in 2.1. On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another. Running a repair on a large column family seems to be moving much slower than I expect. Looking at nodetool compaction stats it indicates the Validation phase is running that the total bytes is 4.5T (4505336278756). This is a very large CF. 
The process has been running for 2.5 hours and has processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate it will take 158 hours, just shy of 1 week. Is this reasonable? This is my first large repair and I am wondering if this is normal for a CF of this size. Seems like a long time to me. Is it possible to tune this process to speed it up? Is there something in my configuration that could be causing this slow performance? I am running HDDs, not SSDs in a JBOD configuration. Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
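The "roll your own constant repair" idea above might look like the following sketch (assumptions: one node repaired at a time with `nodetool repair -pr`, a `run` placeholder for remote execution, and a rough per-node duration check against gc_grace_seconds):

```python
# A full repair pass must finish inside gc_grace_seconds, or tombstones can
# be collected before every replica has seen them. The duration figures here
# are invented; measure your own.

GC_GRACE_SECONDS = 10 * 24 * 3600          # raise this if a pass takes longer

def repair_pass(nodes, run, est_secs_per_node=6 * 3600):
    assert len(nodes) * est_secs_per_node < GC_GRACE_SECONDS, \
        "extend gc_grace_seconds: a full pass won't beat tombstone GC"
    for node in nodes:
        run(node, "nodetool repair -pr")   # -pr: repair only the primary range

calls = []
repair_pass(["n1", "n2", "n3"], lambda n, c: calls.append(n))
assert calls == ["n1", "n2", "n3"]
```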
Re: Performance Issue: Keeping rows in memory
First, did you run a query trace? I recommend Al Tobey's pcstat util to determine if your files are in the buffer cache: https://github.com/tobert/pcstat On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway thomas.white...@metaswitch.com wrote: Hi, I'm working on an application using a Cassandra (2.1.0) cluster where - our entire dataset is around 22GB - each node has 48GB of memory but only a single (mechanical) hard disk - in normal operation we have a low level of writes and no reads - very occasionally we need to read rows very fast (1.5K rows/second), and only read each row once. When we try and read the rows it takes up to five minutes before Cassandra is able to keep up. The problem seems to be that it takes a while to get the data into the page cache, and until then Cassandra can't retrieve the data from disk fast enough (e.g. if I drop the page cache mid-test then Cassandra slows down for the next 5 minutes). Given that the total amount of data should fit comfortably in memory, I've been trying to find a way to keep the rows cached in memory, but there doesn't seem to be a particularly great way to achieve this. I've tried enabling the row cache and pre-populating the test by querying every row before starting the load, which gives good performance, but the row cache isn't really intended to be used this way and we'd be fighting the row cache to keep the rows in (e.g. by cyclically reading through all the rows during normal operation). Keeping the page cache warm by running a background task to keep accessing the files for the sstables would be simpler, and currently this is the solution we're leaning towards, but we have less control over the page cache, it would be vulnerable to other processes knocking Cassandra's files out, and it generally feels like a bit of a hack. Has anyone had any success with trying to do something similar to this, or have any suggestions for possible solutions?
Thanks, Thomas -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
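For reference, the background page-cache warmer Thomas describes could be as simple as the sketch below (the `*Data.db` layout is an assumption about the data directory; as he says, it's a hack, since the kernel is still free to evict the pages):

```python
# Sequentially read every sstable data file so the kernel keeps it resident
# in the page cache. Run periodically from cron or a background thread.

import glob
import os

def warm(data_dir, chunk=1 << 20):
    total = 0
    for path in glob.glob(os.path.join(data_dir, "**", "*Data.db"), recursive=True):
        with open(path, "rb") as f:
            while f.read(chunk):   # each read pulls pages into the cache
                pass
        total += os.path.getsize(path)
    return total                   # bytes touched in this pass
```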
Re: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?
No. Consider a scenario where you supply a timestamp a week in the future, flush it to an sstable, and then do a write with the current timestamp. The record on disk will have a timestamp greater than the one in the memtable. On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith donald.sm...@audiencescience.com wrote: Question about the read path in cassandra. If a partition/row is in the Memtable and is being actively written to by other clients, will a READ of that partition also have to hit SStables on disk (or in the page cache)? Or can it be serviced entirely from the Memtable? If you select all columns (e.g., "select * from …") then I can imagine that cassandra would need to merge whatever columns are in the Memtable with what's in SStables on disk. But if you select a single column (e.g., "select Name from … where id = …") and if that column is in the Memtable, I'd hope cassandra could skip checking the disk. Can it do this optimization? Thanks, Don *Donald A. Smith* | Senior Software Engineer P: 425.201.3900 x 3866 C: (206) 819-5965 F: (646) 443-2333 dona...@audiencescience.com -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
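A toy model of the scenario Jon describes (invented timestamps, dicts standing in for the memtable and an sstable): the post-dated cell on disk outranks the newer physical write, so a correct read has to merge both sources rather than trust the memtable.

```python
WEEK_US = 7 * 24 * 3600 * 1_000_000
NOW = 1_700_000_000_000_000

sstable = {"name": (NOW + WEEK_US, "future")}   # flushed write, stamped a week ahead
memtable = {"name": (NOW, "current")}           # most recent physical write

def read(key):
    candidates = [c for c in (memtable.get(key), sstable.get(key)) if c]
    return max(candidates)[1]   # highest timestamp wins, whatever its source

assert read("name") == "future"   # serving from the memtable alone would be wrong
```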
Re: OOM at Bootstrap Time
If the issue is related to I/O, you're going to want to determine if you're saturated. Take a look at `iostat -dmx 1`; you'll see avgqu-sz (queue size) and svctm (service time). The higher those numbers are, the more overwhelmed your disk is. On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan doanduy...@gmail.com wrote: Hello Maxime Increasing the flush writers won't help if your disk I/O is not keeping up. I've had a look at the log file; below are some remarks: 1) There are a lot of SSTables on disk for some tables (events for example, but not only). I've seen that some compactions are taking up to 32 SSTables (which corresponds to the default max value for SizeTiered compaction). 2) There is a secondary index that I found suspicious: loc.loc_id_idx. As its name implies, I have the impression that it's an index on the id of the loc, which would lead to almost a 1-1 relationship between the indexed value and the original loc. Such indexes should be avoided because they do not perform well. If it's not an index on the loc_id, please disregard my remark 3) There is a clear imbalance of SSTable count on some nodes. In the log, I saw: INFO [STREAM-IN-/...20] 2014-10-25 02:21:43,360 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 163 files(4 111 187 195 bytes), sending 0 files(0 bytes) INFO [STREAM-IN-/...81] 2014-10-25 02:21:46,121 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 154 files(3 332 779 920 bytes), sending 0 files(0 bytes) INFO [STREAM-IN-/...71] 2014-10-25 02:21:50,494 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 1315 files(4 606 316 933 bytes), sending 0 files(0 bytes) INFO [STREAM-IN-/...217] 2014-10-25 02:21:51,036 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed.
Receiving 1640 files(3 208 023 573 bytes), sending 0 files(0 bytes) As you can see, the existing 4 nodes are streaming data to the new node and on average the data set size is about 3.3 - 4.5 Gb. However the number of SSTables is around 150 files for nodes ...20 and ...81 but goes through the roof to reach 1315 files for ...71 and 1640 files for ...217 The total data set size is roughly the same but the file count is 10x, which means that you'll have a bunch of tiny files. I guess that upon reception of those files, there will be a massive flush to disk, explaining the behaviour you're facing (flush storm). I would suggest looking at nodes ...71 and ...217 to check the total SSTable count for each table to confirm this intuition. Regards On Sun, Oct 26, 2014 at 4:58 PM, Maxime maxim...@gmail.com wrote: I've emailed you a raw log file of an instance of this happening. I've been monitoring more closely the timing of events in tpstats and the logs and I believe this is what is happening: - For some reason, C* decides to provoke a flush storm (I say some reason, I'm sure there is one but I have had difficulty determining the behaviour changes between 1.* and more recent releases). - So we see ~3000 flushes being enqueued. - This happens so suddenly that even boosting the number of flush writers to 20 does not suffice. I don't even see all time blocked numbers for it before C* stops responding. I suspect this is due to the sudden OOM and GC occurring. - The last tpstat that comes back before the node goes down indicates 20 active and 3000 pending and the rest 0. It's by far the most anomalous activity. Is there a way to throttle down this generation of Flush? C* complains if I set the queue_size to any value (deprecated now?) and boosting the threads does not seem to help since even at 20 we're an order of magnitude off. Suggestions? Comments?
On Sun, Oct 26, 2014 at 2:26 AM, DuyHai Doan doanduy...@gmail.com wrote: Hello Maxime Can you put the complete logs and config somewhere ? It would be interesting to know what is the cause of the OOM. On Sun, Oct 26, 2014 at 3:15 AM, Maxime maxim...@gmail.com wrote: Thanks a lot that is comforting. We are also small at the moment so I definitely can relate with the idea of keeping small and simple at a level where it just works. I see the new Apache version has a lot of fixes so I will try to upgrade before I look into downgrading. On Saturday, October 25, 2014, Laing, Michael michael.la...@nytimes.com wrote: Since no one else has stepped in... We have run clusters with ridiculously small nodes - I have a production cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance storage. It works fine but you can see those little puppies struggle... And I ran into problems such as you observe... Upgrading Java to the
Re: read after write inconsistent even on a one node cluster
For cqlengine we do quite a bit of write then read to ensure data was written correctly, across 1.2, 2.0, and 2.1. For what it's worth, I've never seen this issue come up. On a single node, Cassandra only acks the write after it's been written into the memtable. So, you'd expect to see the most recent data. A possibility - if you're running in a VM, it's possible the clock isn't incrementing in real time? I've seen this happen with uuid1 generation - I was getting duplicates if I generated them fast enough. Perhaps you're writing 2 values one right after the other and they're getting the same millisecond precision timestamp. On Thu, Nov 6, 2014 at 10:26 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Nov 6, 2014 at 6:14 AM, Brian Tarbox briantar...@gmail.com wrote: We write values to our keyspaces and then immediately read the values back (in our Cucumber tests). About 20% of the time we get the old value.if we wait 1 second and redo the query (within the same java method) we get the new value. This is all happening on a single node...how is this possible? It sounds unreasonable/unexpected to me, if you have a trivial repro case, I would file a JIRA. =Rob -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
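Jon's uuid1 anecdote is easy to reproduce with the standard library: `uuid.uuid1()` bumps its internal timestamp when the clock hasn't advanced, so it stays unique in-process, while raw millisecond timestamps generated in a tight loop collide freely.

```python
import time
import uuid

# uuid1 guards against a non-advancing clock, so values never repeat in-process:
ids = [uuid.uuid1() for _ in range(1000)]
assert len(set(ids)) == len(ids)

# Millisecond-precision write timestamps have no such guard and collide easily:
stamps = [int(time.time() * 1000) for _ in range(1000)]
assert len(set(stamps)) < len(stamps)   # many writes share the same millisecond
```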
Re: query tracing
Personally I've found that using query timing + log aggregation on the client side is more effective than trying to mess with tracing probability in order to find a single query which has recently become a problem. I recommend wrapping your session with something that can automatically log the statement on a slow query, then use tracing to identify exactly what happened. This way finding your problem is not a matter of chance. On Fri Nov 07 2014 at 9:41:38 AM Chris Lohfink clohfin...@gmail.com wrote: It saves a lot of information for each request that's traced, so there is significant overhead. If you start at a low probability and move it up based on the load impact, it will provide a lot of insight and you can control the cost. --- Chris Lohfink On Fri, Nov 7, 2014 at 11:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: Is there any significant performance penalty if one turns on Cassandra query tracing through the DataStax Java driver (say, for every request of some troublesome query)? More sampling seems better, but doing so may also slow down the system in some other ways? thanks
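A sketch of the session wrapper Jon suggests (the delegate here is a stand-in; with the DataStax driver you would forward to `Session.execute`, and the threshold and logging choices are up to you):

```python
import time

class SlowQueryLogger:
    """Times every statement and logs the ones exceeding a threshold,
    so tracing can then be turned on for just those statements."""

    def __init__(self, execute, threshold_s=0.5, log=print):
        self._execute = execute
        self._threshold_s = threshold_s
        self._log = log

    def execute(self, statement, *args):
        start = time.perf_counter()
        try:
            return self._execute(statement, *args)
        finally:
            elapsed = time.perf_counter() - start
            if elapsed >= self._threshold_s:
                self._log("slow query (%.3fs): %s" % (elapsed, statement))

logged = []
session = SlowQueryLogger(lambda stmt: time.sleep(0.02),
                          threshold_s=0.01, log=logged.append)
session.execute("SELECT * FROM big_table WHERE id = 1")
assert logged and logged[0].startswith("slow query")
```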
Re: PHP - Cassandra integration
In production? On Mon Nov 10 2014 at 6:06:41 AM Spencer Brown lilspe...@gmail.com wrote: I'm using /McFrazier/PhpBinaryCql/ On Mon, Nov 10, 2014 at 1:48 AM, Akshay Ballarpure akshay.ballarp...@tcs.com wrote: Hello, I am working on PHP cassandra integration, please let me know which library is good from scalability and performance perspective ? Best Regards Akshay Ballarpure Tata Consultancy Services Cell:- 9985084075 Mailto: akshay.ballarp...@tcs.com Website: http://www.tcs.com Experience certainty.IT Services Business Solutions Consulting =-=-= Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
Re: Cassandra sort using updatable query
With Cassandra you're going to want to model tables to meet the requirements of your queries instead of like a relational database where you build tables in 3NF then optimize after. For your optimized select query, your table (with caveat, see below) could start out as: create table words ( year int, frequency int, content text, primary key (year, frequency, content) ); You may want to maintain other tables as well for different types of select statements. Your UPDATE statement above won't work, you'll have to DELETE and INSERT, since you can't change the value of a clustering column. If you don't know what your old frequency is ahead of time (to do the delete), you'll need to keep another table mapping content,year - frequency. Now, the tricky part here is that the above model will limit the total number of partitions you've got to the number of years you're working with, and will not scale as the cluster increases in size. Ideally you could bucket frequencies. If that feels like too much work (it's starting to for me), this may be better suited to something like solr, elastic search, or DSE (cassandra + solr). Does that help? Jon On Wed Nov 12 2014 at 9:01:44 AM Chamila Wijayarathna cdwijayarat...@gmail.com wrote: Hello all, I have a data set with attributes content and year. I want to put them in to CF 'words' with attributes ('content','year','frequency'). The CF should support following operations. - Frequency attribute of a column can be updated (i.e. - : can run query like UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;), where clause should contain content and year - Should support select query like Select content from words where year = 2010 ORDER BY frequency DESC LIMIT 10; (where clause only has year) where results can be ordered using frequency Is this kind of requirement can be fulfilled using Cassandra? What is the CF structure and indexing I need to use here? What queries should I use to create CF and in indexing? Thank You! 
-- *Chamila Dilshan Wijayarathna,* SMIEEE, SMIESL, Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
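Jon's suggestion to "bucket frequencies" can be sketched in a few lines. This is only an illustration, not anything from the thread: the bucket (here, simply the frequency's order of magnitude) would become part of the partition key alongside year, so the number of partitions is no longer capped at the number of years. The function name and bucketing scheme are made up for the example.

```python
import math

def frequency_bucket(frequency: int) -> int:
    """Coarse bucket for a word frequency: its order of magnitude.
    Using (year, bucket) as the partition key spreads rows across
    more partitions than one-per-year."""
    return 0 if frequency <= 0 else int(math.log10(frequency))

print(frequency_bucket(2))     # 0
print(frequency_bucket(1500))  # 3
```

The trade-off is that a top-N-by-frequency query must now read the highest buckets first and merge results client-side, so this only pays off once the one-partition-per-year layout would grow too wide.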
Re: Is it more performant to split data with the same schema into multiple keyspaces, as opposed to putting all of them into the same keyspace?
Performance will be the same. There's no performance benefit to using multiple keyspaces. On Thu Nov 13 2014 at 8:42:40 AM Li, George guangxing...@pearson.com wrote: Hi, we use Cassandra to store some association type of data. For example, store user to course (course registrations) association and user to school (school enrollment) association data. The schema for these two types of associations are the same. So there are two options to store the data: 1. Put user to course association data into one keyspace, and user to school association data into another keyspace. 2. Put both of them into the same keyspace. In the long run, such data will grow to be very large. With that in mind, is it better to use the first approach (having multiple keyspaces) for better performance? Thanks. George
Re: Is it more performant to split data with the same schema into multiple keyspaces, as opposed to putting all of them into the same keyspace?
Tables, yes, but that wasn't the question. The question was around using different keyspaces. On Thu Nov 13 2014 at 9:17:30 AM Tyler Hobbs ty...@datastax.com wrote: That's not necessarily true. You don't need to split them into separate keyspaces, but separate tables may have some advantages. For example, in Cassandra 2.1, compaction and index summary management are optimized based on read rates for SSTables. If you have different read rates or patterns for the two types of data, it will confuse/eliminate these optimizations. If you have two separate sets of data with (potentially) two separate read patterns, don't put them in the same table. On Thu, Nov 13, 2014 at 11:08 AM, Jonathan Haddad j...@jonhaddad.com wrote: Performance will be the same. There's no performance benefit to using multiple keyspaces. On Thu Nov 13 2014 at 8:42:40 AM Li, George guangxing...@pearson.com wrote: Hi, we use Cassandra to store some association type of data. For example, store user to course (course registrations) association and user to school (school enrollment) association data. The schema for these two types of associations are the same. So there are two options to store the data: 1. Put user to course association data into one keyspace, and user to school association data into another keyspace. 2. Put both of them into the same keyspace. In the long run, such data will grow to be very large. With that in mind, is it better to use the first approach (having multiple keyspaces) for better performance? Thanks. George -- Tyler Hobbs DataStax http://datastax.com/
Re: Deduplicating data on a node (RF=1)
If he deletes all the data with RF=1, won't he have data loss? On Mon Nov 17 2014 at 5:14:23 PM Michael Shuler mich...@pbandjelly.org wrote: On 11/17/2014 02:04 PM, Alain Vandendorpe wrote: Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being moved away from ASAP. In the meantime, adding a node recently encountered a Stream Failed error (http://pastie.org/9725846). Cassandra restarted and it seemingly restarted streaming from zero, without having removed the failed stream's data. With bootstrapping and initial compactions finished that node now has what seems to be duplicate data, with almost exactly 2x the expected disk usage. CQL returns correct results but we depend on the ability to directly read the SSTable files (hence also RF=1.) Would anyone have suggestions on a good way to resolve this? Start over fresh, deleting *all* the data, and bootstrap the node again? -- Michael
Re: Using Cassandra for session tokens
I don't think DateTiered will help here, since there's no clustering key defined. This is a pretty straightforward workload; I've done something similar. Are you overwriting the session on every request? Or just writing it once?

On Mon Dec 01 2014 at 6:45:14 AM Matt Brown m...@mattnworb.com wrote: This sounds like a good use case for http://www.datastax.com/dev/blog/datetieredcompactionstrategy

On Dec 1, 2014, at 3:07 AM, Phil Wise p...@advancedtelematic.com wrote: We're considering switching from Redis to Cassandra to store short-lived (~1 hour) session tokens, in order to reduce the number of data storage engines we have to manage. Can anyone foresee any problems with the following approach:

1) Use the TTL functionality in Cassandra to remove old tokens.

2) Store the tokens in a table like:

CREATE TABLE tokens (
    id uuid,
    username text,
    // (other session information)
    PRIMARY KEY (id)
);

3) Perform ~100 writes/sec like:

INSERT INTO tokens (id, username) VALUES (468e0d69-1ebe-4477-8565-00a4cb6fa9f2, 'bob') USING TTL 3600;

4) Perform ~1000 reads/sec like:

SELECT * FROM tokens WHERE id=468e0d69-1ebe-4477-8565-00a4cb6fa9f2;

The tokens will be about 100 bytes each, and we will grant 100 per second on a small 3-node cluster. Therefore there will be about 360k tokens alive at any time, with a total size of 36MB before database overhead.

My biggest worry at the moment is that this kind of workload will stress compaction in an unusual way. Are there any metrics I should keep an eye on to make sure it is working fine? I read over the following links, but they mostly talk about DELETE-ing and tombstones. Am I right in thinking that as soon as a node performs a compaction, the rows with an expired TTL will be thrown away, regardless of gc_grace_seconds?

https://issues.apache.org/jira/browse/CASSANDRA-7534
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
https://issues.apache.org/jira/browse/CASSANDRA-6654

Thank you Phil
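Phil's capacity estimate checks out with simple steady-state arithmetic: with a fixed TTL, the live population is the write rate times the token lifetime. A quick sanity check using the numbers from the message above:

```python
writes_per_sec = 100   # token grants per second
ttl_seconds = 3600     # 1 hour TTL
token_bytes = 100      # approximate size per token

# At steady state, tokens alive = arrival rate x lifetime.
live_tokens = writes_per_sec * ttl_seconds
raw_size_mb = live_tokens * token_bytes / 1_000_000  # before storage overhead

print(live_tokens)  # 360000
print(raw_size_mb)  # 36.0
```

This matches the 360k tokens / 36MB figure in the message; actual on-disk size will be larger due to SSTable overhead and not-yet-compacted expired rows.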
Re: Using Cassandra for session tokens
I don't know what the advantage would be of using this sharding system. I would recommend just going with a simple k-v table as the OP suggested. On Mon Dec 01 2014 at 7:18:51 AM Laing, Michael michael.la...@nytimes.com wrote: Since the session tokens are random, perhaps computing a shard from each one and using it as the partition key would be a good idea. I would also use uuid v1 to get ordering. With such a small amount of data, only a few shards would be needed. On Mon, Dec 1, 2014 at 10:08 AM, Phil Wise p...@advancedtelematic.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 The session will be written once at create time, and never modified after that. Will that affect things? Thank you - -Phil On 01.12.2014 15:58, Jonathan Haddad wrote: I don't think DateTiered will help here, since there's no clustering key defined. This is a pretty straightforward workload, I've done something similar. Are you overwriting the session on every request? Or just writing it once? On Mon Dec 01 2014 at 6:45:14 AM Matt Brown m...@mattnworb.com wrote: This sounds like a good use case for http://www.datastax.com/dev/blog/datetieredcompactionstrategy On Dec 1, 2014, at 3:07 AM, Phil Wise p...@advancedtelematic.com wrote: We're considering switching from using Redis to Cassandra to store short lived (~1 hour) session tokens, in order to reduce the number of data storage engines we have to manage. Can anyone foresee any problems with the following approach: 1) Use the TTL functionality in Cassandra to remove old tokens. 2) Store the tokens in a table like: CREATE TABLE tokens ( id uuid, username text, // (other session information) PRIMARY KEY (id) ); 3) Perform ~100 writes/sec like: INSERT INTO tokens (id, username ) VALUES (468e0d69-1ebe-4477-8565-00a4cb6fa9f2, 'bob') USING TTL 3600; 4) Perform ~1000 reads/sec like: SELECT * FROM tokens WHERE ID=468e0d69-1ebe-4477-8565-00a4cb6fa9f2 ; The tokens will be about 100 bytes each, and we will grant 100 per second on a small 3 node cluster. 
Therefore there will be about 360k tokens alive at any time, with a total size of 36MB before database overhead. My biggest worry at the moment is that this kind of workload will stress compaction in an unusual way. Are there any metrics I should keep an eye on to make sure it is working fine? I read over the following links, but they mostly talk about DELETE-ing and tombstones. Am I right in thinking that as soon as a node performs a compaction then the rows with an expired TTL will be thrown away, regardless of gc_grace_seconds? https://issues.apache.org/jira/browse/CASSANDRA-7534 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets https://issues.apache.org/jira/browse/CASSANDRA-6654 Thank you Phil
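Michael's sharding suggestion is easy to sketch: since the session token is random, a shard derived from the token itself is uniformly distributed, and reads can recompute it without any lookup. A hypothetical sketch (the shard count and function name are my own, not from the thread):

```python
import uuid

NUM_SHARDS = 16  # a handful of shards is plenty for ~36 MB of data

def shard_for(token: uuid.UUID) -> int:
    """Stable shard derived from the token itself; with the shard as a
    partition key component, reads recompute it from the token they hold."""
    return token.int % NUM_SHARDS

token = uuid.UUID('468e0d69-1ebe-4477-8565-00a4cb6fa9f2')
print(shard_for(token))  # 2 (the low 4 bits of ...f2)
```

As Jon notes, for a plain key-value lookup table this buys little; it matters more when you want time-ordered clustering (uuid v1) within a bounded set of partitions.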
Re: full gc too often
I recommend reading through https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how the JVM GC works and what you can do to tune it. Also good is Blake Eggleston's writeup, which can be found here: http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html I'd like to note that allocating a 4GB heap to Cassandra under any serious workload is unlikely to be sufficient.

On Thu Dec 04 2014 at 8:43:38 PM Philo Yang ud1...@gmail.com wrote: I have two kinds of machine: 16G RAM, with the default heap size setting, about 4G; and 64G RAM, with the default heap size setting, about 8G. These two kinds of nodes have the same number of vnodes, and both of them have the GC issue, although the 16G nodes have a higher probability of it. Thanks, Philo Yang

2014-12-05 12:34 GMT+08:00 Tim Heckman t...@pagerduty.com: On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote: Hi all, I have a cluster on C* 2.1.1 and JDK 1.7_u51. I have trouble with full GC: sometimes one or two nodes run a full GC more than once per minute, taking over 10 seconds each time; the node then becomes unreachable and the latency of the cluster increases. Grepping the GCInspector's log, I found that when a node is running fine without GC trouble there are two kinds of GC: ParNew GC in less than 300ms, which clears the Par Eden Space and enlarges CMS Old Gen / Par Survivor Space a little (since the log only shows GCs over 200ms, there is only a small number of ParNew GCs in it), and ConcurrentMarkSweep in 4000~8000ms, which reduces CMS Old Gen a lot and enlarges Par Eden Space a little, running once every 1-2 hours. However, sometimes ConcurrentMarkSweep gets strange, as shown here:

INFO [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
INFO [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
INFO [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
INFO [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms. CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
INFO [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms. CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
INFO [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms. CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
INFO [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms. CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
INFO [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms. CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
INFO [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
INFO [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms. CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0

Each time, Old Gen shrinks only a little and Survivor Space is cleared, but the heap is still full, so another full GC follows very soon and then the node goes down. If I restart the node, it runs fine without GC trouble. Can anyone help me find out why full GC can't reduce CMS Old Gen? Is it because there are too many objects in the heap that can't be recycled? I think reviewing the table schema design and adding new nodes to the cluster is a good idea, but I still want to know if there is any other reason causing this trouble.

How much total system memory do you have? How much is allocated for heap usage? How big is your working data set? The reason I ask is that I've seen problems with lots of GC with no room gained, and it was memory pressure. Not enough for the heap. We decided that just increasing the heap size was a bad idea, as we did rely on free RAM being used for filesystem caching. So some vertical and horizontal scaling allowed us to give Cass more heap space, as well as distribute
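One quick way to see how little each CMS cycle reclaims is to parse the GCInspector lines and subtract the after-value from the before-value for Old Gen. A rough, hypothetical parser (two sample lines from the logs above, arrows normalized to "->"):

```python
import re

LOG = """\
ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464
ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512
"""

PATTERN = re.compile(r"(\w+) GC in (\d+)ms\. CMS Old Gen: (\d+) -> (\d+)")

for match in PATTERN.finditer(LOG):
    collector, ms, before, after = match.groups()
    reclaimed = int(before) - int(after)
    # A 12-second pause that reclaims ~0 bytes (or a negative number, i.e.
    # the old gen actually grew) is the signature of a heap that is
    # effectively full of live objects.
    print(f"{collector}: {ms}ms pause, reclaimed {reclaimed} bytes")
```

Running this over the full log would show every one of the 10+ second pauses above reclaiming almost nothing, which is exactly the "too many live objects for the heap" symptom Tim describes.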
Re: Could ring cache really improve performance in Cassandra?
What's a ring cache? FYI, if you're using the DataStax CQL drivers, they will automatically route requests to the correct node. On Sun Dec 07 2014 at 12:59:36 AM kong kongjiali...@gmail.com wrote: Hi, I'm doing a stress test on Cassandra. And I learned that using a ring cache can improve performance because the client requests can go directly to the target Cassandra server, so the coordinator node is the desired target node. In this way there is no need for the coordinator node to route the client requests to the target node, and maybe we can get a linear performance increment. However, in my stress test on an Amazon EC2 cluster, the test results are weird. It seems that there's no performance improvement after using the ring cache. Could anyone help me explain these results? (Also, I think the results of the test without the ring cache are weird, because there's no linear increment in QPS when new nodes are added. I need help explaining this, too.) The results are as follows:

INSERT (write):

Node count  Replication factor  QPS (no ring cache)  QPS (ring cache)
1           1                   18687                20195
2           1                   20793                26403
2           2                   22498                21263
4           1                   28348                30010
4           3                   28631                24413

SELECT (read):

Node count  Replication factor  QPS (no ring cache)  QPS (ring cache)
1           1                   24498                22802
2           1                   28219                27030
2           2                   35383                36674
4           1                   34648                28347
4           3                   52932                52590

Thank you very much, Joy
Re: full gc too often
There are a lot of factors that go into tuning, and I don't know of any reliable formula you can use to figure out what will work optimally for your hardware. Personally I recommend: 1) find the bottleneck, 2) play with a parameter (or two), 3) see what changed, performance-wise. If you've got a specific question I think someone can find a way to help, but asking "what can 8gb of heap give me" is pretty abstract and unanswerable. Jon

On Sun Dec 07 2014 at 8:03:53 AM Philo Yang ud1...@gmail.com wrote: 2014-12-05 15:40 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: I recommend reading through https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how the JVM GC works and what you can do to tune it. Also good is Blake Eggleston's writeup which can be found here: http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html I'd like to note that allocating a 4GB heap to Cassandra under any serious workload is unlikely to be sufficient.

Thanks for your recommendation. After reading, I tried allocating a larger heap and it helped. A 4G heap can't handle the workload in my use case indeed. So another question is: how much pressure can the default max heap (8G) handle? The pressure may not be a simple QPS number; as you know, a slice query over many columns in a row will allocate more objects in the heap than a query for a single column. Is there any testing result on the relationship between pressure and a safe heap size? We know that querying a slice with many tombstones is not a good use case, but querying a slice without tombstones may be a common use case, right?

On Thu Dec 04 2014 at 8:43:38 PM Philo Yang ud1...@gmail.com wrote: I have two kinds of machine: 16G RAM, with the default heap size setting, about 4G; and 64G RAM, with the default heap size setting, about 8G. These two kinds of nodes have the same number of vnodes, and both of them have the GC issue, although the 16G nodes have a higher probability of it.
Thanks, Philo Yang 2014-12-05 12:34 GMT+08:00 Tim Heckman t...@pagerduty.com: On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote: Hi,all I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have a trouble with full gc that sometime there may be one or two nodes full gc more than one time per minute and over 10 seconds each time, then the node will be unreachable and the latency of cluster will be increased. I grep the GCInspector's log, I found when the node is running fine without gc trouble there are two kinds of gc: ParNew GC in less than 300ms which clear the Par Eden Space and enlarge CMS Old Gen/ Par Survivor Space little (because it only show gc in more than 200ms, there is only a small number of ParNew GC in log) ConcurrentMarkSweep in 4000~8000ms which reduce CMS Old Gen much and enlarge Par Eden Space little, each 1-2 hours it will be executed once. However, sometimes ConcurrentMarkSweep will be strange like it shows: INFO [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 - 3579838464; Par Eden Space: 503316480 - 294794576; Par Survivor Space: 62914528 - 0 INFO [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 - 3579836512; Par Eden Space: 503316480 - 310562032; Par Survivor Space: 62872496 - 0 INFO [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 - 3579805792; Par Eden Space: 503316480 - 332391096; Par Survivor Space: 62914544 - 0 INFO [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms. CMS Old Gen: 3579835784 - 3579829760; Par Eden Space: 503316480 - 351991456; Par Survivor Space: 62914552 - 0 INFO [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms. 
CMS Old Gen: 3579838112 - 3579799752; Par Eden Space: 503316480 - 366222584; Par Survivor Space: 62914560 - 0 INFO [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms. CMS Old Gen: 3579831424 - 3579817392; Par Eden Space: 503316480 - 388702928; Par Survivor Space: 62914552 - 0 INFO [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms. CMS Old Gen: 3579817392 - 3579838424; Par Eden Space: 503316480 - 408992784; Par Survivor Space: 62896720 - 0 INFO [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms. CMS Old Gen: 3579838424 - 3579816424; Par Eden Space: 503316480 - 438633608; Par Survivor Space: 62914544 - 0 INFO [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 - 3579785496; Par Eden Space: 503316480 - 441354856; Par Survivor Space: 62889528 - 0 INFO [Service
Re: How to model data to achieve specific data locality
I think he mentioned 100MB as the max size - planning for 1MB might make your data model difficult to work with. On Sun Dec 07 2014 at 12:07:47 PM Kai Wang dep...@gmail.com wrote: Thanks for the help. I wasn't clear on how clustering columns work. Coming from Thrift experience, it took me a while to understand how a clustering column impacts partition storage on disk. Now I believe using seq_type as the first clustering column solves my problem. As for partition size, I will start with some bucket assumption. If the partition size exceeds the threshold, I may need to re-bucket using a smaller bucket size. On another thread Eric mentions the optimal partition size should be 100 KB ~ 1 MB. I will use that as the starting point to design my bucket strategy. On Sun, Dec 7, 2014 at 10:32 AM, Jack Krupansky j...@basetechnology.com wrote: It would be helpful to look at some specific examples of sequences, showing how they grow. I suspect that the term "sequence" is being overloaded in some subtly misleading way here. Besides, we've already answered the headline question - data locality is achieved by having a common partition key. So, we need some clarity as to what question we are really focusing on. And, of course, we should be asking the "Cassandra Data Modeling 101" question of what your queries should look like - how exactly do you want to access your data. Only after we have a handle on how you need to read your data can we decide how it should be stored. My immediate question to get things back on track: when you say "The typical read is to load a subset of sequences with the same seq_id", what type of "subset" are you talking about? Again, a few explicit and concise example queries (in some concise, easy-to-read pseudo language or even plain English, but not belabored with full CQL syntax) would be very helpful. I mean, Cassandra has no "subset" concept, nor a "load subset" command, so what are we really talking about?
Also, I presume we are talking CQL, but some of the references seem more Thrift/slice oriented. -- Jack Krupansky *From:* Eric Stevens migh...@gmail.com *Sent:* Sunday, December 7, 2014 10:12 AM *To:* user@cassandra.apache.org *Subject:* Re: How to model data to achieve specific data locality Also new seq_types can be added and old seq_types can be deleted. This means I often need to ALTER TABLE to add and drop columns. Kai, unless I'm misunderstanding something, I don't see why you need to alter the table to add a new seq type. From a data model perspective, these are just new values in a row. If you do have columns which are specific to particular seq_types, data modeling does become a little more challenging. In that case you may get some advantage from using collections (especially map) to store data which applies to only a few seq types. Or defining a schema which includes the set of all possible columns (that's when you're getting into ALTERs when a new column comes or goes). All sequences with the same seq_id tend to grow at the same rate. Note that it is an anti pattern in Cassandra to append to the same row indefinitely. I think you understand this because of your original question. But please note that a sub partitioning strategy which reuses subpartitions will result in degraded read performance after a while. You'll need to rotate sub partitions by something that doesn't repeat in order to keep the data for a given partition key grouped into just a few sstables. A typical pattern there is to use some kind of time bucket (hour, day, week, etc., depending on your write volume). I do note that your original question was about preserving data locality - and having a consistent locality for a given seq_id - for best offline analytics. If you wanted to work for this, you can certainly also include a blob value in your partitioning key, whose value is calculated to force a ring collision with this record's sibling data. 
With Cassandra's default partitioner of murmur3, that's probably pretty challenging - murmur3 isn't designed to be cryptographically strong (it makes no attempt to make forcing a collision difficult), but it is meant to have good distribution (so it may still be computationally expensive to force a collision - I'm not that familiar with its internal workings). In this case, ByteOrderedPartitioner would be a lot easier to force a ring collision on, but then you need a good ring-balancing strategy to distribute your data evenly over the ring. On Sun Dec 07 2014 at 2:56:26 AM DuyHai Doan doanduy...@gmail.com wrote: Those sequences are not fixed. All sequences with the same seq_id tend to grow at the same rate. If it's one partition per seq_id, the size will most likely exceed the threshold quickly -- Then use bucketing to avoid too-wide partitions. Also new seq_types can be added and old seq_types can be deleted. This means I often need to ALTER TABLE to add and
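Eric's advice to rotate sub-partitions by a non-repeating time bucket can be sketched as follows: the bucket becomes part of the partition key, so new writes for a seq_id land in a fresh partition each period instead of appending to one row forever, keeping each partition's data in just a few SSTables. Day granularity here is an arbitrary choice for illustration; pick hour/day/week based on write volume.

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> str:
    """Day-granularity time bucket; a partition key of (seq_id, bucket)
    stops any single partition from growing indefinitely."""
    return ts.strftime('%Y-%m-%d')

ts = datetime(2014, 12, 7, 10, 12, tzinfo=timezone.utc)
print(day_bucket(ts))  # 2014-12-07
```

Reads for a seq_id then enumerate the buckets covering the time range of interest, which is cheap because buckets are computable, not stored.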
Re: Could ring cache really improve performance in Cassandra?
I would really not recommend using Thrift for anything at this point, including your load tests. Take a look at CQL; all development is going there, and in 2.1 it has seen a massive performance boost over 2.0. You may want to try the Cassandra stress tool included in 2.1; it can stress a table you've already built. That way you can rule out any bugs on the client side. If you're going to keep using your tool, however, it would be helpful if you sent out a link to the repo, since currently we have no way of knowing if you've got a client-side bug (data model or code) that's limiting your performance. On Sun Dec 07 2014 at 7:55:16 PM 孔嘉林 kongjiali...@gmail.com wrote: I find under the src/client folder of the Cassandra 2.1.0 source code there is a *RingCache.java* file. It uses a Thrift client calling the *describe_ring()* API to get the token range of each Cassandra node. It is used on the client side. The client can use it, combined with the partitioner, to get the target node. In this way there is no need to route requests between Cassandra nodes, and the client can directly connect to the target node. So maybe it can save some routing time and improve performance. Thank you very much. 2014-12-08 1:28 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: What's a ring cache? FYI, if you're using the DataStax CQL drivers, they will automatically route requests to the correct node. On Sun Dec 07 2014 at 12:59:36 AM kong kongjiali...@gmail.com wrote: Hi, I'm doing a stress test on Cassandra. And I learned that using a ring cache can improve performance because the client requests can directly go to the target Cassandra server and the coordinator Cassandra node is the desired target node. In this way, there is no need for the coordinator node to route the client requests to the target node, and maybe we can get a linear performance increment. However, in my stress test on an Amazon EC2 cluster, the test results are weird. It seems that there's no performance improvement after using the ring cache.
Could anyone help me explain this results? (Also, I think the results of test without ring cache is weird, because there's no linear increment on QPS when new nodes are added. I need help on explaining this, too). The results are as follows: INSERT(write): Node count Replication factor QPS(No ring cache) QPS(ring cache) 1 1 18687 20195 2 1 20793 26403 2 2 22498 21263 4 1 28348 30010 4 3 28631 24413 SELECT(read): Node count Replication factor QPS(No ring cache) QPS(ring cache) 1 1 24498 22802 2 1 28219 27030 2 2 35383 36674 4 1 34648 28347 4 3 52932 52590 Thank you very much, Joy
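What RingCache (and the token-aware routing built into the DataStax drivers) buys you is essentially this lookup, done client-side: given the ring's token/owner pairs from describe_ring(), find the node owning a key's token and connect to it directly, skipping the coordinator hop. A toy sketch with made-up tokens (real tokens come from the partitioner, e.g. Murmur3, and real rings have vnodes and replicas):

```python
import bisect

# Hypothetical ring: (token, node) pairs as describe_ring() might report.
RING = [(-6000, 'node1'), (-1000, 'node2'), (3000, 'node3'), (8000, 'node4')]
TOKENS = [t for t, _ in RING]

def owner(key_token: int) -> str:
    """First node whose token is >= the key's token, wrapping at the end -
    the coordinator hop that client-side routing avoids."""
    i = bisect.bisect_left(TOKENS, key_token)
    return RING[i % len(RING)][1]

print(owner(2500))  # node3
print(owner(9000))  # node1 (wraps around)
```

Note this saves one network hop at most, which is why the gains in the benchmark above are modest; it does not change how much work each replica does.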
Re: Can not connect with cqlsh to something different than localhost
Listen address needs the actual address, not the interface. This is best accomplished by setting up proper hostnames for each machine (through DNS or hosts file) and leaving listen_address blank, as it will pick the external ip. Otherwise, you'll need to set the listen address to the IP of the machine you want on each machine. I find the former to be less of a pain to manage. On Mon Dec 08 2014 at 2:49:55 AM Richard Snowden richard.t.snow...@gmail.com wrote: This did not work either. I changed /etc/cassandra.yaml and restarted Cassandra (I even restarted the machine to make 100% sure). What I tried: 1) listen_address: localhost - connection OK (but of course I can't connect from outside the VM to localhost) 2) Set listen_interface: eth0 - connection refused 3) Set listen_address: 192.168.111.136 - connection refused What to do? Try: $ netstat -lnt and see which interface port 9042 is listening on. You will likely need to update cassandra.yaml to change the interface. By default, Cassandra is listening on localhost so your local cqlsh session works. On Sun, 7 Dec 2014 23:44 Richard Snowden richard.t.snow...@gmail.com wrote: I am running Cassandra 2.1.2 in an Ubuntu VM. cqlsh or cqlsh localhost works fine. But I can not connect from outside the VM (firewall, etc. disabled). Even when I do cqlsh 192.168.111.136 in my VM I get connection refused. 
This is strange because when I check my network config I can see that 192.168.111.136 is my IP:

root@ubuntu:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0c:29:02:e0:de
          inet addr:192.168.111.136  Bcast:192.168.111.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe02:e0de/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16042 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8638 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21307125 (21.3 MB)  TX bytes:709471 (709.4 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:550 errors:0 dropped:0 overruns:0 frame:0
          TX packets:550 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:148053 (148.0 KB)  TX bytes:148053 (148.0 KB)

root@ubuntu:~# cqlsh 192.168.111.136 9042
Connection error: ('Unable to connect to any servers', {'192.168.111.136': error(111, "Tried connecting to [('192.168.111.136', 9042)]. Last error: Connection refused")})

What to do?
Re: Could ring cache really improve performance in Cassandra?
I agree with Robert. If you're trying to test Cassandra, test Cassandra using stress. Set a reasonable benchmark, and then you'll be able to aim for that with your client code. Otherwise you're likely to be asking a lot of the wrong questions and making incorrect assumptions. On Mon Dec 08 2014 at 12:42:32 AM Robert Stupp sn...@snazy.de wrote: cassandra-stress is a great tool to check whether the sizing of your cluster, in combination with your data model, will fit your production needs - i.e. without the application :) Removing the application removes any possible bugs from the load test. Sure, it's a necessary step to do it with your application - but I'd recommend starting with the stress test tool first. Thrift is a deprecated API. I strongly recommend using the C++ driver (I'm pretty sure it supports the native protocol). The native protocol achieves approx. twice the performance of thrift via much fewer TCP connections. (Thrift is RPC - connections usually waste system, application and server resources while waiting for something. The native protocol is a multiplexed protocol.) As John already said, all development effort is spent on CQL3 and the native protocol - thrift is just supported. With CQL you can do everything that you can do with thrift, plus more new stuff. I also recommend using prepared statements (it automagically works in a distributed cluster with the native protocol) - it eliminates the effort of parsing CQL statements again and again. Am 08.12.2014 um 09:26 schrieb 孔嘉林 kongjiali...@gmail.com: Thanks Jonathan, actually I'm wondering how CQL is implemented underneath - a different RPC mechanism? Why is it faster than thrift? I know I'm wrong, but now I just regard CQL as a query language. Could you please help explain this to me? I still feel puzzled after reading some docs about CQL. I create tables in CQL, and use the cql3 API in thrift. I don't know what else I can do with CQL. And I am using C++ to write the client side code.
Currently I am not using the C++ driver and want to write some simple functionality by myself. Also, I didn't use the stress test tool provided in the Cassandra distribution because I also want to make sure whether I can achieve good performance as expected using my client code. I know others have benchmarked Cassandra and got good results. But if I cannot reproduce the satisfactory results, I cannot use it in my case. I will create a repo and send a link later, hope to get your kind help. Thanks very much. 2014-12-08 14:28 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: I would really not recommend using thrift for anything at this point, including your load tests. Take a look at CQL; all development is going there, and 2.1 has seen a massive performance boost over 2.0. You may want to try the Cassandra stress tool included in 2.1; it can stress a table you've already built. That way you can rule out any bugs on the client side. If you're going to keep using your tool, however, it would be helpful if you sent out a link to the repo, since currently we have no way of knowing if you've got a client side bug (data model or code) that's limiting your performance. On Sun Dec 07 2014 at 7:55:16 PM 孔嘉林 kongjiali...@gmail.com wrote: I find that under the src/client folder of the Cassandra 2.1.0 source code, there is a *RingCache.java* file. It uses a thrift client calling the *describe_ring()* API to get the token range of each Cassandra node. It is used on the client side. The client can use it combined with the partitioner to get the target node. In this way there is no need to route requests between Cassandra nodes, and the client can directly connect to the target node. So maybe it can save some routing time and improve performance. Thank you very much. 2014-12-08 1:28 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: What's a ring cache? FYI if you're using the DataStax CQL drivers they will automatically route requests to the correct node.
On Sun Dec 07 2014 at 12:59:36 AM kong kongjiali...@gmail.com wrote: Hi, I'm doing a stress test on Cassandra. And I have learned that using a ring cache can improve performance because the client requests can go directly to the target Cassandra server, so the coordinator node is the desired target node. In this way, there is no need for a coordinator node to route the client requests to the target node, and maybe we can get a linear performance increase. However, in my stress test on an Amazon EC2 cluster, the test results are weird. It seems that there's no performance improvement after using the ring cache. Could anyone help me explain these results? (Also, I think the results of the test without the ring cache are weird, because there's no linear increase in QPS when new nodes are added. I need help explaining this, too). The results are as follows:

INSERT (write):
Node count  Replication factor  QPS (no ring cache)  QPS (ring cache)
1           1                   18687                20195
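The ring-cache idea in this thread amounts to keeping a client-side token-to-node map (what describe_ring() plus the partitioner gives you) and hashing the partition key locally. A minimal illustrative sketch - the class, node addresses, and hash below are stand-ins, not the Thrift API or a real Cassandra partitioner:

```python
import bisect
import hashlib

class RingCache:
    """Client-side map of tokens to nodes; each node owns (previous_token, token]."""

    def __init__(self, token_to_node):
        self.tokens = sorted(token_to_node)
        self.nodes = [token_to_node[t] for t in self.tokens]

    def token_for(self, partition_key: bytes) -> int:
        # Stand-in for Murmur3/RandomPartitioner: any stable hash works for the sketch
        return int(hashlib.md5(partition_key).hexdigest(), 16) % 2**32

    def node_for(self, partition_key: bytes) -> str:
        t = self.token_for(partition_key)
        # First token >= t owns the key; wrap around at the top of the ring
        i = bisect.bisect_left(self.tokens, t) % len(self.tokens)
        return self.nodes[i]

ring = RingCache({2**30: "10.0.0.1", 2**31: "10.0.0.2",
                  3 * 2**30: "10.0.0.3", 2**32 - 1: "10.0.0.4"})
# The client can now connect straight to the owning node, skipping the
# coordinator hop - which is exactly what token-aware drivers automate.
print(ring.node_for(b"user:42"))
```

This is also what the DataStax drivers' token-aware routing does for you, which is why Jonathan asks "what's a ring cache?" - with a token-aware driver there is nothing left to hand-roll.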
Re: Cassandra Files Taking up Much More Space than CF
You don't need a prime number of nodes in your ring, but it's not a bad idea to have it be a multiple of your RF when your cluster is small. On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote: Hi Ian, Thanks for the suggestion but I had actually already done that prior to the scenario I described (to get myself some free space) and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went. One additional piece of information I forgot to point out is that when I ran nodetool status on the node it included all 6 nodes. I have also heard it mentioned that I may want to have a prime number of nodes which may help protect against split-brain. Is this true? If so, does it still apply when I am using vnodes? Thanks again, Nate -- *Nathanael Yoder* Principal Engineer Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote: Try `nodetool clearsnapshot`, which will delete any snapshots you have. I have never taken a snapshot with nodetool, yet I found several snapshots on my disk recently (which can take a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those freed up a ton of space for me. - Ian On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote: Hi All, I am new to Cassandra so I apologise in advance if I have missed anything obvious but this one currently has me stumped. I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using C3.2XLarge nodes which overall is working very well for us. However, after letting it run for a while I seem to get into a situation where the amount of disk space used far exceeds the total amount of data on each node and I haven't been able to get the size to go back down except by stopping and restarting the node. For example, I have almost all of my data in one table.
On one of my nodes right now the total space used (as reported by nodetool cfstats) is 57.2 GB and there are no snapshots. However, when I look at the size of the data files (using du) the data file for that table is 107GB. Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly becomes a problem. Running nodetool compact didn't reduce the size and neither does running nodetool repair -pr on the node. I also tried nodetool flush and nodetool cleanup (even though I have not added or removed any nodes recently) but it didn't change anything either. In order to keep my cluster up I then stopped and started that node and the size of the data file dropped to 54GB while the total column family size (as reported by nodetool) stayed about the same. Any suggestions as to what I could be doing wrong? Thanks, Nate
Re: Cassandra Files Taking up Much More Space than CF
Well, I personally don't like RF=2. It means if you're using CL=QUORUM and a node goes down, you're going to have a bad time (downtime). If you're using CL=ONE then you'd be ok. However, I am not wild about losing a node and having only 1 copy of my data available in prod. On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder n...@whistle.com wrote: Thanks Jonathan. So there is nothing too idiotic about my current set-up with 6 boxes, each with 256 vnodes, and a RF of 2? I appreciate the help, Nate -- *Nathanael Yoder* Principal Engineer Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com wrote: You don't need a prime number of nodes in your ring, but it's not a bad idea to have it be a multiple of your RF when your cluster is small. On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote: Hi Ian, Thanks for the suggestion but I had actually already done that prior to the scenario I described (to get myself some free space) and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went. One additional piece of information I forgot to point out is that when I ran nodetool status on the node it included all 6 nodes. I have also heard it mentioned that I may want to have a prime number of nodes which may help protect against split-brain. Is this true? If so, does it still apply when I am using vnodes? Thanks again, Nate -- *Nathanael Yoder* Principal Engineer Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose ianr...@fullstory.com wrote: Try `nodetool clearsnapshot`, which will delete any snapshots you have. I have never taken a snapshot with nodetool, yet I found several snapshots on my disk recently (which can take a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those freed up a ton of space for me.
- Ian On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder n...@whistle.com wrote: Hi All, I am new to Cassandra so I apologise in advance if I have missed anything obvious but this one currently has me stumped. I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using C3.2XLarge nodes which overall is working very well for us. However, after letting it run for a while I seem to get into a situation where the amount of disk space used far exceeds the total amount of data on each node and I haven't been able to get the size to go back down except by stopping and restarting the node. For example, in my data I have almost all of my data in one table. On one of my nodes right now the total space used (as reported by nodetool cfstats) is 57.2 GB and there are no snapshots. However, when I look at the size of the data files (using du) the data file for that table is 107GB. Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly becomes a problem. Running nodetool compact didn't reduce the size and neither does running nodetool repair -pr on the node. I also tried nodetool flush and nodetool cleanup (even though I have not added or removed any nodes recently) but it didn't change anything either. In order to keep my cluster up I then stopped and started that node and the size of the data file dropped to 54GB while the total column family size (as reported by nodetool) stayed about the same. Any suggestions as to what I could be doing wrong? Thanks, Nate
Re: upgrade cassandra from 2.0.6 to 2.1.2
Yes. It is, in general, a best practice to upgrade to the latest bug fix release before doing an upgrade to the next point release. On Tue Dec 09 2014 at 6:58:24 PM wyang wy...@v5.cn wrote: I looked at some upgrade documentation and am a little puzzled. According to https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt, “Rolling upgrades from anything pre-2.0.7 is not supported”. Does it mean we should upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7? Do we need to upgrade sstables after that? There seems to be nothing specific to note about upgrading between 2.0.6 and 2.0.7 in NEWS.txt. Any advice will be kindly appreciated.
Re: Cassandra Maintenance Best practices
I did a presentation on diagnosing performance problems in production at the US and Euro summits, in which I covered quite a few tools and preventative measures you should know about when running a production cluster. You may find it useful: http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ On OpsCenter - I recommend it. It gives you a nice dashboard. I don't think it's completely comprehensive (but no tool really is) but it gets you 90% of the way there. It's a good idea to run repairs, especially if you're doing deletes or querying at CL=ONE. I assume you're not using quorum, because on RF=2 that's the same as CL=ALL. I recommend at least RF=3 because if you lose 1 server, you're on the edge of data loss. On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi nehajtriv...@gmail.com wrote: Hi, We have a two-node cluster configuration in production with RF=2, which means that the data is written to both nodes. It has been running for about a month now and has a good amount of data. Questions: 1. What are the best practices for maintenance? 2. Is OpsCenter required to be installed, or can I manage with the nodetool utility? 3. Is it necessary to run repair weekly? thanks regards Neha
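The quorum arithmetic behind that advice is simple: QUORUM is floor(RF/2) + 1 replicas, so at RF=2 a quorum is both replicas - the same as CL=ALL - and a single node down means quorum operations fail. A quick sketch of the math:

```python
# Replicas required for a QUORUM read/write at a given replication factor.
def quorum(rf: int) -> int:
    return rf // 2 + 1

for rf in (1, 2, 3, 5):
    q = quorum(rf)
    # rf - q is how many replicas can be down with QUORUM still succeeding
    print(f"RF={rf}: QUORUM={q}, tolerates {rf - q} replica(s) down")
```

At RF=3 a quorum is 2, so one node can be lost without downtime, which is one reason RF=3 is the common recommendation.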
Re: batch_size_warn_threshold_in_kb
The really important thing to take away from Ryan's original post is that batches are not there for performance. The only case I consider batches to be useful for is when you absolutely need to know that several tables all get a mutation (via logged batches). The use case for this is when you've got multiple tables that are serving as different views for data. It is absolutely not going to help you if you're trying to lump queries together to reduce network and server overhead - in fact it'll do the opposite. If you're trying to do that, instead perform many async queries. The overhead of batches in cassandra is significant and you're going to hit a lot of problems if you use them excessively (timeouts / failures). tl;dr: you probably don't want batch, you most likely want many async calls On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller moham...@glassbeam.com wrote: Ryan, Thanks for the quick response. I did see that jira before posting my question on this list. However, I didn’t see any information about why 5kb+ data will cause instability. 5kb or even 50kb seems too small. For example, if each mutation is 1000+ bytes, then with just 5 mutations, you will hit that threshold. In addition, Patrick is saying that he does not recommend more than 100 mutations per batch. So why not warn users just on the # of mutations in a batch? Mohammed *From:* Ryan Svihla [mailto:rsvi...@datastax.com] *Sent:* Thursday, December 11, 2014 12:56 PM *To:* user@cassandra.apache.org *Subject:* Re: batch_size_warn_threshold_in_kb Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here https://issues.apache.org/jira/browse/CASSANDRA-6487 Key reasoning for the desire comes from Patrick McFadden: Yes that was in bytes. Just in my own experience, I don't recommend more than ~100 mutations per batch. Doing some quick math I came up with 5k as 100 x 50 byte mutations. Totally up for debate.
It's totally changeable; however, it's there in no small part because so many people mistake the BATCH keyword for a performance optimization, and the warning helps flag those cases of misuse. On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi – The cassandra.yaml file has a property called *batch_size_warn_threshold_in_kb*. The default size is 5kb and according to the comments in the yaml file, it is used to log a WARN on any batch size exceeding this value in kilobytes. It says caution should be taken on increasing the size of this threshold as it can lead to node instability. Does anybody know the significance of this magic number 5kb? Why would a higher number (say 10kb) lead to node instability? Mohammed -- http://www.datastax.com/ Ryan Svihla Solution Architect https://twitter.com/foundev http://www.linkedin.com/pub/ryan-svihla/12/621/727/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.
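The numbers in this thread line up with simple arithmetic: 5 KB is roughly Patrick's 100 mutations at ~50 bytes each, while Mohammed's 1000-byte mutations trip the warning after only 5. A sketch of that back-of-the-envelope math:

```python
# batch_size_warn_threshold_in_kb defaults to 5, i.e. 5 * 1024 bytes.
WARN_THRESHOLD_BYTES = 5 * 1024

def mutations_before_warn(bytes_per_mutation: int) -> int:
    """How many mutations of a given size fit under the default warning threshold."""
    return WARN_THRESHOLD_BYTES // bytes_per_mutation

print(mutations_before_warn(50))    # 102 - close to the ~100-mutation rule of thumb
print(mutations_before_warn(1000))  # 5  - Mohammed's example trips the warning fast
```

This also makes clear why a byte threshold, rather than a mutation count, was chosen: it is the payload size, not the statement count, that drives coordinator memory and GC pressure.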
Re: `nodetool cfhistogram` utility script
Hey Jens, Unfortunately the output of the nodetool histograms changes between versions. While I think your script is useful, it's likely to break between versions. You might be interested to weigh in on the JIRA ticket to make the nodetool output machine friendly: https://issues.apache.org/jira/browse/CASSANDRA-5977 On Fri Dec 12 2014 at 5:48:51 AM Jens Rantil jens.ran...@tink.se wrote: Hi, I just quickly put together a tiny utility script to estimate average/mean/min/max/percentiles for `nodetool cfhistogram` latency output. Maybe it could be useful to someone else, I don't know. You can find it here: https://gist.github.com/JensRantil/3da67e39f50aaf4f5bce Future improvements would obviously be to not hardcode `us:` and to support the other histograms. Also, this logic should maybe even be moved into `nodetool cfhistogram` itself since these are fairly common metrics for latency. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
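Such a script boils down to count-weighted statistics over the histogram buckets. A rough, hypothetical sketch of that arithmetic - the bucket offsets below are invented, and real cfhistograms output (and its format) varies by version, which is exactly the fragility discussed above:

```python
def weighted_percentile(buckets, pct):
    """buckets: list of (latency_us, count) rows; pct in [0, 100].
    Returns the bucket offset at which the cumulative count crosses pct."""
    total = sum(c for _, c in buckets)
    threshold = total * pct / 100.0
    seen = 0
    for latency, count in sorted(buckets):
        seen += count
        if seen >= threshold:
            return latency
    return sorted(buckets)[-1][0]

# Made-up (latency_us, count) rows standing in for parsed cfhistograms output
buckets = [(35, 10), (42, 40), (50, 30), (60, 15), (72, 5)]
mean = sum(l * c for l, c in buckets) / sum(c for _, c in buckets)
print(weighted_percentile(buckets, 50))  # 42 (p50)
print(weighted_percentile(buckets, 99))  # 72 (p99)
print(mean)                              # 47.9
```

Note the caveat that applies to any such estimate: percentiles land on bucket boundaries, so the resolution is only as fine as the histogram's bucketing.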
Re: batch_size_warn_threshold_in_kb
There are cases where it can. For instance, if you batch multiple mutations to the same partition (and talk to a replica for that partition) they can reduce network overhead because they're effectively a single mutation in the eyes of the cluster. However, if you're not doing that (and most people aren't!) you end up putting additional pressure on the coordinator because now it has to talk to several other servers. If you have 100 servers, and perform a mutation on 100 partitions, you could have a coordinator that's 1) talking to every machine in the cluster and 2) waiting on a response from a significant portion of them before it can report success or failure. Any delay, from GC to a bad disk, can affect the performance of the entire batch. On Sat Dec 13 2014 at 4:17:33 AM Jack Krupansky j...@basetechnology.com wrote: Jonathan and Ryan, Jonathan says “It is absolutely not going to help you if you're trying to lump queries together to reduce network and server overhead - in fact it'll do the opposite”, but I would note that the CQL3 spec says “The BATCH statement ... serves several purposes: 1. It saves network round-trips between the client and the server (and sometimes between the coordinator and the replicas) when batching multiple updates.” Is the spec inaccurate? I mean, it seems in conflict with your statement. See: https://cassandra.apache.org/doc/cql3/CQL.html I see the spec as gospel – if it’s not accurate, let’s propose a change to make it accurate. The DataStax CQL doc is more nuanced: “Batching multiple statements can save network exchanges between the client/server and coordinator/replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in the Using and misusing batches section.
For information about the fastest way to load data, see Cassandra: Batch loading without the Batch keyword.” Maybe what we really need is a “client/driver-side batch”, which is simply a way to collect “batches” of operations in the client/driver and then let the driver determine what degree of batching and asynchronous operation is appropriate. It might also be nice to have an inquiry for the cluster as to what batch size is most optimal for the cluster, like number of mutations in a batch and number of simultaneous connections, and to have that be dynamic based on overall cluster load. I would also note that the example in the spec has multiple inserts with different partition key values, which flies in the face of the admonition to refrain from using server-side distribution of requests. At a minimum the CQL spec should make a clearer statement of intent and non-intent for BATCH. -- Jack Krupansky *From:* Jonathan Haddad j...@jonhaddad.com *Sent:* Friday, December 12, 2014 12:58 PM *To:* user@cassandra.apache.org ; Ryan Svihla rsvi...@datastax.com *Subject:* Re: batch_size_warn_threshold_in_kb The really important thing to take away from Ryan's original post is that batches are not there for performance. The only case I consider batches to be useful for is when you absolutely need to know that several tables all get a mutation (via logged batches). The use case for this is when you've got multiple tables that are serving as different views for data. It is absolutely not going to help you if you're trying to lump queries together to reduce network and server overhead - in fact it'll do the opposite. If you're trying to do that, instead perform many async queries. The overhead of batches in cassandra is significant and you're going to hit a lot of problems if you use them excessively (timeouts / failures).
tl;dr: you probably don't want batch, you most likely want many async calls On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller moham...@glassbeam.com wrote: Ryan, Thanks for the quick response. I did see that jira before posting my question on this list. However, I didn’t see any information about why 5kb+ data will cause instability. 5kb or even 50kb seems too small. For example, if each mutation is 1000+ bytes, then with just 5 mutations, you will hit that threshold. In addition, Patrick is saying that he does not recommend more than 100 mutations per batch. So why not warn users just on the # of mutations in a batch? Mohammed *From:* Ryan Svihla [mailto:rsvi...@datastax.com] *Sent:* Thursday, December 11, 2014 12:56 PM *To:* user@cassandra.apache.org *Subject:* Re: batch_size_warn_threshold_in_kb Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here https://issues.apache.org/jira/browse/CASSANDRA-6487 Key reasoning for the desire comes from Patrick McFadden: Yes that was in bytes. Just in my own experience, I don't recommend more than ~100
Re: batch_size_warn_threshold_in_kb
To add to Ryan's (extremely valid!) point, your test works because the coordinator is always a replica. Try again using 20 (or 50) nodes. Batching works great at RF=N=3 because it always gets to write locally and talk to exactly 2 other servers on every request. Consider what happens when the coordinator needs to talk to 100 servers. It's unnecessary overhead on the server side. To save network overhead, Cassandra 2.1 added support for response grouping (see http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster) which massively helps performance. It provides the benefit of batches but without the coordinator overhead. Can you post your benchmark code? On Sat Dec 13 2014 at 6:10:36 AM Jonathan Haddad j...@jonhaddad.com wrote: There are cases where it can. For instance, if you batch multiple mutations to the same partition (and talk to a replica for that partition) they can reduce network overhead because they're effectively a single mutation in the eyes of the cluster. However, if you're not doing that (and most people aren't!) you end up putting additional pressure on the coordinator because now it has to talk to several other servers. If you have 100 servers, and perform a mutation on 100 partitions, you could have a coordinator that's 1) talking to every machine in the cluster and 2) waiting on a response from a significant portion of them before it can report success or failure. Any delay, from GC to a bad disk, can affect the performance of the entire batch. On Sat Dec 13 2014 at 4:17:33 AM Jack Krupansky j...@basetechnology.com wrote: Jonathan and Ryan, Jonathan says “It is absolutely not going to help you if you're trying to lump queries together to reduce network and server overhead - in fact it'll do the opposite”, but I would note that the CQL3 spec says “The BATCH statement ... serves several purposes: 1.
It saves network round-trips between the client and the server (and sometimes between the coordinator and the replicas) when batching multiple updates.” Is the spec inaccurate? I mean, it seems in conflict with your statement. See: https://cassandra.apache.org/doc/cql3/CQL.html I see the spec as gospel – if it’s not accurate, let’s propose a change to make it accurate. The DataStax CQL doc is more nuanced: “Batching multiple statements can save network exchanges between the client/server and coordinator/replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in the Using and misusing batches section. For information about the fastest way to load data, see Cassandra: Batch loading without the Batch keyword.” Maybe what we really need is a “client/driver-side batch”, which is simply a way to collect “batches” of operations in the client/driver and then let the driver determine what degree of batching and asynchronous operation is appropriate. It might also be nice to have an inquiry for the cluster as to what batch size is most optimal for the cluster, like number of mutations in a batch and number of simultaneous connections, and to have that be dynamic based on overall cluster load. I would also note that the example in the spec has multiple inserts with different partition key values, which flies in the face of the admonition to refrain from using server-side distribution of requests. At a minimum the CQL spec should make a clearer statement of intent and non-intent for BATCH.
-- Jack Krupansky *From:* Jonathan Haddad j...@jonhaddad.com *Sent:* Friday, December 12, 2014 12:58 PM *To:* user@cassandra.apache.org ; Ryan Svihla rsvi...@datastax.com *Subject:* Re: batch_size_warn_threshold_in_kb The really important thing to take away from Ryan's original post is that batches are not there for performance. The only case I consider batches to be useful for is when you absolutely need to know that several tables all get a mutation (via logged batches). The use case for this is when you've got multiple tables that are serving as different views for data. It is absolutely not going to help you if you're trying to lump queries together to reduce network and server overhead - in fact it'll do the opposite. If you're trying to do that, instead perform many async queries. The overhead of batches in cassandra is significant and you're going to hit a lot of problems if you use them excessively (timeouts / failures). tl;dr: you probably don't want batch, you most likely want many async calls On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller moham...@glassbeam.com wrote: Ryan, Thanks for the quick response. I did see that jira before posting my question on this list. However, I didn’t see any information about why 5kb+ data will cause instability. 5kb or even 50kb seems too small. For example, if each
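Jonathan's fan-out argument is easy to sanity-check with a toy model: count how many distinct nodes one coordinator must contact for a single batch of N mutations. A hedged sketch, assuming uniformly random partitions, contiguous RF placement, and no vnodes - illustrative numbers, not a Cassandra benchmark:

```python
import random

def fanout(num_nodes: int, num_mutations: int, rf: int = 3, seed: int = 1) -> int:
    """Distinct nodes a single coordinator touches for one batch."""
    rng = random.Random(seed)
    contacted = set()
    for _ in range(num_mutations):
        primary = rng.randrange(num_nodes)
        # Replicas modeled as the primary plus the next rf-1 nodes on the ring
        contacted.update((primary + k) % num_nodes for k in range(rf))
    return len(contacted)

print(fanout(3, 100))    # 3 - at RF=N=3 the batch only ever touches 3 nodes
print(fanout(100, 100))  # on 100 nodes the coordinator talks to most of the cluster
```

This is the RF=N=3 special case in a nutshell: the small cluster makes batching look free, and the cost only appears once the cluster outgrows the replication factor.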
Re: batch_size_warn_threshold_in_kb
One thing to keep in mind is the overhead of a batch goes up as the number of servers increases. Talking to 3 is going to have a much different performance profile than talking to 20. Keep in mind that the coordinator is going to be talking to every server in the cluster with a big batch. The amount of local writes will decrease as it owns a smaller portion of the ring. All you've done is add an extra network hop between your client and where the data should actually be. You also start to have an impact on GC in a very negative way. Your point is valid about topology changes, but that's a relatively rare occurrence, and the driver is notified pretty quickly, so I wouldn't optimize for that case. Can you post your test code in a gist or something? I can't really talk about your benchmark without seeing it and you're basing your stance on the premise that it is correct, which it may not be. On Sat Dec 13 2014 at 8:45:21 AM Eric Stevens migh...@gmail.com wrote: You can see what the partition key strategies are for each of the tables, test5 shows the least improvement. The set (aid, end) should be unique, and bckt is derived from end. Some of these layouts result in clustering on the same partition keys, that's actually tunable with the ~15 per bucket reported (exact number of entries per bucket will vary but should have a mean of 15 in that run - it's an input parameter to my tests). test5 obviously ends up being exclusively unique partitions for each record. Your points about: 1) Failed batches having a higher cost than failed single statements 2) In my test, every node was a replica for all data. These are both very good points.
For #1, since the worst case scenario is nearly twice as fast in batches as its single statement equivalent, in terms of impact on the client, you'd have to be retrying half your batches before you broke even there (but of course those retries are not free to the cluster, so you probably make the performance tipping point approach a lot faster). This alone may be cause to justify avoiding batches, or at least severely limiting their size (hey, that's what this discussion is about!). For #2, that's certainly a good point; for this test cluster, I should at least re-run with RF=1 so that proxying times start to matter. If you're not using a token aware client or not using a token aware policy for whatever reason, this should even out though, no? Each node will end up coordinating 1/(nodecount-rf+1) mutations, regardless of whether they are batched or single statements. The DS driver is very careful to caution that the topology map it maintains makes no guarantees on freshness, so you may see a significant performance penalty in your client when the topology changes if you're depending on token aware routing as part of your performance requirements. I'm curious what your thoughts are on grouping statements by primary replica according to the routing policy, and executing unlogged batches that way (so that for token aware routing, all statements are executed on a replica; for others it'd make no difference). Retries are still more expensive, but you still get the proxy avoidance of token-aware routing.
It's pretty easy to do in Scala:

def groupByFirstReplica(statements: Iterable[Statement])(implicit session: Session): Map[Host, Seq[Statement]] = {
  val meta = session.getCluster.getMetadata
  statements.groupBy { st => meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next }
}

val result = Future.traverse(groupByFirstReplica(statements).values).map(st => newBatch(st).executeAsync())

Let me get together my test code, it depends on some existing utilities we use elsewhere, such as implicit conversions between Google and Scala native futures. I'll try to put this together in a format that's runnable for you in a Scala REPL console without having to resolve our internal dependencies. This may not be today though. Also, @Ryan, I don't think that shuffling would make a difference for my above tests since as Jon observed, all my nodes were already replicas there. On Sat, Dec 13, 2014 at 7:37 AM, Ryan Svihla rsvi...@datastax.com wrote: Also..what happens when you turn on shuffle with token aware? http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html On Sat, Dec 13, 2014 at 8:21 AM, Jonathan Haddad j...@jonhaddad.com wrote: To add to Ryan's (extremely valid!) point, your test works because the coordinator is always a replica. Try again using 20 (or 50) nodes. Batching works great at RF=N=3 because it always gets to write to local and talk to exactly 2 other servers on every request. Consider what happens when the coordinator needs to talk to 100 servers. It's unnecessary overhead on the server side. To save network overhead, Cassandra 2.1 added support for response grouping (see http://www.datastax.com/dev/blog/cassandra-2-1-now
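For readers who don't speak Scala, the same grouping idea can be sketched in Python. Here replicas_for() is a hypothetical stand-in for the driver's cluster metadata lookup (Metadata.getReplicas in the Java driver), and statements are simplified to (partition_key, cql) pairs:

```python
from collections import defaultdict

def group_by_first_replica(statements, replicas_for):
    """Bucket statements by their first replica so each unlogged batch only
    carries mutations its coordinator actually owns."""
    groups = defaultdict(list)
    for stmt in statements:
        groups[replicas_for(stmt)[0]].append(stmt)
    return dict(groups)

# Toy replica map standing in for real topology metadata
ring = {"a": ["n1", "n2"], "b": ["n2", "n3"], "c": ["n1", "n3"]}
stmts = [("a", "INSERT 1"), ("b", "INSERT 2"), ("c", "INSERT 3"), ("a", "INSERT 4")]

groups = group_by_first_replica(stmts, lambda s: ring[s[0]])
print({host: len(g) for host, g in groups.items()})  # {'n1': 3, 'n2': 1}
```

Each group would then be sent as one unlogged batch to its owning host, so a token-aware client never pays the proxy hop; as Eric notes, the trade-off is that a failed batch still retries more work than a failed single statement.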
Re: batch_size_warn_threshold_in_kb
On Sat Dec 13 2014 at 10:00:16 AM Eric Stevens migh...@gmail.com wrote: Isn't the net effect of coordination overhead incurred by batches basically the same as the overhead incurred by RoundRobin or other non-token-aware request routing? As the cluster size increases, each node would coordinate the same percentage of writes in batches under token awareness as they would under a more naive single statement routing strategy. If write volume per time unit is the same in both approaches, each node ends up coordinating the majority of writes under either strategy as the cluster grows. If you're not token aware, there's extra coordinator overhead, yes. If you are token aware, not the case. I'm operating under the assumption that you'd want to be token aware, since I don't see a point in not doing so :) Unfortunately my Scala isn't the best so I'm going to have to take a little bit to wade through the code. It may be useful to run cassandra-stress (it doesn't seem to have a mode for batches) to get a baseline on non-batches. I'm curious to know if you get different numbers than the scala profiler. GC pressure in the cluster is a concern of course, as you observe. But delta performance is *substantial* from what I can see. As in the case where you're bumping up against retries, this will cause you to fall over much more rapidly as you approach your tipping point, but in a healthy cluster, it's the same write volume, just a longer tenancy in eden. If reasonably sized batches are causing survivors, you're not far off from falling over anyway. On Sat, Dec 13, 2014 at 10:04 AM, Jonathan Haddad j...@jonhaddad.com wrote: One thing to keep in mind is the overhead of a batch goes up as the number of servers increases. Talking to 3 is going to have a much different performance profile than talking to 20. Keep in mind that the coordinator is going to be talking to every server in the cluster with a big batch.
The amount of local writes will decrease as it owns a smaller portion of the ring. All you've done is add an extra network hop between your client and where the data should actually be. You also start to have an impact on GC in a very negative way. Your point is valid about topology changes, but that's a relatively rare occurrence, and the driver is notified pretty quickly, so I wouldn't optimize for that case. Can you post your test code in a gist or something? I can't really talk about your benchmark without seeing it, and you're basing your stance on the premise that it is correct, which it may not be. On Sat Dec 13 2014 at 8:45:21 AM Eric Stevens migh...@gmail.com wrote: You can see what the partition key strategies are for each of the tables; test5 shows the least improvement. The set (aid, end) should be unique, and bckt is derived from end. Some of these layouts result in clustering on the same partition keys; that's actually tunable with the ~15 per bucket reported (the exact number of entries per bucket will vary, but should have a mean of 15 in that run - it's an input parameter to my tests). test5 obviously ends up being exclusively unique partitions for each record. Your points about: 1) Failed batches having a higher cost than failed single statements 2) In my test, every node was a replica for all data. These are both very good points. For #1, since the worst case scenario is nearly twice as fast in batches as its single statement equivalent, in terms of impact on the client you'd have to be retrying half your batches before you broke even there (but of course those retries are not free to the cluster, so you probably reach the performance tipping point a lot faster). This alone may be cause to justify avoiding batches, or at least severely limiting their size (hey, that's what this discussion is about!). For #2, that's certainly a good point; for this test cluster, I should at least re-run with RF=1 so that proxying times start to matter. If you're not using a token aware client, or not using a token aware policy for whatever reason, this should even out though, no? Each node will end up coordinating 1/(nodecount-rf+1) of the mutations, regardless of whether they are batched or single statements. The DS driver is very careful to caution that the topology map it maintains makes no guarantees on freshness, so you may see a significant performance penalty in your client when the topology changes if you're depending on token aware routing as part of your performance requirements. I'm curious what your thoughts are on grouping statements by primary replica according to the routing policy, and executing unlogged batches that way (so that for token aware routing, all statements are executed on a replica; for others it'd make no difference). Retries are still more expensive, but token aware proxying avoidance is still had. It's pretty easy to do in Scala: [snip]
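Jon's fan-out point can be made concrete with a back-of-envelope calculation (the model and numbers here are mine, not from the thread):

```python
# Assumed model, not a benchmark: with n nodes and replication factor rf,
# a randomly chosen coordinator is a replica for roughly rf/n of the
# token ring, so only that share of a big batch's writes stays local;
# the rest must be proxied to other nodes.
def replica_fraction(n: int, rf: int = 3) -> float:
    """Fraction of data for which a given node is a replica (uniform-ownership model)."""
    return rf / n

for n in (3, 20, 100):
    print(f"n={n:3d}  local share ~ {replica_fraction(n):.2f}")
# At n=3, rf=3 (the test cluster in this thread) every node is a replica
# for everything; at n=100 only ~3% of a big batch is local to the
# coordinator, the rest is server-side fan-out.
```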
Re: batch_size_warn_threshold_in_kb
Not a problem - it's good to hash this stuff out and understand the technical reasons why something works or doesn't work. On Sat Dec 13 2014 at 10:07:10 AM Jonathan Haddad j...@jonhaddad.com wrote: [snip - quoted exchange duplicated above]
Re: batch_size_warn_threshold_in_kb
order= 33,686,013,000

Execution Results for 1 runs of 113,825 records (3 protos, 5 agents, ~15 per bucket) in batches of 10 - Total Run Time:
traverse test3 ((aid, bckt), end, proto) reverse order = 11,030,788,000
traverse test1 ((aid, bckt), proto, end) reverse order = 13,345,962,000
traverse test2 ((aid, bckt), end) = 15,110,208,000
traverse test4 ((aid, bckt), proto, end) no explicit ordering = 16,398,982,000
traverse test5 ((aid, bckt, end)) = 22,166,119,000

For giggles I added token aware batching (grouping statements within a single batch by meta.getReplicas(statement.getKeyspace, statement.getRoutingKey).iterator().next - see https://gist.github.com/MightyE/1c98912fca104f6138fc#file-testsuite-L176-L189 ). Here's that run; results are comparable with before, and easily inside one sigma of non-token-aware batching, so not a statistically significant difference.

Execution Results for 1 runs of 113,825 records (3 protos, 5 agents, ~15 per bucket) in batches of 10 - Total Run Time:
traverse test2 ((aid, bckt), end) = 11,429,008,000
traverse test1 ((aid, bckt), proto, end) reverse order = 12,593,034,000
traverse test4 ((aid, bckt), proto, end) no explicit ordering = 13,111,244,000
traverse test3 ((aid, bckt), end, proto) reverse order = 25,163,064,000
traverse test5 ((aid, bckt, end)) = 30,233,744,000

On Sat, Dec 13, 2014 at 11:07 AM, Jonathan Haddad j...@jonhaddad.com wrote: [snip - quoted exchange duplicated above]
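The nanosecond totals in the tables above are easier to compare as throughput. This small helper is mine, not from the thread:

```python
# Convert a run's total nanoseconds into records per second, for the
# 113,825-record runs reported above.
def throughput(total_ns: int, records: int = 113825) -> float:
    """Records per second for a run that wrote `records` rows in total_ns."""
    return records / (total_ns / 1e9)

best = throughput(11_030_788_000)   # test3, fastest layout in the first table
worst = throughput(22_166_119_000)  # test5, one unique partition per record
print(f"best ~{best:.0f} rec/s, worst ~{worst:.0f} rec/s, ratio {best / worst:.2f}")
# The spread between partition layouts (~2x in that run) dwarfs the
# difference between token-aware and non-token-aware batching.
```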
Re: bootstrapping manually when auto_bootstrap=false ?
I'd consider solving your root problem - people starting and stopping servers in prod accidentally - instead of making Cassandra more difficult to manage operationally. On Thu Dec 18 2014 at 4:04:34 AM Ryan Svihla rsvi...@datastax.com wrote: Why auto_bootstrap=false? The documentation even suggests the opposite. If you don't auto_bootstrap, the node will take queries before it has copies of all the data, and you'll get wrong answers (it'd be not unlike using CL ONE when you've got a bunch of dropped mutations on a single node in the cluster). On Wed, Dec 17, 2014 at 10:45 PM, Ben Bromhead b...@instaclustr.com wrote: - In cassandra.yaml set auto_bootstrap: false - Boot node - nodetool rebuild Very similar to http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html On 18 December 2014 at 14:04, Kevin Burton bur...@spinn3r.com wrote: I'm trying to figure out the best way to bootstrap our nodes. I *think* I want our nodes to be manually bootstrapped. This way an admin has to explicitly bring up the node in the cluster, and I don't have to worry about a script accidentally provisioning new nodes. The problem is HOW do you do it? I couldn't find any reference anywhere in the documentation. I *think* I run nodetool repair? but it's unclear... -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359 -- http://www.datastax.com/ Ryan Svihla Solution Architect https://twitter.com/foundev http://www.linkedin.com/pub/ryan-svihla/12/621/727/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.
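Ryan's warning above - that a node started with auto_bootstrap=false serves empty answers until it has the data - can be sketched with a toy model (pure illustration, not Cassandra internals; node names and data are invented):

```python
# Toy model: a node that joins with auto_bootstrap=false owns token
# ranges but holds no data, so CL=ONE reads routed to it return nothing
# until a rebuild streams the data in.
replicas = {
    "n1": {"k": "v"},  # established replicas
    "n2": {"k": "v"},
    "n3": {},          # joined with auto_bootstrap=false, never streamed
}

def read_cl_one(node: str, key: str):
    """CL=ONE: the answer comes from a single replica, right or wrong."""
    return replicas[node].get(key)

assert read_cl_one("n1", "k") == "v"
assert read_cl_one("n3", "k") is None   # wrong (empty) answer served to clients

replicas["n3"].update(replicas["n1"])   # stand-in for `nodetool rebuild` streaming
assert read_cl_one("n3", "k") == "v"    # correct once streaming completes
```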
Re: full gc too often
This topic comes up quite a bit. Enough, in fact, that I've done a 1-hour webinar on the topic. I cover how the JVM GC works and things you need to consider when tuning it for Cassandra. https://www.youtube.com/watch?v=7B_w6YDYSwA With your specific problem - full GC not reducing the old gen - the most obvious answer is there's not much garbage to collect. Take a look at nodetool tpstats. Do you see lots of blocked MemtableFlushWriters? Jon On Thu Dec 18 2014 at 2:01:00 PM Y.Wong yungmw...@gmail.com wrote: On Dec 4, 2014 11:14 PM, Philo Yang ud1...@gmail.com wrote: Hi all, I have a cluster on C* 2.1.1 and JDK 1.7u51. I have a problem with full GC: sometimes one or two nodes will full GC more than once per minute, taking over 10 seconds each time; then the node becomes unreachable and the latency of the cluster increases. Grepping the GCInspector's log, I found that when a node is running fine without GC trouble there are two kinds of GC: ParNew GC in less than 300ms, which clears Par Eden Space and enlarges CMS Old Gen / Par Survivor Space a little (because it only logs GCs over 200ms, there is only a small number of ParNew GCs in the log), and ConcurrentMarkSweep in 4000~8000ms, which reduces CMS Old Gen a lot and enlarges Par Eden Space a little; it is executed once every 1-2 hours. However, sometimes ConcurrentMarkSweep behaves strangely:

INFO [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
INFO [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
INFO [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
INFO [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms. CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
INFO [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms. CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
INFO [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms. CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
INFO [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms. CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
INFO [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms. CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
INFO [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
INFO [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms. CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0

Each time, Old Gen is reduced only a little; Survivor Space is cleared, but the heap is still full, so another full GC follows very soon and then the node goes down. If I restart the node, it runs fine without GC trouble. Can anyone help me find out why full GC can't reduce CMS Old Gen? Is it because there are too many objects in the heap that can't be recycled? I think reviewing the table schema design and adding new nodes to the cluster is a good idea, but I still want to know if there is any other reason causing this trouble. Thanks, Philo Yang
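Jon's "not much garbage to collect" diagnosis is visible directly in Philo's log. A quick parse of a few of the Old Gen before/after values (the regex assumes the GCInspector layout shown above):

```python
# Compute how much CMS Old Gen each multi-second full GC actually reclaimed.
import re

log = """\
ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464;
ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512;
ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 -> 3579805792;
"""

pat = re.compile(r"GC in (\d+)ms\. CMS Old Gen: (\d+) -> (\d+)")
matches = pat.findall(log)
reclaimed = [int(before) - int(after) for _, before, after in matches]
for (ms, _, _), r in zip(matches, reclaimed):
    print(f"{ms} ms pause reclaimed {r} bytes of old gen")
# Pauses of 11-12 seconds free at most ~30 KB of a ~3.3 GB old gen (the
# first pause even grew it), i.e. nearly everything in old gen is still
# live and the collector has nothing to reclaim.
```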
Re: simple data movement ?
It may be more valuable to set up your test cluster on the same version, and make sure your tokens are the same, then copy over your sstables. You'll have an exact replica of prod you can test your upgrade process against. On Fri Dec 19 2014 at 11:04:58 AM Ryan Svihla rsvi...@datastax.com wrote: In theory you could always do a data dump - sstable to JSON and back, for example - but you'd have to have your schema set up, and I've not actually done this myself, so YMMV. I've helped a bunch of folks with that upgrade path, and while it's time consuming it does work. On Fri, Dec 19, 2014 at 8:49 AM, Langston, Jim jim.langs...@dynatrace.com wrote: Thanks, this looks uglier. I double checked my production cluster (I have a staging and development cluster as well) and production is on 1.2.8. A copy of the data resulted in a message: Exception encountered during startup: Incompatible SSTable found. Current version ka is unable to read file: /cassandra/apache-cassandra-2.1.2/bin/../data/data/system/schema_keyspaces/system-schema_keyspaces-ic-150. Please run upgradesstables. Is the move going to be 1.2.8 -> 1.2.9 -> 2.0.x -> 2.1.2 ?? Can I just dump the data and import it into 2.1.2 ?? Jim From: Ryan Svihla rsvi...@datastax.com Reply-To: user@cassandra.apache.org Date: Thu, 18 Dec 2014 06:00:09 -0600 To: user@cassandra.apache.org Subject: Re: simple data movement ? I'm not sure that'll work with that many version moves in the middle; upgrades are, to my knowledge, only tested between specific steps, namely from 1.2.9 to the latest 2.0.x http://www.datastax.com/documentation/upgrade/doc/upgrade/cassandra/upgradeC_c.html Specifically: Cassandra 2.0.x restrictions http://www.datastax.com/documentation/upgrade/doc/upgrade/cassandra/upgradeC_c.html?scroll=concept_ds_yqj_5xr_ck__section_ubt_nwr_54 After downloading DataStax Community http://planetcassandra.org/cassandra/, upgrade to Cassandra directly from Cassandra 1.2.9 or later.
Cassandra 2.0 is not network- or SSTable-compatible with versions older than 1.2.9. If your version of Cassandra is earlier than 1.2.9 and you want to perform a rolling restart http://www.datastax.com/documentation/cassandra/1.2/cassandra/glossary/gloss_rolling_restart.html, first upgrade the entire cluster to 1.2.9, and then to Cassandra 2.0. Cassandra 2.1.x restrictions¶ http://www.datastax.com/documentation/upgrade/doc/upgrade/cassandra/upgradeC_c.html?scroll=concept_ds_yqj_5xr_ck__section_qzx_pwr_54 Upgrade to Cassandra 2.1 from Cassandra 2.0.7 or later. Cassandra 2.1 is not compatible with Cassandra 1.x SSTables. First upgrade the nodes to Cassandra 2.0.7 or later, start the cluster, upgrade the SSTables, stop the cluster, and then upgrade to Cassandra 2.1. On Wed, Dec 17, 2014 at 10:55 PM, Ben Bromhead b...@instaclustr.com wrote: Just copy the data directory from each prod node to your test node (and relevant configuration files etc). If your IP addresses are different between test and prod, follow https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/ On 18 December 2014 at 09:10, Langston, Jim jim.langs...@dynatrace.com wrote: Hi all, I have set up a test environment with C* 2.1.2, wanting to test our applications against it. I currently have C* 1.2.9 in production and want to use that data for testing. What would be a good approach for simply taking a copy of the production data and moving it into the test env and having the test env C* use that data ? The test env. is identical is size, with the difference being the versions of C*. Thanks, Jim The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. 
If you received it in error please notify us immediately and then destroy it -- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359 -- Ryan Svihla Solution Architect, DataStax http://www.datastax.com/