Re: schema change management tools

2012-10-04 Thread Jonathan Haddad
Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes so I haven't had a need to really automate it. Even with MySQL I never bothered. Jon On Thu, Oct 4, 2012 at 6:27 PM, John Sanda

Re: schema change management tools

2012-10-04 Thread Jonathan Haddad
if there is already something out there. If not though, I will be sure to post back to the list with whatever I wind up doing. On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad j...@jonhaddad.com wrote: Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping

Re: Migrating data from 2 node cluster to a 3 node cluster

2013-07-04 Thread Jonathan Haddad
You should run a nodetool repair after you copy the data over. You could also use the sstable loader, which would stream the data to the proper node. On Thu, Jul 4, 2013 at 10:03 AM, srmore comom...@gmail.com wrote: We are planning to move data from a 2 node cluster to a 3 node cluster. We

Re: too many open files

2013-07-14 Thread Jonathan Haddad
Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul

Re: CPU Bound Writes

2013-07-20 Thread Jonathan Haddad
Everything is written to the commit log. In the case of a crash, cassandra recovers by replaying the log. On Sat, Jul 20, 2013 at 9:03 AM, Mohammad Hajjat haj...@purdue.edu wrote: Patricia, Thanks for the info. So are you saying that the *whole* data is being written on disk in the commit

Re: VM dimensions for running Cassandra and Hadoop

2013-07-31 Thread Jonathan Haddad
Having just enough RAM to hold the JVM's heap generally isn't a good idea unless you're not planning on doing much with the machine. Any memory not allocated to a process will generally be put to good use serving as page cache. See here: http://en.wikipedia.org/wiki/Page_cache Jon On Tue, Jul

Re: CQL and undefined columns

2013-07-31 Thread Jonathan Haddad
It's advised you do not use compact storage, as it's primarily for backwards compatibility. The first of these options is COMPACT STORAGE. This option is mainly targeted towards backward compatibility with some table definitions created before CQL3. But it also provides a slightly more compact

Re: Adding my first node to another one...

2013-08-01 Thread Jonathan Haddad
I recommend you do not add 1.2 nodes to a 1.1 cluster. We tried this, and ran into many issues. Specifically, the data will not correctly stream from the 1.1 nodes to the 1.2, and it will never bootstrap correctly. On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote: Hi

Re: CQL and undefined columns

2013-08-05 Thread Jonathan Haddad
...@eventbrite.com wrote: On Wed, Jul 31, 2013 at 3:10 PM, Jonathan Haddad j...@jonhaddad.com wrote: It's advised you do not use compact storage, as it's primarily for backwards compatibility. Many Apache Cassandra experts do not advise against using COMPACT STORAGE. [1] Use CQL3 non-COMPACT STORAGE if you

Re: CQL and undefined columns

2013-08-05 Thread Jonathan Haddad
/BigTable . Cassandra was built on the BigTable/ColumnFamily data model. There was also this big movement called NoSQL, where people wanted to break free of query languages and rigid schemas. On Mon, Aug 5, 2013 at 1:56 PM, Jonathan Haddad j...@jonhaddad.com wrote: The CQL docs recommend

Re: CQL and undefined columns

2013-08-05 Thread Jonathan Haddad
, Jonathan Haddad j...@jonhaddad.com wrote: If you expected your CQL3 query to work, then I think you've missed the point of CQL completely. For many of us, adding in a query layer which gives us predictable column names, but continues to allow us to utilize wide rows on disk is a huge benefit. Why

Re: Issue with CQLsh

2013-08-25 Thread Jonathan Haddad
My understanding is that if you want to use CQL, you should create your tables via CQL. Mixing thrift calls w/ CQL seems like it's just asking for problems like this. On Sun, Aug 25, 2013 at 6:53 PM, Vivek Mishra mishra.v...@gmail.com wrote: cassandra 1.2.4 On Mon, Aug 26, 2013 at 2:51 AM,

Re: Cluster Management

2013-08-29 Thread Jonathan Haddad
An alternative to cssh is fabric. It's very flexible in that you can automate almost any repetitive task that you'd send to machines in a cluster, and it's written in python, meaning if you're in AWS you can mix it with boto to automate pretty much anything you want. On Thu, Aug 29, 2013 at

Re: Low Row Cache Request

2013-08-31 Thread Jonathan Haddad
9/12 = .75 It's a rate, not a percentage. On Sat, Aug 31, 2013 at 2:21 PM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote: I'm running one Cassandra node -version 1.2.6- and I *enabled* the *row cache* with *1GB*. But looking the Cassandra metrics on JConsole, *Row Cache
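The arithmetic in this reply can be sketched in Python. The function name `hit_rate` is made up for illustration and is not part of Cassandra or JConsole; the sketch only shows that the metric is a fraction between 0 and 1, which has to be multiplied by 100 if a percentage is wanted.

```python
def hit_rate(hits: int, requests: int) -> float:
    """Return the cache hit rate as a fraction in [0, 1]."""
    if requests == 0:
        return 0.0  # no requests yet, so report zero rather than divide by zero
    return hits / requests


rate = hit_rate(9, 12)
print(rate)                  # the rate as a fraction of requests served from cache
print(f"{rate * 100:.0f}%")  # the same value expressed as a percentage
```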

Re: Why don't you start off with a “single small” Cassandra server as you usually do it with MySQL?

2013-09-18 Thread Jonathan Haddad
For future reference, here is a blog post on this topic: http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/ On Wed, Sep 18, 2013 at 6:38 AM, Michał Michalski mich...@opera.com wrote: You might be interested in this:

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
at 1:19 PM, Jonathan Haddad j...@jonhaddad.com wrote: So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the thrift api to execute CQL until the native driver is out of beta. I'm a little biased in recommending it, since I'm one of the primary authors. If you've

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
seem worth the risks to us. ml On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad j...@jonhaddad.com wrote: So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the thrift api to execute CQL until the native driver is out of beta. I'm a little biased

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
No, 2.7 only. On Tue, Nov 26, 2013 at 3:04 PM, Kumar Ranjan winnerd...@gmail.com wrote: Hi Jonathan - Does cqlengine have support for python 2.6 ? On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote: cqlengine supports batch queries, see the docs here: http

Re: cassandra performance problems

2013-12-05 Thread Jonathan Haddad
Do you mean high CPU usage or high load avg? (20 indicates load avg to me). High load avg means the CPU is waiting on something. Run `iostat -dmx 1 100` to check your disk stats; you'll see columns for MB/s read and written as well as % utilization. Once you understand the bottleneck

new project - Under Siege

2013-12-05 Thread Jonathan Haddad
I've recently pushed up a new project to github, which we've named Under Siege. It's a java agent for reporting Cassandra metrics to statsd. We're in the process of deploying it to our production clusters. Tested against Cassandra 1.2.11. The metrics library seems to change on every release of

Re: cassandra backup

2013-12-06 Thread Jonathan Haddad
I believe SSTables are written to a temporary file then moved. If I remember correctly, tools like tablesnap listen for the inotify event IN_MOVED_TO. This should handle the issue of trying to back up an sstable while it is mid-write. On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux mthero...@yahoo.com

Re: Cassandra ring not behaving like a ring

2014-01-16 Thread Jonathan Haddad
Please include the output of nodetool ring, otherwise no one can help you. On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Any pointers? I am planning to do rolling restart of the cluster nodes to see if it will help. On Jan 15, 2014 2:59 PM, Narendra

Re: Cassandra ring not behaving like a ring

2014-01-16 Thread Jonathan Haddad
, I am new to Cassandra Environment, does the order of the ring matter, as long as the member joins the group? Yogi On Thu, Jan 16, 2014 at 12:49 PM, Jonathan Haddad j...@jonhaddad.com wrote: Please include the output of nodetool ring, otherwise no one can help you. On Thu, Jan 16, 2014

Re: Recommended OS

2014-02-12 Thread Jonathan Haddad
I just would advise against it because it's going to be difficult to narrow down what's causing problems. For instance, if you have Node A which is performing GC, it will affect query times on Node B which is trying to satisfy a quorum read. Node B might actually have very low load, and it will

abusing cassandra's multi DC abilities

2014-02-21 Thread Jonathan Haddad
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes elastic search and redis. A message

abusing cassandra's multi DC abilities

2014-02-22 Thread Jonathan Haddad
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes elastic search and redis. A message

Re: abusing cassandra's multi DC abilities

2014-02-24 Thread Jonathan Haddad
On Saturday, February 22, 2014, Jonathan Haddad j...@jonhaddad.com wrote: Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
I have a nagging memory of reading about issues with virtualization and not actually having durable versions of your data even after an fsync (within the VM). Googling around led me to this post: http://petercai.com/virtualization-is-bad-for-database-integrity/ It's possible you're hitting this

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
have encountered. Does Cassandra quiesce the file system after a snapshot using fsfreeze or xfs_freeze? Somehow I doubt it... On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote: I have a nagging memory of reading about issues with virtualization and not actually having

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
. On Fri, Mar 28, 2014 at 1:32 PM, Laing, Michael michael.la...@nytimes.com wrote: +1 for tablesnap On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad j...@jonhaddad.com wrote: I will +1 the recommendation on using tablesnap over EBS. S3 is at least predictable. Additionally, from a practical

Re: Tune cache MB settings per table.

2014-06-01 Thread Jonathan Haddad
I think of all the areas you could spend your time, this will have the least returns. The OS will keep the most frequently used data in memory. There's no reason to require cassandra to do it. If you're curious as to what's been loaded into ram, try Al Tobey's pcstat utility.

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Jonathan Haddad
I'd suggest creating 1 table per day, and dropping the tables you don't need once you're done. On Wed, Jun 4, 2014 at 10:44 AM, Redmumba redmu...@gmail.com wrote: Sorry, yes, that is what I was looking to do--i.e., create a TopologicalCompactionStrategy or similar. On Wed, Jun 4, 2014 at
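The table-per-day suggestion can be sketched in Python as a naming scheme plus a retention check. Everything here is hypothetical: the `events` prefix, the `YYYYMMDD` suffix, and the helper names are illustrative choices, and the sketch only computes which table names fall outside the retention window. Issuing the actual `DROP TABLE` statements is left to whatever client the application uses.

```python
from datetime import date, timedelta


def table_for(day: date, prefix: str = "events") -> str:
    """Name of the table holding data for a given day (hypothetical scheme)."""
    return f"{prefix}_{day:%Y%m%d}"


def tables_to_drop(existing, today: date, keep_days: int, prefix: str = "events"):
    """Return the daily tables older than the retention window, sorted by name."""
    keep = {table_for(today - timedelta(days=i), prefix) for i in range(keep_days)}
    return sorted(
        name for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )


existing = ["events_20140601", "events_20140602", "events_20140603", "events_20140604"]
for name in tables_to_drop(existing, today=date(2014, 6, 4), keep_days=2):
    print(f"DROP TABLE {name};")  # statements an operator or job would run
```

Dropping a whole table avoids writing per-row deletes for expired data, which is the appeal of this layout.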

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-05 Thread Jonathan Haddad
You should read through the token docs, it has examples and specifications: http://cassandra.apache.org/doc/cql3/CQL.html#tokenFun On Thu, Jun 5, 2014 at 10:22 PM, Kevin Burton bur...@spinn3r.com wrote: I'm building a new schema which I need to read externally by paging through the result

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-05 Thread Jonathan Haddad
Sorry, the datastax docs are actually a bit better: http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html Jon On Thu, Jun 5, 2014 at 10:46 PM, Jonathan Haddad j...@jonhaddad.com wrote: You should read through the token docs, it has examples and specifications: http

Re: VPC AWS

2014-06-06 Thread Jonathan Haddad
This may not help you with the migration, but it may with maintenance management. I just put up a blog post on managing VPC security groups with a tool I open sourced at my previous company. If you're going to have different VPCs (staging / prod), it might help with managing security groups.

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list:
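The fan-out pattern described here can be sketched in Python with the standard library alone. This is not driver code: `fetch_row` and `FAKE_TABLE` are hypothetical stand-ins for a single-partition query, and a thread pool stands in for the driver's asynchronous API (with the DataStax Python driver the analogous call would be `Session.execute_async`, which is not used here so that the sketch runs without a cluster).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table; a real application would query Cassandra.
FAKE_TABLE = {1: "alpha", 2: "beta", 3: "gamma"}


def fetch_row(key):
    """Stand-in for one single-partition query (SELECT ... WHERE id = ?)."""
    return FAKE_TABLE.get(key)


def multi_get(keys, max_workers=8):
    """Issue one query per key concurrently and collect the results by key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every query first so they are all in flight together...
        futures = {key: pool.submit(fetch_row, key) for key in keys}
        # ...then wait for each result.
        return {key: future.result() for key, future in futures.items()}


print(multi_get([1, 2, 3]))
```

Each key is requested independently, so no single coordinator has to gather every partition the way it would for one large IN query.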

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
? What would be the advantage? []s 2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
the driver reuses the connection? []s 2014-06-19 22:16 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jonathan Haddad
, I am using the CQL datastax drivers. It was good advice, thanks a lot Jonathan. []s 2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go async

Re: Adding large text blob causes read timeout...

2014-06-24 Thread Jonathan Haddad
Can you run your query in the CLI after setting tracing on? On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan doanduy...@gmail.com wrote: Yes but adding the extra one ends up by * 1000. The limit in CQL3 specifies the number of logical rows, not the number of physical columns in the storage engine

Re: Triggers and their use in data indexing

2014-07-03 Thread Jonathan Haddad
Triggers only execute on the local coordinator. I would also not recommend using them. On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra berto.d.s...@gmail.com wrote: Now the question: is there any way to use triggers so

Re: Triggers and their use in data indexing

2014-07-03 Thread Jonathan Haddad
AM, Jonathan Haddad j...@jonhaddad.com wrote: Triggers only execute on the local coordinator. I would also not recommend using them. On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra berto.d.s...@gmail.com wrote: Now

Re: Write Inconsistency to update a row

2014-07-03 Thread Jonathan Haddad
Did you make sure all the nodes are on the same time? If they're not, you'll get some weird results. On Thu, Jul 3, 2014 at 10:30 AM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: Are you sure all the nodes are working at that time? Yes. They are working. I would suggest

Re: Write Inconsistency to update a row

2014-07-03 Thread Jonathan Haddad
Make sure you've got ntpd running, otherwise this will be an ongoing nightmare. On Thu, Jul 3, 2014 at 5:00 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: I have synchronized the clocks and works! 2014-07-03 20:58 GMT-03:00 Sávio S. Teles de Oliveira savio.te...@cuia.com.br:

Re: Cassandra use cases/Strengths/Weakness

2014-07-08 Thread Jonathan Haddad
I've used various databases in production for over 10 years. Each has strengths and weaknesses. I ran Cassandra for just shy of 2 years in production as part of both development teams and operations, and I only hit 1 serious problem that Rob mentioned. Ideally C* would have guarded against it,

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jonathan Haddad
The problem with starting without vnodes is that moving to them is a bit hairy. In particular, nodetool shuffle has been reported to take an extremely long time (days, weeks). I would start with vnodes if you have any intent on using them. On Thu, Jul 17, 2014 at 6:03 PM, Robert Coli

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're mentioning. Al Tobey did a walkthrough of how to set up the OSS side of things here: http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html It'll be less work than writing a

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
, python + Cassandra will be supported just in the next version, but I would like to be wrong... Best regards, Marcelo Valle. 2014-07-21 13:06 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're

Re: cluster rebalancing…

2014-07-22 Thread Jonathan Haddad
You don't need to specify tokens. The new node gets them automatically. On Jul 22, 2014, at 7:03 PM, Kevin Burton bur...@spinn3r.com wrote: So , shouldn't it be easy to rebalance a cluster? I'm not super excited to type out 200 commands to move around individual tokens. I realize

Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the

Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
* When I say wild imbalance, I do not mean all tokens on 1 node in the cluster, I really should have said slightly imbalanced On Tue, Aug 5, 2014 at 8:43 AM, Jonathan Haddad j...@jonhaddad.com wrote: This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks

Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
// due to carefully chosen tokens vs randomly-generated token clash. I don't see other options left. Do you see other ones? Regards, Dominique -Original Message- From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad Sent: Tuesday 5

Re: too many open files

2014-08-09 Thread Jonathan Haddad
It really doesn't need to be this complicated. You only need 1 session per application. It's thread safe and manages the connection pool for you. http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html On Sat, Aug 9, 2014 at 1:29 PM, Kevin Burton bur...@spinn3r.com

Re: Table not being created but no error.

2014-08-13 Thread Jonathan Haddad
Can you provide the code that you use to create the table? This feels like code error rather than a database bug. On Wed, Aug 13, 2014 at 1:26 PM, Kevin Burton bur...@spinn3r.com wrote: 2.0.5… I'm upgrading to 2.0.9 now just to rule this out…. I can give you the full CQL for the table, but

Re:

2014-08-25 Thread Jonathan Haddad
It sounds like your clocks are out of sync. Run ntpdate to fix your clock then make sure you're running ntpd on every machine. On Mon, Aug 25, 2014 at 1:25 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: We're using cassandra 2.0.9 with datastax java cassandra driver 2.0.0 in a

Re:

2014-08-25 Thread Jonathan Haddad
This is actually a more correct response than mine, I made a few assumptions that may or may not be true. On Mon, Aug 25, 2014 at 1:31 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Aug 25, 2014 at 1:25 PM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: We're using cassandra

Re: Failed to enable shuffling error

2014-09-08 Thread Jonathan Haddad
I believe shuffle has been removed recently. I do not recommend using it for any reason. If you really want to go vnodes, your only sane option is to add a new DC that uses vnodes and switch to it. The downside in the 2.0.x branch to using vnodes is that repairs take N times as long, where N is

Re: Failed to enable shuffling error

2014-09-08 Thread Jonathan Haddad
not work at all. On Mon, Sep 8, 2014 at 2:01 PM, Tim Heckman t...@pagerduty.com wrote: On Mon, Sep 8, 2014 at 1:45 PM, Jonathan Haddad j...@jonhaddad.com wrote: I believe shuffle has been removed recently. I do not recommend using it for any reason. We're still using the 1.2.x branch

Re: multi datacenter replication

2014-09-10 Thread Jonathan Haddad
Multi-dc is available in every version of Cassandra. On Wed, Sep 10, 2014 at 9:21 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Thank you very much for the links. Just to be sure: is this capability available for COMMUNITY ADDITION? Thanks Oleg. On Wed, Sep 10, 2014 at 11:49 PM, Alain

Re: Concurrents deletes and updates

2014-09-17 Thread Jonathan Haddad
Make sure your clocks are synced. If they aren't, the writetime that determines the most recent value will be incorrect. On Wed, Sep 17, 2014 at 11:58 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Sep 17, 2014 at 11:55 AM, Sávio S. Teles de Oliveira savio.te...@cuia.com.br wrote: I'm

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jonathan Haddad
Keep in mind secondary indexes in cassandra are not there to improve performance, or even really be used in a serious user facing manner. Build and maintain your own view of the data, it'll be much faster. On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel pateljay3...@gmail.com wrote: Hi there, We

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread Jonathan Haddad
Depending on how you query (one or quorum) you might be able to do 1 rack at a time (or az or whatever you've got) assuming your snitch is set up right On Sep 19, 2014, at 11:30 AM, Kevin Burton bur...@spinn3r.com wrote: This is great feedback… I think it could actually be even easier

Re: Difference in retrieving data from cassandra

2014-09-25 Thread Jonathan Haddad
You'll need to provide a bit of information. To start, a query trace would be helpful. http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html (self promo) You may want to read over my blog post regarding diagnosing problems in production. I've covered diagnosing

Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
Are you using Cassandra 2.0 vnodes? If so, repair takes forever. This problem is addressed in 2.1. On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another. Running a repair on a

Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
Subject: Re: Repair taking long time Unfortunately DSE 4.5.0 is still on 2.0.x -- Brice On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using Cassandra 2.0 vnodes? If so, repair takes forever. This problem is addressed in 2.1. On Fri, Sep 26

Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
: Brice Dutheil [mailto:brice.duth...@gmail.com] Sent: Friday, September 26, 2014 12:47 PM To: user@cassandra.apache.org Subject: Re: Repair taking long time Unfortunately DSE 4.5.0 is still on 2.0.x -- Brice On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote

Re: Performance Issue: Keeping rows in memory

2014-10-22 Thread Jonathan Haddad
First, did you run a query trace? I recommend Al Tobey's pcstat util to determine if your files are in the buffer cache: https://github.com/tobert/pcstat On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway thomas.white...@metaswitch.com wrote: Hi, I’m working on an application using a

Re: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Jonathan Haddad
No. Consider a scenario where you supply a timestamp a week in the future, flush it to sstable, and then do a write, with the current timestamp. The record in disk will have a timestamp greater than the one in the memtable. On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith
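The scenario in this reply can be sketched in Python as a simplified last-write-wins merge. The `Cell` type and `reconcile` function are illustrative names, not Cassandra internals, and the sketch assumes the only rule is "highest write timestamp wins"; tie-breaking and tombstones are left out.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    value: str
    timestamp: int  # write timestamp, assumed here to be microseconds since the epoch


def reconcile(*cells: Optional[Cell]) -> Optional[Cell]:
    """Return the cell with the highest timestamp, wherever it was stored."""
    present = [cell for cell in cells if cell is not None]
    return max(present, key=lambda cell: cell.timestamp) if present else None


now = 1_413_990_000_000_000
one_week = 7 * 24 * 3600 * 1_000_000

# First write carries a timestamp a week in the future and is flushed to disk.
on_disk = Cell("flushed, future timestamp", now + one_week)
# Second write happens later in wall-clock time but carries the current timestamp.
in_memtable = Cell("in memtable, current timestamp", now)

winner = reconcile(in_memtable, on_disk)
print(winner.value)
```

Because the winner is decided by timestamp rather than by where the data lives, a read cannot be answered from the memtable alone without checking what is on disk.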

Re: OOM at Bootstrap Time

2014-10-26 Thread Jonathan Haddad
If the issue is related to I/O, you're going to want to determine if you're saturated. Take a look at `iostat -dmx 1`, you'll see avgqu-sz (queue size) and svctm (service time). The higher those numbers are, the more overwhelmed your disk is. On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan

Re: read after write inconsistent even on a one node cluster

2014-11-06 Thread Jonathan Haddad
For cqlengine we do quite a bit of write then read to ensure data was written correctly, across 1.2, 2.0, and 2.1. For what it's worth, I've never seen this issue come up. On a single node, Cassandra only acks the write after it's been written into the memtable. So, you'd expect to see the most

Re: query tracing

2014-11-07 Thread Jonathan Haddad
Personally I've found that using query timing + log aggregation on the client side is more effective than trying to mess with tracing probability in order to find a single query which has recently become a problem. I recommend wrapping your session with something that can automatically log the
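A minimal sketch of the "wrap your session" idea, in Python. It assumes only that the wrapped object exposes an `execute(query, ...)` method; `TimedSession`, `FakeSession`, the logger name, and the 100 ms threshold are all hypothetical choices for illustration, not part of any driver.

```python
import logging
import time

log = logging.getLogger("query_timing")


class TimedSession:
    """Wraps any object with an execute() method and logs how long each call takes."""

    def __init__(self, session, slow_ms=100.0):
        self._session = session
        self._slow_ms = slow_ms  # hypothetical threshold for flagging slow queries

    def execute(self, query, *args, **kwargs):
        start = time.perf_counter()
        try:
            return self._session.execute(query, *args, **kwargs)
        finally:
            # Runs whether the query succeeded or raised, so failures are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            level = logging.WARNING if elapsed_ms >= self._slow_ms else logging.DEBUG
            log.log(level, "query=%r took %.2f ms", query, elapsed_ms)


class FakeSession:
    """Stand-in for a real driver session so the sketch runs without a cluster."""

    def execute(self, query, *args, **kwargs):
        return [("ok",)]


session = TimedSession(FakeSession())
rows = session.execute("SELECT * FROM users WHERE id = 1")
```

Shipping these log lines to an aggregator gives per-query timings for every request, without having to tune a tracing probability on the server.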

Re: PHP - Cassandra integration

2014-11-11 Thread Jonathan Haddad
In production? On Mon Nov 10 2014 at 6:06:41 AM Spencer Brown lilspe...@gmail.com wrote: I'm using /McFrazier/PhpBinaryCql/ On Mon, Nov 10, 2014 at 1:48 AM, Akshay Ballarpure akshay.ballarp...@tcs.com wrote: Hello, I am working on PHP cassandra integration, please let me know which

Re: Cassandra sort using updatable query

2014-11-12 Thread Jonathan Haddad
With Cassandra you're going to want to model tables to meet the requirements of your queries instead of like a relational database where you build tables in 3NF then optimize after. For your optimized select query, your table (with caveat, see below) could start out as: create table words (

Re: Is it more performant to split data with the same schema into multiple keyspaces, as supposed to put all of them into the same keyspace?

2014-11-13 Thread Jonathan Haddad
Performance will be the same. There's no performance benefit to using multiple keyspaces. On Thu Nov 13 2014 at 8:42:40 AM Li, George guangxing...@pearson.com wrote: Hi, we use Cassandra to store some association type of data. For example, store user to course (course registrations)

Re: Is it more performant to split data with the same schema into multiple keyspaces, as supposed to put all of them into the same keyspace?

2014-11-13 Thread Jonathan Haddad
with (potentially) two separate read patterns, don't put them in the same table. On Thu, Nov 13, 2014 at 11:08 AM, Jonathan Haddad j...@jonhaddad.com wrote: Performance will be the same. There's no performance benefit to using multiple keyspaces. On Thu Nov 13 2014 at 8:42:40 AM Li, George guangxing

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Jonathan Haddad
If he deletes all the data with RF=1, won't he have data loss? On Mon Nov 17 2014 at 5:14:23 PM Michael Shuler mich...@pbandjelly.org wrote: On 11/17/2014 02:04 PM, Alain Vandendorpe wrote: Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being

Re: Using Cassandra for session tokens

2014-12-01 Thread Jonathan Haddad
I don't think DateTiered will help here, since there's no clustering key defined. This is a pretty straightforward workload, I've done something similar. Are you overwriting the session on every request? Or just writing it once? On Mon Dec 01 2014 at 6:45:14 AM Matt Brown m...@mattnworb.com

Re: Using Cassandra for session tokens

2014-12-01 Thread Jonathan Haddad
- Hash: SHA1 The session will be written once at create time, and never modified after that. Will that affect things? Thank you - -Phil On 01.12.2014 15:58, Jonathan Haddad wrote: I don't think DateTiered will help here, since there's no clustering key defined. This is a pretty

Re: full gc too often

2014-12-04 Thread Jonathan Haddad
I recommend reading through https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how the JVM GC works and what you can do to tune it. Also good is Blake Eggleston's writeup which can be found here: http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html

Re: Could ring cache really improve performance in Cassandra?

2014-12-07 Thread Jonathan Haddad
What's a ring cache? FYI if you're using the DataStax CQL drivers they will automatically route requests to the correct node. On Sun Dec 07 2014 at 12:59:36 AM kong kongjiali...@gmail.com wrote: Hi, I'm doing stress test on Cassandra. And I learn that using ring cache can improve the

Re: full gc too often

2014-12-07 Thread Jonathan Haddad
If you've got a specific question I think someone can find a way to help, but asking what can 8gb of heap give me is pretty abstract and unanswerable. Jon On Sun Dec 07 2014 at 8:03:53 AM Philo Yang ud1...@gmail.com wrote: 2014-12-05 15:40 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: I recommend

Re: How to model data to achieve specific data locality

2014-12-07 Thread Jonathan Haddad
I think he mentioned 100MB as the max size - planning for 1mb might make your data model difficult to work. On Sun Dec 07 2014 at 12:07:47 PM Kai Wang dep...@gmail.com wrote: Thanks for the help. I wasn't clear how clustering column works. Coming from Thrift experience, it took me a while to

Re: Could ring cache really improve performance in Cassandra?

2014-12-07 Thread Jonathan Haddad
you very much. 2014-12-08 1:28 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: What's a ring cache? FYI if you're using the DataStax CQL drivers they will automatically route requests to the correct node. On Sun Dec 07 2014 at 12:59:36 AM kong kongjiali...@gmail.com wrote: Hi, I'm doing

Re: Can not connect with cqlsh to something different than localhost

2014-12-08 Thread Jonathan Haddad
Listen address needs the actual address, not the interface. This is best accomplished by setting up proper hostnames for each machine (through DNS or hosts file) and leaving listen_address blank, as it will pick the external ip. Otherwise, you'll need to set the listen address to the IP of the

Re: Could ring cache really improve performance in Cassandra?

2014-12-08 Thread Jonathan Haddad
results, I cannot use it in my case. I will create a repo and send a link later, hope to get your kind help. Thanks very much. 2014-12-08 14:28 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: I would really not recommend using thrift for anything at this point, including your load tests. Take

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
You don't need a prime number of nodes in your ring, but it's not a bad idea for it to be a multiple of your RF when your cluster is small. On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote: Hi Ian, Thanks for the suggestion but I had actually already done that prior to the

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
...@whistle.com On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad j...@jonhaddad.com wrote: You don't need a prime number of nodes in your ring, but it's not a bad idea for it to be a multiple of your RF when your cluster is small. On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder n...@whistle.com wrote

Re: upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread Jonathan Haddad
Yes. It is, in general, a best practice to upgrade to the latest bug fix release before doing an upgrade to the next point release. On Tue Dec 09 2014 at 6:58:24 PM wyang wy...@v5.cn wrote: I looked some upgrade documentations and am a little puzzled. According to

Re: Cassandra Maintenance Best practices

2014-12-09 Thread Jonathan Haddad
I did a presentation on diagnosing performance problems in production at the US and Euro summits, in which I covered quite a few tools and preventative measures you should know when running a production cluster. You may find it useful:

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Jonathan Haddad
The really important thing to take away from Ryan's original post is that batches are not there for performance. The only case I consider batches to be useful for is when you absolutely need to know that several tables all get a mutation (via logged batches). The use case for this is when

Re: `nodetool cfhistogram` utility script

2014-12-12 Thread Jonathan Haddad
Hey Jens, Unfortunately the output of the nodetool histograms changes between versions. While I think your script is useful, it's likely to break between versions. You might be interested to weigh in on the JIRA ticket to make the nodetool output machine friendly:

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
from using server-side distribution of requests. At a minimum the CQL spec should make a more clear statement of intent and non-intent for BATCH. -- Jack Krupansky *From:* Jonathan Haddad j...@jonhaddad.com *Sent:* Friday, December 12, 2014 12:58 PM *To:* user@cassandra.apache.org ; Ryan

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
of batches but without the coordinator overhead. Can you post your benchmark code? On Sat Dec 13 2014 at 6:10:36 AM Jonathan Haddad j...@jonhaddad.com wrote: There are cases where it can. For instance, if you batch multiple mutations to the same partition (and talk to a replica for that partition
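The same-partition case mentioned above can be illustrated by grouping mutations by partition key on the client before sending them. This is only a sketch in Python with the standard library: the table layout, the `sensor_id` column, and the `group_by_partition` helper are hypothetical, and the actual send through a driver is left out.

```python
from collections import defaultdict


def group_by_partition(rows, partition_key):
    """Group row dicts by the value of their partition key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return dict(groups)


# Hypothetical rows for a table partitioned by sensor_id.
rows = [
    {"sensor_id": "a", "ts": 1, "value": 10},
    {"sensor_id": "b", "ts": 1, "value": 20},
    {"sensor_id": "a", "ts": 2, "value": 11},
]

batches = group_by_partition(rows, "sensor_id")
for key, members in batches.items():
    # Each group would become one batch sent to a replica that owns `key`.
    print(key, len(members))
```

Every mutation in a group lands on the same replicas, so the coordinator does not have to fan the batch out across the cluster.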

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
rsvi...@datastax.com wrote: Also..what happens when you turn on shuffle with token aware? http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html On Sat, Dec 13, 2014 at 8:21 AM, Jonathan Haddad j...@jonhaddad.com wrote: To add to Ryan's (extremely

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
. If reasonable sized batches are causing survivors, you're not far off from falling over anyway. On Sat, Dec 13, 2014 at 10:04 AM, Jonathan Haddad j...@jonhaddad.com wrote: One thing to keep in mind is the overhead of a batch goes up as the number of servers increases. Talking to 3 is going

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
Not a problem - it's good to hash this stuff out and understand the technical reasons why something works or doesn't work. On Sat Dec 13 2014 at 10:07:10 AM Jonathan Haddad j...@jonhaddad.com wrote: On Sat Dec 13 2014 at 10:00:16 AM Eric Stevens migh...@gmail.com wrote: Isn't the net effect

Re: batch_size_warn_threshold_in_kb

2014-12-15 Thread Jonathan Haddad
13, 2014 at 11:07 AM, Jonathan Haddad j...@jonhaddad.com wrote: On Sat Dec 13 2014 at 10:00:16 AM Eric Stevens migh...@gmail.com wrote: Isn't the net effect of coordination overhead incurred by batches basically the same as the overhead incurred by RoundRobin or other non-token-aware

Re: bootstrapping manually when auto_bootstrap=false ?

2014-12-18 Thread Jonathan Haddad
I'd consider solving your root problem of "people are starting and stopping servers in prod accidentally" instead of making Cassandra more difficult to manage operationally. On Thu Dec 18 2014 at 4:04:34 AM Ryan Svihla rsvi...@datastax.com wrote: why auto_bootstrap=false? The documentation even

Re: full gc too oftenvAquin p y l mmm am m

2014-12-18 Thread Jonathan Haddad
This topic comes up quite a bit. Enough, in fact, that I've done a 1 hour webinar on the topic. I cover how the JVM GC works and things you need to consider when tuning it for Cassandra. https://www.youtube.com/watch?v=7B_w6YDYSwA With your specific problem - full GC not reducing the old gen -

Re: simple data movement ?

2014-12-19 Thread Jonathan Haddad
It may be more valuable to set up your test cluster as the same version, and make sure your tokens are the same. Then copy over your sstables. You'll have an exact replica of prod on which you can test your upgrade process. On Fri Dec 19 2014 at 11:04:58 AM Ryan Svihla rsvi...@datastax.com wrote: In
