Could ring cache really improve performance in Cassandra?

2014-12-07 Thread kong
Hi, I'm running a stress test on Cassandra, and I have learned that using a ring cache can improve performance because client requests can go directly to the target Cassandra server, so that the coordinator Cassandra node is the desired target node. In this way, there is no need for the coordinator node to

Re: How to model data to achieve specific data locality

2014-12-07 Thread DuyHai Doan
> Those sequences are not fixed. All sequences with the same seq_id tend to grow at the same rate. If it's one partition per seq_id, the size will most likely exceed the threshold quickly
-- Then use bucketing to avoid too wide partitions
> Also new seq_types can be added and old seq_types can be
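
The bucketing idea mentioned in this message can be sketched roughly as follows. This is only an illustration, not the schema discussed in the thread: it assumes each row carries a timestamp and that one bucket per day is acceptable, and the names `BUCKET_SECONDS`, `bucket_for`, and `partition_key` are hypothetical.

```python
from datetime import datetime, timezone

# Assumption for illustration: one bucket per UTC day.
BUCKET_SECONDS = 24 * 60 * 60


def bucket_for(ts: datetime, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """Return the number of the bucket that a timestamp falls into."""
    return int(ts.timestamp()) // bucket_seconds


def partition_key(seq_id: str, ts: datetime) -> tuple:
    """Build a composite partition key (seq_id, bucket).

    Rows for one seq_id are spread over many partitions, one per bucket,
    so no single partition keeps growing without bound.
    """
    return (seq_id, bucket_for(ts))


if __name__ == "__main__":
    morning = datetime(2014, 12, 7, 1, 0, tzinfo=timezone.utc)
    next_day = datetime(2014, 12, 8, 0, 30, tzinfo=timezone.utc)
    print(partition_key("seq-1", morning))
    print(partition_key("seq-1", next_day))
```

The bucket width is the tuning knob here: a narrower bucket gives smaller partitions but means a read over a long range has to touch more of them.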

Re: How to model data to achieve specific data locality

2014-12-07 Thread Eric Stevens
> Also new seq_types can be added and old seq_types can be deleted. This means I often need to ALTER TABLE to add and drop columns.
Kai, unless I'm misunderstanding something, I don't see why you need to alter the table to add a new seq type. From a data model perspective, these are just new

Re: How to model data to achieve specific data locality

2014-12-07 Thread Jack Krupansky
It would be helpful to look at some specific examples of sequences, showing how they grow. I suspect that the term “sequence” is being overloaded in some subtly misleading way here. Besides, we’ve already answered the headline question – data locality is achieved by having a common partition

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
Hi Joy, Are you resetting your data after each test run? I wonder if your tests are actually causing you to fall behind on data grooming tasks such as compaction, and so performance suffers for your later tests. There are *so many* factors which can affect performance, without reviewing test

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
I'm sorry, I meant to say 6 nodes rf=3. Also look at this performance over sustained periods of time, not burst writing. Run your test for several hours and watch memory and especially compaction stats. See if you can walk in what data volume you can write while keeping outstanding compaction

Re: full gc too often

2014-12-07 Thread Philo Yang
2014-12-05 15:40 GMT+08:00 Jonathan Haddad j...@jonhaddad.com: I recommend reading through https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how the JVM GC works and what you can do to tune it. Also good is Blake Eggleston's writeup which can be found here:

Re: Could ring cache really improve performance in Cassandra?

2014-12-07 Thread Jonathan Haddad
What's a ring cache? FYI if you're using the DataStax CQL drivers they will automatically route requests to the correct node. On Sun Dec 07 2014 at 12:59:36 AM kong kongjiali...@gmail.com wrote: Hi, I'm doing stress test on Cassandra. And I learn that using ring cache can improve the

Re: full gc too often

2014-12-07 Thread Jonathan Haddad
There are a lot of factors that go into tuning, and I don't know of any reliable formula that you can use to figure out what's going to work optimally for your hardware. Personally I recommend: 1) find the bottleneck 2) play with a parameter (or two) 3) see what changed, performance-wise If

Re: Recommissioned node is much smaller

2014-12-07 Thread Y.Wong
On Dec 2, 2014 3:45 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Dec 2, 2014 at 12:21 PM, Robert Wille rwi...@fold3.com wrote: As a test, I took down

Re: How to model data to achieve specific data locality

2014-12-07 Thread Kai Wang
Thanks for the help. I wasn't clear on how clustering columns work. Coming from Thrift experience, it took me a while to understand how a clustering column impacts partition storage on disk. Now I believe using seq_type as the first clustering column solves my problem. As for partition size, I will

Re: How to model data to achieve specific data locality

2014-12-07 Thread Jack Krupansky
As a general rule, partitions can certainly be much larger than 1 MB, even up to 100 MB. 5 MB to 10 MB might be a good target size. Originally you stated that the number of seq_types could be “unlimited”... is that really true? Is there no practical upper limit you can establish, like 10,000

Re: How to model data to achieve specific data locality

2014-12-07 Thread Jonathan Haddad
I think he mentioned 100 MB as the max size - planning for 1 MB might make your data model difficult to work with. On Sun Dec 07 2014 at 12:07:47 PM Kai Wang dep...@gmail.com wrote: Thanks for the help. I wasn't clear how clustering column works. Coming from Thrift experience, it took me a while to

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread 孔嘉林
Hi Eric, Thank you very much for your reply! Do you mean that I should clear my table after each run? Indeed, I can see several compactions during my test, but could only a few compactions affect the performance that much? Also, I can see from OpsCenter that some ParNew GCs happen but

Re: Could ring cache really improve performance in Cassandra?

2014-12-07 Thread 孔嘉林
I found that under the src/client folder of the Cassandra 2.1.0 source code, there is a *RingCache.java* file. It uses a thrift client calling the *describe_ring()* API to get the token range of each Cassandra node. It is used on the client side. The client can use it combined with the partitioner to get
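
The general idea described in this message (look up the node that owns a key's token on the client side) can be sketched as below. This is not the code in RingCache.java: `RingSketch` and `token_for` are hypothetical names, the MD5-based hash is only a stand-in for Cassandra's real partitioner, and the token-to-node map is supplied by hand rather than fetched from the cluster.

```python
import bisect
import hashlib


def token_for(key: bytes) -> int:
    """Stand-in hash for illustration only; NOT Cassandra's partitioner."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")


class RingSketch:
    """Maps a token to the node that owns it on a sorted token ring."""

    def __init__(self, token_to_node: dict):
        # Keep the tokens sorted so ownership can be found by binary search.
        self._tokens = sorted(token_to_node)
        self._nodes = [token_to_node[t] for t in self._tokens]

    def node_for_token(self, token: int) -> str:
        # A node with ring token T owns the range (previous token, T],
        # so the owner is the first ring token >= the lookup token.
        i = bisect.bisect_left(self._tokens, token)
        if i == len(self._tokens):
            i = 0  # past the largest token: wrap around to the first node
        return self._nodes[i]

    def node_for_key(self, key: bytes) -> str:
        return self.node_for_token(token_for(key))


if __name__ == "__main__":
    ring = RingSketch({100: "node-a", 200: "node-b", 300: "node-c"})
    print(ring.node_for_token(150))
    print(ring.node_for_token(301))
```

A client holding such a map can send each request straight to a node that owns the key, which is the saving the original question asks about; whether it helps in practice depends on how stale the cached ring is allowed to become.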

Can not connect with cqlsh to something different than localhost

2014-12-07 Thread Richard Snowden
I am running Cassandra 2.1.2 in an Ubuntu VM. cqlsh or cqlsh localhost works fine. But I can not connect from outside the VM (firewall, etc. disabled). Even when I do cqlsh 192.168.111.136 in my VM I get connection refused. This is strange because when I check my network config I can see that

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Chris Lohfink
I think your client could use improvements. How many threads do you have running in your test? With a thrift call like that you can only do one request at a time per connection. For example, assuming C* takes 0ms, a 10ms network latency/driver overhead will mean 20ms RTT and a max throughput
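
The arithmetic behind this point can be written out as a small sketch. It assumes, as the message does, strictly one in-flight request per connection, so each connection completes at most one request per round trip; the function name `max_requests_per_second` is hypothetical.

```python
def max_requests_per_second(rtt_ms: float, connections: int = 1) -> float:
    """Upper bound on throughput with one in-flight request per connection.

    Each connection can finish at most one request per round trip, so the
    bound is connections * (1000 ms / rtt_ms).
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return connections * 1000.0 / rtt_ms


if __name__ == "__main__":
    # The 20 ms round trip from the example above, on a single connection.
    print(max_requests_per_second(20))
    # The same round trip with ten connections working in parallel.
    print(max_requests_per_second(20, connections=10))
```

Under this model the only ways to raise the ceiling are more concurrent connections (or threads) or a shorter round trip, which is why the thread count of the test client matters.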

Re: Can not connect with cqlsh to something different than localhost

2014-12-07 Thread Michael Dykman
Try: $ netstat -lnt and see which interface port 9042 is listening on. You will likely need to update cassandra.yaml to change the interface. By default, Cassandra is listening on localhost so your local cqlsh session works. On Sun, 7 Dec 2014 23:44 Richard Snowden richard.t.snow...@gmail.com
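
As a complement to checking with netstat, a quick client-side probe can show whether a given address accepts TCP connections on the port at all. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and it checks TCP reachability only, not that a working CQL service is behind the port.

```python
import socket


def can_connect(host: str, port: int = 9042, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for host in ("127.0.0.1", "192.168.111.136"):
        print(host, can_connect(host))
```

If the probe succeeds for 127.0.0.1 but fails for the VM's external address, that is consistent with the server being bound to localhost only.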

re: UPDATE statement is failed

2014-12-07 Thread 鄢来琼
Hi All, Here is a practice for the Cassandra UPDATE statement. It may not be the best, but it is a reference for updating a row at high frequency. Cassandra will fail if the UPDATE statement is executed more than once on the same row. In the end, I changed the primary key to let Cassandra

Re: Could ring cache really improve performance in Cassandra?

2014-12-07 Thread Jonathan Haddad
I would really not recommend using thrift for anything at this point, including your load tests. Take a look at CQL, all development is going there and has in 2.1 seen a massive performance boost over 2.0. You may want to try the Cassandra stress tool included in 2.1, it can stress a table