cassandra-stress is a great tool to check whether the sizing of your cluster, in combination with your data model, will fit your production needs - i.e. without the application :) Removing the application removes any possible application bugs from the load test. Sure, it's still a necessary step to test with your application eventually - but I'd recommend starting with the stress test tool first.
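For instance, a minimal cassandra-stress run against an existing cluster could look like the following (the node address, row count and thread count here are placeholders - run `cassandra-stress help` to see the exact options your version supports):

```shell
# Write 1M rows using 50 client threads against a node at 10.0.0.1
# (placeholder address), then read them back. Adjust n= and threads=
# to match the load you expect in production.
cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1
cassandra-stress read  n=1000000 -rate threads=50 -node 10.0.0.1
```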
Thrift is a deprecated API. I strongly recommend using the C++ driver instead (I'm pretty sure it supports the native protocol). The native protocol achieves roughly twice the performance of Thrift over far fewer TCP connections. (Thrift is RPC, which means connections usually waste system, application and server resources while waiting for a response; the native protocol is multiplexed.) As Jonathan already said, all development effort is spent on CQL3 and the native protocol - Thrift is just "supported". With CQL you can do everything you can do with Thrift, plus the new features. I also recommend using prepared statements (they automagically work across a distributed cluster with the native protocol) - they eliminate the effort of parsing the same CQL statement again and again.

> On 08.12.2014 at 09:26, 孔嘉林 <kongjiali...@gmail.com> wrote:
>
> Thanks Jonathan, actually I'm wondering how CQL is implemented underneath - is it a different RPC mechanism? Why is it faster than Thrift? I know I'm wrong, but right now I just regard CQL as a query language. Could you please help explain it to me? I still feel puzzled after reading some docs about CQL. I create tables in CQL, and use the cql3 API in Thrift. I don't know what else I can do with CQL. And I am using C++ to write the client-side code. Currently I am not using the C++ driver; I want to write some simple functionality by myself.
>
> Also, I didn't use the stress test tool provided in the Cassandra distribution because I also want to make sure I can achieve the expected performance with my own client code. I know others have benchmarked Cassandra and got good results. But if I cannot reproduce the satisfactory results, I cannot use it in my case.
>
> I will create a repo and send a link later; hope to get your kind help.
>
> Thanks very much.
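To make the multiplexing point above concrete, here is a toy sketch (not the real native-protocol wire format) of why a multiplexed protocol needs far fewer TCP connections than request-per-connection RPC: each frame carries a stream id, so one connection can hold many in-flight requests, while a Thrift-style RPC connection is blocked until its single response arrives.

```python
# Toy illustration only - the class name and sizes are invented for
# this sketch, not taken from any driver's API.

class MultiplexedConnection:
    """One connection, many concurrent requests keyed by stream id."""

    def __init__(self, max_streams=128):
        self.free_ids = list(range(max_streams))
        self.in_flight = {}  # stream id -> pending query

    def send(self, query):
        stream_id = self.free_ids.pop()      # claim an id for this request
        self.in_flight[stream_id] = query    # connection stays usable
        return stream_id

    def receive(self, stream_id, result):
        query = self.in_flight.pop(stream_id)  # match response to request
        self.free_ids.append(stream_id)        # id can now be reused
        return query, result

conn = MultiplexedConnection()
a = conn.send("SELECT ...")  # both requests share the same connection,
b = conn.send("INSERT ...")  # no waiting for the first response
# Responses may come back in any order; stream ids pair them up.
print(conn.receive(b, "ok"))
print(conn.receive(a, "rows"))
```

A Thrift-style client would instead need one connection (or a pool slot) per outstanding request, which is where the wasted resources come from.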
>
> 2014-12-08 14:28 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:
>
> I would really not recommend using Thrift for anything at this point, including your load tests. Take a look at CQL; all development is going there, and 2.1 has seen a massive performance boost over 2.0.
>
> You may want to try the Cassandra stress tool included in 2.1 - it can stress a table you've already built. That way you can rule out any bugs on the client side. If you're going to keep using your own tool, however, it would be helpful if you sent out a link to the repo, since currently we have no way of knowing whether you've got a client-side bug (data model or code) that's limiting your performance.
>
> On Sun Dec 07 2014 at 7:55:16 PM 孔嘉林 <kongjiali...@gmail.com> wrote:
>
> I find that under the src/client folder of the Cassandra 2.1.0 source code there is a RingCache.java file. It uses a Thrift client calling the describe_ring() API to get the token range of each Cassandra node. It is used on the client side: the client can combine it with the partitioner to find the target node. This way there is no need to route requests between Cassandra nodes, and the client can connect directly to the target node, so maybe it can save some routing time and improve performance.
> Thank you very much.
>
> 2014-12-08 1:28 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:
>
> What's a ring cache?
>
> FYI, if you're using the DataStax CQL drivers they will automatically route requests to the correct node.
>
> On Sun Dec 07 2014 at 12:59:36 AM kong <kongjiali...@gmail.com> wrote:
>
> Hi,
>
> I'm doing a stress test on Cassandra, and I have learned that using a ring cache can improve performance because client requests can go directly to the target Cassandra server, so the coordinator node is also the desired target node.
> This way there is no need for the coordinator node to route client requests to the target node, and maybe we can get a linear performance increase.
>
> However, in my stress test on an Amazon EC2 cluster, the test results are weird. There seems to be no performance improvement after using the ring cache. Could anyone help me explain these results? (Also, I think the results without the ring cache are weird, because QPS does not increase linearly as new nodes are added. I need help explaining this, too.) The results are as follows:
>
> INSERT (write):
>
> Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
> 1          | 1                  | 18687               | 20195
> 2          | 1                  | 20793               | 26403
> 2          | 2                  | 22498               | 21263
> 4          | 1                  | 28348               | 30010
> 4          | 3                  | 28631               | 24413
>
> SELECT (read):
>
> Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
> 1          | 1                  | 24498               | 22802
> 2          | 1                  | 28219               | 27030
> 2          | 2                  | 35383               | 36674
> 4          | 1                  | 34648               | 28347
> 4          | 3                  | 52932               | 52590
>
> Thank you very much,
>
> Joy
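To make the ring-cache idea from the quoted thread concrete, here is a toy sketch of client-side token-aware routing: the client keeps a sorted view of the ring (as describe_ring() or a driver's metadata would provide), hashes the partition key into a token, and picks the node owning that token directly instead of going through an arbitrary coordinator. The node names and token assignments are invented, and md5 stands in for the Murmur3 partitioner Cassandra actually uses.

```python
import bisect
import hashlib

class RingCache:
    """Toy client-side ring: token -> owning node lookup (sketch only)."""

    def __init__(self, tokens_to_nodes):
        # tokens_to_nodes: mapping of ring token -> owning node name.
        self.tokens = sorted(tokens_to_nodes)
        self.nodes = [tokens_to_nodes[t] for t in self.tokens]

    def token(self, partition_key):
        # Stand-in partitioner: hash the key into a 64-bit token space
        # (real Cassandra uses Murmur3 over signed 64-bit tokens).
        digest = hashlib.md5(partition_key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, partition_key):
        # First node whose token is >= the key's token, wrapping around
        # the ring if the key's token is past the last node token.
        t = self.token(partition_key)
        i = bisect.bisect_left(self.tokens, t) % len(self.tokens)
        return self.nodes[i]

# Two hypothetical nodes splitting the token space in half.
ring = RingCache({0: "node-a", 2**63: "node-b"})
print(ring.node_for("user:42"))  # same node every time for the same key
```

As for why the measured gain can be small: with only a few nodes, coordinator forwarding adds just one extra network hop, and other bottlenecks (client threads, replication factor, disk) can easily dominate, which is consistent with the mixed numbers in the tables above.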