> 
> So the native protocol is an asynchronous protocol? 
Yes.

> I have tried using the stress test tool. But it seems that this tool should
> run on the same node as one of the Cassandra nodes (or at least on a node
> having Cassandra installed)? Once I try to run this tool on a separate client
> instance, I get exceptions thrown.
You should start with the "new" kind of stress testing (using CQL3, the native
protocol, and prepared statements). Forget about Thrift ;)
Start with the example YAML stress file first to learn about it. It allows you
to configure simultaneous writes and reads that match your workload.
And you do not need to run it on a C* node - but you should think about the
network between the stress test tool and your cluster.

> The RingCache I found is here:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/client/RingCache.java
> and I am trying to implement similar functionality in C++. My repo is here:
> https://github.com/kongjialin/Cassandra . My idea is that all the requests
> go through the client-side "ring cache" and are sent to the target Cassandra
> node (each node is associated with a client pool) to avoid routing between
> nodes in the cluster.
You can save yourself a lot of work implementing it "right" - just use the C++
driver. It knows about the native protocol and routes requests to the correct
nodes. You can also go into the C++ driver code to see how it works,
improve it, etc. :)
I don't know anything about the C++ driver - but feel free to post to the
driver mailing list and/or the #datastax-drivers IRC channel.
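For illustration, here is a rough, untested sketch of what that looks like with
a recent version of the DataStax C++ driver (the contact points are placeholders,
and cass_cluster_set_token_aware_routing may not be available in older driver
versions):

  #include <cassandra.h>
  #include <stdio.h>

  int main(void) {
    CassCluster* cluster = cass_cluster_new();
    CassSession* session = cass_session_new();

    /* One or two contact points are enough; the driver discovers the rest of the ring. */
    cass_cluster_set_contact_points(cluster, "10.0.0.1,10.0.0.2");

    /* Token-aware routing sends each request straight to a replica -
       the same effect you want from a client-side ring cache. */
    cass_cluster_set_token_aware_routing(cluster, cass_true);

    CassFuture* connect_future = cass_session_connect(session, cluster);
    cass_future_wait(connect_future);
    if (cass_future_error_code(connect_future) != CASS_OK) {
      fprintf(stderr, "connect failed\n");
    }
    cass_future_free(connect_future);

    /* ... prepare and execute statements here ... */

    cass_session_free(session);
    cass_cluster_free(cluster);
    return 0;
  }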


> 
> 2014-12-08 16:42 GMT+08:00 Robert Stupp <sn...@snazy.de>:
> cassandra-stress is a great tool to check whether the sizing of your cluster
> in combination with your data model will fit your production needs - i.e.
> without the application :) Removing the application removes any possible bugs
> from the load test. Sure, it's a necessary step to do it with your
> application - but I'd recommend starting with the stress test tool first.
> 
> Thrift is a deprecated API. I strongly recommend using the C++ driver (I'm
> pretty sure it supports the native protocol). The native protocol achieves
> approximately twice the performance of Thrift over far fewer TCP connections.
> (Thrift is RPC - meaning connections usually waste system, application and
> server resources while waiting for something. The native protocol is a
> multiplexed protocol.) As John already said, all development effort is spent
> on CQL3 and the native protocol - Thrift is just "supported".
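> To illustrate what "multiplexed" buys you (an untested sketch against a recent
> DataStax C++ driver, assuming an already connected session; the table name
> stress.tbl is made up): many requests can be in flight on the same connections
> instead of blocking on each call as with Thrift RPC.
>
>   /* Fire off a batch of inserts without waiting for each one individually. */
>   CassFuture* futures[100];
>   for (int i = 0; i < 100; ++i) {
>     CassStatement* stmt = cass_statement_new(
>         "INSERT INTO stress.tbl (id, val) VALUES (?, ?)", 2);
>     cass_statement_bind_int64(stmt, 0, (cass_int64_t)i);
>     cass_statement_bind_int64(stmt, 1, (cass_int64_t)(i * 10));
>     futures[i] = cass_session_execute(session, stmt);
>     cass_statement_free(stmt);  /* the driver keeps what it needs */
>   }
>   /* Only now wait for the results - the requests were pipelined. */
>   for (int i = 0; i < 100; ++i) {
>     cass_future_wait(futures[i]);
>     if (cass_future_error_code(futures[i]) != CASS_OK) { /* handle error */ }
>     cass_future_free(futures[i]);
>   }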
> 
> With CQL you can do everything that you can do with Thrift, plus more new stuff.
> 
> I also recommend using prepared statements (they automagically work in a
> distributed cluster with the native protocol) - they eliminate the effort of
> parsing the same CQL statement again and again.
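> Continuing the sketch above (untested, same assumptions): prepare once, then
> bind and execute as often as you like, so the server parses the CQL string
> only the first time.
>
>   /* Prepare once ... (check cass_future_error_code() in real code) */
>   CassFuture* prepare_future = cass_session_prepare(
>       session, "INSERT INTO stress.tbl (id, val) VALUES (?, ?)");
>   cass_future_wait(prepare_future);
>   const CassPrepared* prepared = cass_future_get_prepared(prepare_future);
>   cass_future_free(prepare_future);
>
>   /* ... then bind fresh values for every request. */
>   CassStatement* stmt = cass_prepared_bind(prepared);
>   cass_statement_bind_int64(stmt, 0, 42);
>   cass_statement_bind_int64(stmt, 1, 420);
>   CassFuture* exec_future = cass_session_execute(session, stmt);
>   cass_future_wait(exec_future);
>   cass_future_free(exec_future);
>   cass_statement_free(stmt);
>   cass_prepared_free(prepared);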
> 
> 
>> On 08.12.2014 at 09:26, 孔嘉林 <kongjiali...@gmail.com> wrote:
>> 
>> Thanks Jonathan. Actually I'm wondering how CQL is implemented under the
>> hood - a different RPC mechanism? Why is it faster than Thrift? I may be
>> wrong, but for now I just regard CQL as a query language. Could you please
>> explain it to me? I still feel puzzled after reading some docs about CQL. I
>> create tables in CQL and use the CQL3 API over Thrift. I don't know what
>> else I can do with CQL. And I am using C++ to write the client-side code.
>> Currently I am not using the C++ driver and want to write some simple
>> functionality by myself.
>> 
>> Also, I didn't use the stress test tool provided in the Cassandra
>> distribution because I also want to make sure whether I can achieve the
>> expected performance using my own client code. I know others have
>> benchmarked Cassandra and gotten good results, but if I cannot reproduce
>> those satisfactory results, I cannot use it in my case.
>> 
>> I will create a repo and send a link later, hope to get your kind help.
>> 
>> Thanks very much.
>> 
>> 2014-12-08 14:28 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:
>> I would really not recommend using Thrift for anything at this point,
>> including your load tests.  Take a look at CQL; all development is going
>> there, and 2.1 has seen a massive performance boost over 2.0.
>> 
>> You may want to try the Cassandra stress tool included in 2.1, it can stress 
>> a table you've already built.  That way you can rule out any bugs on the 
>> client side.  If you're going to keep using your tool, however, it would be 
>> helpful if you sent out a link to the repo, since currently we have no way 
>> of knowing if you've got a client side bug (data model or code) that's 
>> limiting your performance.
>> 
>> 
>> On Sun Dec 07 2014 at 7:55:16 PM 孔嘉林 <kongjiali...@gmail.com> wrote:
>> I found that under the src/client folder of the Cassandra 2.1.0 source code
>> there is a RingCache.java file. It uses a Thrift client calling the
>> describe_ring() API to get the token range of each Cassandra node. It is
>> used on the client side: the client can combine it with the partitioner to
>> find the target node. In this way there is no need to route requests between
>> Cassandra nodes, and the client can connect directly to the target node. So
>> maybe it can save some routing time and improve performance.
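>> Conceptually the lookup boils down to something like this in C++ (a sketch
>> only - the hash below is a placeholder; a real client must use the cluster's
>> partitioner, e.g. Murmur3, so the tokens match what Cassandra computes):
>>
>>   #include <cstdint>
>>   #include <functional>
>>   #include <map>
>>   #include <string>
>>
>>   // End token of each range -> endpoint owning it, filled from describe_ring().
>>   // std::map keeps the tokens sorted, so the lookup is a binary search.
>>   std::map<int64_t, std::string> ring;
>>
>>   int64_t token_of(const std::string& partition_key) {
>>     // Placeholder hash; replace with the partitioner's token function.
>>     return static_cast<int64_t>(std::hash<std::string>{}(partition_key));
>>   }
>>
>>   std::string endpoint_for(const std::string& partition_key) {
>>     auto it = ring.lower_bound(token_of(partition_key));
>>     if (it == ring.end()) it = ring.begin();  // wrap around the ring
>>     return it->second;                        // assumes ring is non-empty
>>   }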
>> Thank you very much.
>> 
>> 2014-12-08 1:28 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:
>> What's a ring cache?
>> 
>> FYI if you're using the DataStax CQL drivers they will automatically route 
>> requests to the correct node.
>> 
>> On Sun Dec 07 2014 at 12:59:36 AM kong <kongjiali...@gmail.com> wrote:
>> Hi,
>> 
>> I'm doing stress tests on Cassandra, and I have learned that using a ring
>> cache can improve performance because client requests can go directly to the
>> target Cassandra server, so that the coordinator node is also the desired
>> target node. In this way, there is no need for the coordinator to route the
>> client requests to another node, and maybe we can get a linear performance
>> increase.
>> 
>>  
>> 
>> However, in my stress test on an Amazon EC2 cluster, the results are weird.
>> It seems there is no performance improvement after using the ring cache.
>> Could anyone help me explain these results? (Also, I think the results of
>> the test without the ring cache are weird, because there is no linear
>> increase in QPS when new nodes are added. I need help explaining this,
>> too.) The results are as follows:
>> 
>>  
>> 
>> INSERT (write):
>>
>> Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
>> ---------- | ------------------ | ------------------- | ----------------
>>     1      |         1          |        18687        |      20195
>>     2      |         1          |        20793        |      26403
>>     2      |         2          |        22498        |      21263
>>     4      |         1          |        28348        |      30010
>>     4      |         3          |        28631        |      24413
>> 
>> SELECT (read):
>>
>> Node count | Replication factor | QPS (no ring cache) | QPS (ring cache)
>> ---------- | ------------------ | ------------------- | ----------------
>>     1      |         1          |        24498        |      22802
>>     2      |         1          |        28219        |      27030
>>     2      |         2          |        35383        |      36674
>>     4      |         1          |        34648        |      28347
>>     4      |         3          |        52932        |      52590
>> 
>> Thank you very much,
>> 
>> Joy
>> 