[ 
https://issues.apache.org/jira/browse/CASSANDRA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-7217:
--------------------------------------
    Attachment: stub_server.diff

Stubbing out the server was easier than stubbing out the client library, so I 
did that first. See the attached diff (stub_server.diff): ExecuteMessage now 
returns a void result immediately.

With the server stubbed out, the client node on my setup maxes out CPU at 400% 
(4 cores) and does 100k operations/second with 500 threads. When I increased to 
2000 threads, the utilization reported by top decreased to 270% (don't believe 
top, the node is saturated) and throughput decreased to 30k operations/second.

At 1000 threads I still get 100k. I do see the drop at 1250 threads. So yes, it 
exists, but it might be an issue with the client library or with how the client 
library chooses to present load to the server. I'll dig a bit into how the 
client library works to see if I can explain it.
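
As a back-of-the-envelope check on the numbers above, Little's law 
(concurrency = throughput * latency) gives the mean time each operation must be 
taking. This is only a sketch: it assumes every stress thread is closed-loop, 
i.e. has at most one request in flight at a time, and the function names are 
made up for illustration.

```python
# Thread counts and throughputs reported in the comment above
# (threads -> operations/second).
measurements = {500: 100_000, 1000: 100_000, 2000: 30_000}


def implied_latency_ms(threads, ops_per_second):
    """Mean time per operation implied by Little's law.

    Assumes a closed loop: each thread issues one request and waits for it,
    so the number of requests in flight equals the thread count.
    """
    return threads / ops_per_second * 1000.0


def per_thread_rate(threads, ops_per_second):
    """Operations per second completed by each individual thread."""
    return ops_per_second / threads


for threads, ops in measurements.items():
    print(
        f"{threads:>5} threads: "
        f"{implied_latency_ms(threads, ops):6.1f} ms/op, "
        f"{per_thread_rate(threads, ops):6.1f} ops/s per thread"
    )
```

Under that assumption the implied time per operation goes from about 5 ms at 
500 threads to 10 ms at 1000 threads and roughly 67 ms at 2000 threads, even 
though the server is doing no work.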

I personally don't necessarily see this as a bug. If you want to concurrently 
execute more than 1000 requests, you should not use a thread per request on one 
node. That said, we do have an interest in having it work as well as possible, 
since people are going to do it anyway and we might as well pave the way, 
modulo how much time we want to invest.
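
To illustrate the alternative to thread per request, here is a minimal sketch 
that keeps many requests in flight from a single thread and caps the number 
outstanding. It is not how stress or the client library works; fake_execute is 
a hypothetical stand-in for an asynchronous driver call, and the cap of 1000 is 
an assumed value.

```python
import asyncio

MAX_IN_FLIGHT = 1000  # assumed cap on concurrently outstanding requests


async def fake_execute(request_id):
    """Hypothetical stand-in for an asynchronous client call."""
    await asyncio.sleep(0.001)  # pretend to wait for the server
    return request_id


async def run_requests(total, max_in_flight=MAX_IN_FLIGHT):
    """Issue `total` requests from one thread, never exceeding the cap.

    Returns (completed, peak), where peak is the highest number of requests
    that were in flight at the same time.
    """
    limiter = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0
    completed = 0

    async def one(request_id):
        nonlocal in_flight, peak, completed
        async with limiter:  # waits here once the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await fake_execute(request_id)
            finally:
                in_flight -= 1
            completed += 1

    await asyncio.gather(*(one(i) for i in range(total)))
    return completed, peak


if __name__ == "__main__":
    completed, peak = asyncio.run(run_requests(5000))
    print(f"completed={completed} peak_in_flight={peak}")
```

The point of the design is that concurrency is bounded by a counter rather than 
by the number of OS threads, so raising the request count does not add 
scheduler or lock contention on the client node.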

I am going to experiment with having stress use two (or N) instances of the 
client library to see if reduced contention in the client will ameliorate the 
drop-off at 1250 threads. If that helps, it may just be a matter of making sure 
the client library can operate as shared-nothing shards internally so it can be 
made to have locality and scale up.
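
The N-instance experiment could be structured roughly like this. It is a sketch 
of the routing only, under stated assumptions: FakeClient is a hypothetical 
stand-in for one client library instance guarded by a single global lock, and 
ShardedClient and run_stress are names invented here. Python's interpreter lock 
means this will not show a real scale-up; it only shows how threads get pinned 
to independent instances.

```python
import threading


class FakeClient:
    """Hypothetical client instance whose work is serialized by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.executed = 0

    def execute(self, query):
        with self._lock:  # every caller of this instance contends here
            self.executed += 1


class ShardedClient:
    """Pins each calling thread to one of N independent client instances."""

    def __init__(self, shard_count):
        self.shards = [FakeClient() for _ in range(shard_count)]
        self._assign_lock = threading.Lock()
        self._assigned = 0
        self._local = threading.local()

    def _shard_for_current_thread(self):
        shard = getattr(self._local, "shard", None)
        if shard is None:
            # First call from this thread: hand out shards round-robin.
            with self._assign_lock:
                index = self._assigned % len(self.shards)
                self._assigned += 1
            shard = self._local.shard = self.shards[index]
        return shard

    def execute(self, query):
        self._shard_for_current_thread().execute(query)


def run_stress(client, thread_count, ops_per_thread):
    """Run thread_count worker threads, each issuing ops_per_thread calls."""

    def worker():
        for _ in range(ops_per_thread):
            client.execute("SELECT 1")  # placeholder query text

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    client = ShardedClient(shard_count=4)
    run_stress(client, thread_count=8, ops_per_thread=1000)
    print([shard.executed for shard in client.shards])
```

With 8 threads and 4 shards, each instance serves 2 threads, so contention on 
any one lock is a quarter of what a single global instance would see.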

In the past I have found that a single global client instance with global locks 
doesn't scale, but I also had limited success with running multiple instances. 
It helps, but not to the point you get linear scale up.

> Native transport performance (with cassandra-stress) drops precipitously past 
> around 1000 threads
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7217
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7217
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance, stress, triaged
>             Fix For: 3.0.1, 3.1
>
>         Attachments: 2000-threads.svg, 500-threads.svg, FakeQuerySystem.java, 
> stub_server.diff
>
>
> This is obviously bad. Let's figure out why it's happening and put a stop to 
> it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
