Hi Eric,

Thank you very much for your reply! Do you mean that I should clear my table after each run? Indeed, I can see compaction happen several times during my test, but could just a few compactions affect the performance that much? Also, I can see in OpsCenter that some ParNew GCs happen, but no CMS GCs.
I run my test on an EC2 cluster, so I think the network within it should be fast. Each Cassandra server is of type m3.xlarge, with 4 vCPUs, 15 GiB of memory, and 80 GB of SSD storage. As for latency, which latency should I care about most, p(99) or p(999)? I want to find the maximum QPS under a given latency limit. I know my testing scenario is not a common production case; I just want to know how much load my cluster can bear under stress.

So, how did you test the cluster on which you got 86k writes/sec? How many requests did you send to it? Was it also 1 million? Did you also use OpsCenter to monitor real-time performance? I also wonder why the write and read QPS that OpsCenter reports are much lower than what I calculate. Could you please describe your test deployment in detail?

Thank you very much,
Joy

2014-12-07 23:55 GMT+08:00 Eric Stevens <migh...@gmail.com>:

> Hi Joy,
>
> Are you resetting your data after each test run? I wonder if your tests
> are actually causing you to fall behind on data-grooming tasks such as
> compaction, so performance suffers in your later tests.
>
> There are *so many* factors which can affect performance that, without
> reviewing your test methodology in great detail, it's really hard to say
> whether there are flaws which might uncover an antipattern, cause an
> atypical number of cache hits or misses, and so forth. You may also be
> producing GC pressure in the write path.
>
> I *can* say that 28k writes per second looks a little low, but it depends
> a lot on your network, hardware, and write patterns (e.g., data size). For
> a little performance test suite I wrote, with parallel batched writes, on
> a 3-node rf=3 test cluster, I got about 86k writes per second.
>
> Also, focusing exclusively on max latency is going to cause you some
> trouble, especially in the case of magnetic media, as you're using.
> Between ill-timed GC and inconsistent performance characteristics from
> magnetic media, your max numbers will often look significantly worse than
> your p(99) or p(999) numbers.
>
> All this said, one node will often look better than several nodes for
> certain patterns because it completely eliminates proxy (coordinator)
> write times: all writes are local writes. It's an over-simple case that
> doesn't reflect any practical production use of Cassandra, so it's
> probably not worth even including in your tests. I would recommend
> starting at 3 nodes rf=3 and comparing against 6 nodes rf=6. Make sure
> you're staying on top of compaction and aren't seeing garbage collections
> in the logs (either of those will pollute your results with variability
> you can't account for at small sample sizes of ~1 million).
>
> If you expect to sustain write volumes like this, you'll find these
> clusters are sized too small (on that hardware you won't keep up with
> compaction), and your tests are again testing scenarios you wouldn't
> actually see in production.
>
> On Sat Dec 06 2014 at 7:09:18 AM kong <kongjiali...@gmail.com> wrote:
>
>> Hi,
>>
>> I am doing a stress test on DataStax Cassandra Community 2.1.2, not
>> using the provided stress-test tool but my own stress-test client code
>> (I wrote some C++ stress-test code). My Cassandra cluster is deployed on
>> Amazon EC2, using the DataStax Community AMI (HVM instances) from the
>> DataStax documentation; I am not using EBS, just the ephemeral storage
>> by default. The EC2 type of the Cassandra servers is m3.xlarge, and I
>> use another EC2 instance, of type r3.8xlarge, for my stress-test client.
>> Both the Cassandra server nodes and the stress-test client node are in
>> us-east. I test Cassandra clusters made up of 1 node, 2 nodes, and 4
>> nodes separately.
>> I run the INSERT test and the SELECT test separately, but the
>> performance does not increase linearly when new nodes are added, and I
>> get some weird results. My test results are as follows (I do 1 million
>> operations and try to get the best QPS while the max latency is no more
>> than 200 ms; latencies are measured on the client side, and QPS is
>> calculated as total_operations/total_time).
>>
>> INSERT (write):
>>
>> Nodes  RF  QPS    Avg(ms)  Min(ms)  p95(ms)  p99(ms)  p999(ms)  Max(ms)
>> 1      1   18687  2.08     1.48     2.95     5.74     52.8      205.4
>> 2      1   20793  3.15     0.84     7.71     41.35    88.7      232.7
>> 2      2   22498  3.37     0.86     6.04     36.1     221.5     649.3
>> 4      1   28348  4.38     0.85     8.19     64.51    169.4     251.9
>> 4      3   28631  5.22     0.87     18.68    68.35    167.2     288
>>
>> SELECT (read):
>>
>> Nodes  RF  QPS    Avg(ms)  Min(ms)  p95(ms)  p99(ms)  p999(ms)  Max(ms)
>> 1      1   24498  4.01     1.51     7.6      12.51    31.5      129.6
>> 2      1   28219  3.38     0.85     9.5      17.71    39.2      152.2
>> 2      2   35383  4.06     0.87     9.71     21.25    70.3      215.9
>> 4      1   34648  2.78     0.86     6.07     14.94    30.8      134.6
>> 4      3   52932  3.45     0.86     10.81    21.05    37.4      189.1
>>
>> The test data I use is generated randomly, and the schema I use is like
>> this (I use cqlsh to create the column family/table):
>>
>> CREATE TABLE table(
>>     id1 varchar,
>>     ts
varchar,
>>     id2 varchar,
>>     msg varchar,
>>     PRIMARY KEY(id1, ts, id2));
>>
>> So the fields are all strings, and I generate each character of each
>> string randomly, using srand(time(0)) and rand() in C++, so I think my
>> test data should be uniformly distributed across the Cassandra cluster.
>> In my client stress-test code I use the Thrift C++ interface, and the
>> basic operations I perform are like:
>>
>> thrift_client.execute_cql3_query("INSERT INTO table (id1, ts, id2, msg) VALUES ('xxx', 'xxx', 'xxx', 'xxx')");
>> thrift_client.execute_cql3_query("SELECT * FROM table WHERE id1='xxx'");
>>
>> Each data entry I INSERT or SELECT is around 100 characters.
>>
>> On my stress-test client, I create several threads to send the read and
>> write requests, each thread having its own Thrift client, and at the
>> beginning all the Thrift clients connect to the Cassandra servers
>> evenly. For example, in a 4-node cluster I create 160 Thrift clients,
>> with 40 of them connecting to each server node.
>>
>> So:
>>
>> 1. Could anyone help me explain my test results? Why does the
>> performance (QPS) increase only a little when new nodes are added?
>>
>> 2. I have learned that Cassandra has better write performance than read
>> performance. Why, in my case, is the read performance better?
>>
>> 3. I also use OpsCenter to monitor the real-time performance of my
>> cluster. But while I measure the average QPS above, the operations/s
>> reported by OpsCenter are only around 10,000+ at the write peak and
>> 5,000+ at the read peak. Why is my result inconsistent with OpsCenter's?
>>
>> 4. Are there any unreasonable things in my test method, such as the
>> test data or the QPS calculation?
>>
>> Thank you very much,
>> Joy