Hi Eric,
Thank you very much for your reply!
Do you mean that I should clear my table after each run? Indeed, I can see
several compactions during my test, but could only a few compactions
affect the performance that much? Also, OpsCenter shows some ParNew GCs
during the test, but no CMS GC.

I run my test on an EC2 cluster, so I think the network within it should be
fast. Each Cassandra server is an m3.xlarge instance with 4 vCPUs, 15 GiB
of memory and 80 GB of SSD storage.

As for latency, which one should I care about most: p(99) or p(999)? I
want to find the max QPS under a given latency limit.
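
To make the question concrete, here is a minimal sketch of how p(99) or
p(999) could be computed from client-side latency samples with the
nearest-rank method, and then checked against a limit. The helper names
`percentile` and `meets_latency_limit` are made up for illustration; this is
not the actual stress client code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least a
// fraction p (0 < p <= 1) of all samples are less than or equal to it.
// Hypothetical helper, not part of any Cassandra or Thrift API.
double percentile(std::vector<double> samples_ms, double p) {
    assert(!samples_ms.empty());
    assert(p > 0.0 && p <= 1.0);
    std::sort(samples_ms.begin(), samples_ms.end());
    // 1-based rank = ceil(p * N), clamped to [1, N].
    std::size_t n = samples_ms.size();
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * static_cast<double>(n)));
    rank = std::max<std::size_t>(1, std::min(rank, n));
    return samples_ms[rank - 1];
}

// True if the chosen percentile of the run stays within the latency limit.
bool meets_latency_limit(const std::vector<double>& samples_ms,
                         double p, double limit_ms) {
    return percentile(samples_ms, p) <= limit_ms;
}
```

With this, `percentile(latencies, 0.99)` would give p(99), and
`meets_latency_limit(latencies, 0.999, 200.0)` would apply a 200ms limit to
p(999) instead of to the single worst request.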

I know my testing scenarios are not the common case in production; I just
want to know how much load my cluster can bear under stress.

So, how did you test your cluster to get 86k writes/sec? How many
requests did you send to your cluster? Was it also 1 million? Did you also
use OpsCenter to monitor the real-time performance? I also wonder why the
write and read QPS that OpsCenter reports are much lower than what I
calculate. Could you please describe your test deployment in detail?
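
One way to compare the client-side number with a monitoring graph would be
to break the completions into one-second windows instead of reporting a
single average. Below is a minimal sketch, assuming each operation's
completion time is recorded in seconds since the start of the test;
`average_qps` and `per_second_counts` are hypothetical helpers, and how
OpsCenter aggregates its own numbers is not something this sketch models.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Overall average, i.e. total_operations / total_time.
double average_qps(std::size_t total_operations, double total_seconds) {
    assert(total_seconds > 0.0);
    return static_cast<double>(total_operations) / total_seconds;
}

// Number of operations completed in each one-second window.
// completion_s holds completion times in seconds since test start.
std::vector<std::size_t> per_second_counts(const std::vector<double>& completion_s) {
    std::vector<std::size_t> counts;
    for (double t : completion_s) {
        assert(t >= 0.0);
        // Truncation maps e.g. 1.2 to window 1.
        std::size_t bucket = static_cast<std::size_t>(t);
        if (bucket >= counts.size()) {
            counts.resize(bucket + 1, 0);  // windows with no completions stay 0
        }
        ++counts[bucket];
    }
    return counts;
}
```

The peak and the shape of the per-second series can then be set next to the
monitoring graph, which a single average over the whole run cannot show.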

Thank you very much,
Joy

2014-12-07 23:55 GMT+08:00 Eric Stevens <migh...@gmail.com>:

> Hi Joy,
>
> Are you resetting your data after each test run?  I wonder if your tests
> are actually causing you to fall behind on data grooming tasks such as
> compaction, and so performance suffers for your later tests.
>
> There are *so many* factors which can affect performance that, without
> reviewing your test methodology in great detail, it's really hard to say
> whether there are flaws which might expose an antipattern, cause an
> atypical number of cache hits or misses, and so forth. You may also be
> producing GC pressure in the write path.
>
> I *can* say that 28k writes per second looks just a little low, but it
> depends a lot on your network, hardware, and write patterns (e.g., data
> size).  For a little performance test suite I wrote, with parallel batched
> writes, on a 3 node rf=3 test cluster, I got about 86k writes per
> second.
>
> Also, focusing exclusively on max latency is going to cause you some
> trouble, especially with magnetic media such as you're using.  Between
> ill-timed GC and inconsistent performance characteristics from magnetic
> media, your max numbers will often look significantly worse than your p(99)
> or p(999) numbers.
>
> All this said, one node will often look better than several nodes for
> certain patterns because it completely eliminates proxy (coordinator) write
> times.  All writes are local writes.  It's an over-simple case that doesn't
> reflect any practical production use of Cassandra, so it's probably not
> worth even including in your tests.  I would recommend starting at 3 nodes
> rf=3, and compare against 6 nodes rf=6.  Make sure you're staying on top of
> compaction and aren't seeing garbage collections in the logs (either of
> those will be polluting your results with variability you can't account for
> with small sample sizes of ~1 million).
>
> If you expect to sustain write volumes like this, you'll find these
> clusters are sized too small (on that hardware you won't keep up with
> compaction), and your tests are again testing scenarios you wouldn't
> actually see in production.
>
> On Sat Dec 06 2014 at 7:09:18 AM kong <kongjiali...@gmail.com> wrote:
>
>> Hi,
>>
>> I am doing a stress test on Datastax Cassandra Community 2.1.2, not using
>> the provided stress test tool but my own stress-test client code instead
>> (I wrote some C++ stress test code). My Cassandra cluster is deployed on
>> Amazon EC2, using the Datastax Community AMI (HVM instances) from the
>> Datastax documentation, and I am not using EBS, just the default
>> ephemeral storage. The Cassandra servers are of EC2 type m3.xlarge. I use
>> another EC2 instance, of type r3.8xlarge, for my stress test client. Both
>> the Cassandra server nodes and the stress test client node are in us-east.
>> I test Cassandra clusters made up of 1 node, 2 nodes, and 4 nodes
>> separately. I run INSERT tests and SELECT tests separately, but the
>> performance doesn't increase linearly when new nodes are added. I also get
>> some weird results. My test results are as follows (*I do 1 million
>> operations and try to get the best QPS while the max latency is no more
>> than 200ms; the latencies are measured from the client side. The QPS is
>> calculated as total_operations/total_time*).
>>
>>
>>
>> *INSERT(write):*
>>
>> (Nodes = node count, RF = replication factor; Avg, Min, .95, .99, .999
>> and Max are latencies in ms)
>>
>> Nodes | RF | QPS   | Avg  | Min  | .95   | .99   | .999  | Max
>> ------+----+-------+------+------+-------+-------+-------+------
>> 1     | 1  | 18687 | 2.08 | 1.48 | 2.95  | 5.74  | 52.8  | 205.4
>> 2     | 1  | 20793 | 3.15 | 0.84 | 7.71  | 41.35 | 88.7  | 232.7
>> 2     | 2  | 22498 | 3.37 | 0.86 | 6.04  | 36.1  | 221.5 | 649.3
>> 4     | 1  | 28348 | 4.38 | 0.85 | 8.19  | 64.51 | 169.4 | 251.9
>> 4     | 3  | 28631 | 5.22 | 0.87 | 18.68 | 68.35 | 167.2 | 288
>>
>>
>>
>> *SELECT(read):*
>>
>> (Nodes = node count, RF = replication factor; Avg, Min, .95, .99, .999
>> and Max are latencies in ms)
>>
>> Nodes | RF | QPS   | Avg  | Min  | .95   | .99   | .999  | Max
>> ------+----+-------+------+------+-------+-------+-------+------
>> 1     | 1  | 24498 | 4.01 | 1.51 | 7.6   | 12.51 | 31.5  | 129.6
>> 2     | 1  | 28219 | 3.38 | 0.85 | 9.5   | 17.71 | 39.2  | 152.2
>> 2     | 2  | 35383 | 4.06 | 0.87 | 9.71  | 21.25 | 70.3  | 215.9
>> 4     | 1  | 34648 | 2.78 | 0.86 | 6.07  | 14.94 | 30.8  | 134.6
>> 4     | 3  | 52932 | 3.45 | 0.86 | 10.81 | 21.05 | 37.4  | 189.1
>>
>>
>>
>> The test data I use is generated randomly, and the schema I use is like
>> this (I use cqlsh to create the column family/table):
>>
>> CREATE TABLE table(
>>   id1  varchar,
>>   ts   varchar,
>>   id2  varchar,
>>   msg  varchar,
>>   PRIMARY KEY(id1, ts, id2));
>>
>> So the fields are all strings, and I generate each character of each
>> string randomly, using srand(time(0)) and rand() in C++, so I think my
>> test data should be uniformly distributed across the Cassandra cluster.
>> In my client stress test code, I use the Thrift C++ interface, and the
>> basic operations I do are like:
>>
>> thrift_client.execute_cql3_query("INSERT INTO table (id1, ts, id2, msg)
>> VALUES (xxx, xxx, xxx, xxx)"); and thrift_client.execute_cql3_query("SELECT
>> * FROM table WHERE id1=xxx");
>>
>> Each data entry I INSERT or SELECT is around 100 characters.
>>
>> On my stress test client, I create several threads to send the read and
>> write requests, each thread having its own Thrift client, and at the
>> beginning the Thrift clients connect to the Cassandra servers evenly.
>> For example, in a 4 node cluster I create 160 Thrift clients, and 40 of
>> them connect to each server node.
>>
>>
>>
>> *So,*
>>
>> *1. Could anyone help me explain my test results? Why does the
>> performance (QPS) increase only a little when new nodes are added?*
>>
>> *2. I learned from the materials that Cassandra has better write
>> performance than read performance. But why is the read performance
>> better in my case?*
>>
>> *3. I also use OpsCenter to monitor the real-time performance of my
>> cluster. But while I get the average QPS above, the operations/s
>> reported by OpsCenter is around 10000+ at the write peak and 5000+ at
>> the read peak. Why is my result inconsistent with that from
>> OpsCenter?*
>>
>> *4. Is there anything unreasonable in my test method, such as the
>> test data or the QPS calculation?*
>>
>>
>>
>> *Thank you very much,*
>>
>> *Joy*
>>
>
