[jira] [Commented] (CASSANDRA-6199) Improve Stress Tool

Benedict (JIRA) Tue, 29 Oct 2013 14:42:15 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808493#comment-13808493
 ]


Benedict commented on CASSANDRA-6199:
-------------------------------------

For the general case, and with clustering keys, no. It was a mooted 
improvement, but I think 
[6146|https://issues.apache.org/jira/browse/CASSANDRA-6146] is likely to be the 
best way to support this.

If you just want to check wide row performance, there is a feature ported 
forward from the old stress for using the TimeUUID column comparator, which 
inserts an arbitrary number of columns with timestamps as their names. I 
haven't played with it myself, but it might suit your purposes for now.



> Improve Stress Tool
> -------------------
>
>                 Key: CASSANDRA-6199
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6199
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>         Attachments: new.read.latency.svg, new.read.rate.distribution.svg, 
> new.write.latency.svg, new.write.rate.distribution.svg, old.read.latency.svg, 
> old.read.rate.distribution.svg, old.write.latency.svg, 
> old.write.rate.distribution.svg, ops.read.svg, ops.write.svg
>
>
> The stress tool could do with sprucing up. The following is a list of 
> essential improvements and things that would be nice to have.
> Essential:
> - Reduce variability of results, especially start/end tails. Do not trash 
> first/last 10% of readings
> - Reduce contention/overhead in stress to increase overall throughput
> - Short warm-up period, which is ignored for summary (or summarised 
> separately), though prints progress as usual. Potentially automatic detection 
> of rate levelling.
> - Better configurability and defaults for data generation - current column 
> generation populates columns with the same value for every row, which is very 
> easily compressible. Possibly introduce partial random data generator 
> (possibly dictionary-based random data generator)
> Nice to have:
> - Calculate and print stdev and mean
> - Add batched sequential access mode (where a single thread performs 
> batch-size sequential requests before selecting another random key) to test 
> how key proximity affects performance
> - Auto-mode which attempts to establish the maximum throughput rate, by 
> varying the thread count (or otherwise gating the number of parallel 
> requests) for some period, then configures rate limit or thread count to test 
> performance at e.g. 30%, 50%, 70%, 90%, 120%, 150% and unconstrained.
> - Auto-mode could have a target variance ratio for mean throughput and/or 
> latency, and completes a test once this target is hit for x intervals
> - Fix key representation so independent of number of keys (possibly switch to 
> 10 digit hex), and don't use String.format().getBytes() to construct it 
> (expensive)
> Also, remove the skip-key setting, as it is currently ignored. Unless 
> somebody knows the reason for it.
> - Fix latency stats
> - Read/write mode, with configurable recency-of-reads distribution
> - Add new exponential/extreme value distribution for value size, column count 
> and recency-of-reads
> - Support more than 2^31 keys
> - Supports multiple concurrent stress inserts via key-offset parameter or 
> similar



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (CASSANDRA-6199) Improve Stress Tool

Reply via email to