Benedict commented on CASSANDRA-7519:

bq. I plan to run some test workloads to double-check the logic, but the first cut 
of the code looked good. I left a couple of comments on the GitHub branch.


bq. I'm not very keen on the new labels you've chosen for the insert section of 
the yaml file. They should be more verbose.

Nomenclature is always tricky, and I'm certainly not fixed on them. However, 
making these more verbose means making the command line options correspondingly 
more verbose to keep the two in sync, which I'm not super keen on, though I'm not 
too fussed about it either.

bq. partitions_per_batch maybe?

Perhaps partitions_per_operation? per_batch implies that we might change the 
number of partitions between batches, whereas we work with the same partitions 
for the duration of an 'operation' (the unit counted by n= on the command line)...

bq. batch_split_count


> Further stress improvements to generate more realistic workloads
> ----------------------------------------------------------------
>                 Key: CASSANDRA-7519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7519
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: tools
>             Fix For: 2.1.1
> We generally believe that the most common workload is for reads to 
> exponentially prefer the most recently written data. However, stress currently 
> has only two id generation modes: sequential and random (although random ids 
> can be drawn from a distribution). I propose introducing a new mode that is 
> somewhat like sequential, except that we 'look back' from the current id by an 
> amount drawn from a distribution. I may also make the position increment only 
> when an id is first written to, so that this mode can be run from a clean slate 
> with a mixed workload. This should allow us to generate more representative 
> workloads.
> At the same time, I will introduce a timestamp value generator for primary 
> key columns that is strictly ascending, i.e. it has some random component but 
> is based on the actual system time (or some shared monotonically increasing 
> state), so that we can again generate a more realistic workload. This may be 
> challenging to tie in with the new procedurally generated partitions, but I'm 
> sure it can be done without too much difficulty.
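The two generators proposed in the description can be sketched roughly as follows. This is a minimal illustration, not the cassandra-stress implementation (which is Java). It assumes an exponential look-back distribution, microsecond timestamps, and a lock as the "shared monotonically increasing state". The class names and the `mean_lookback`, `max_jitter_us` and `clock` parameters are hypothetical.

```python
import random
import threading
import time
from typing import Optional


class LookbackIdGenerator:
    """Hypothetical sketch of the proposed 'look back' id mode.

    Writes take the next sequential id. Reads pick an id some distance
    behind the most recently written id, with the distance drawn from an
    exponential distribution, so recent ids are read far more often.
    """

    def __init__(self, mean_lookback: float, seed=None):
        self._rng = random.Random(seed)
        self._mean_lookback = mean_lookback
        # The position advances only when an id is first written, so the
        # mode can start from a clean slate with a mixed workload.
        self._next_id = 0

    def next_write_id(self) -> int:
        written = self._next_id
        self._next_id += 1
        return written

    def next_read_id(self) -> Optional[int]:
        if self._next_id == 0:
            return None  # nothing has been written yet
        latest = self._next_id - 1
        lookback = int(self._rng.expovariate(1.0 / self._mean_lookback))
        return max(0, latest - lookback)  # clamp to the oldest written id


class AscendingTimestampGenerator:
    """Hypothetical strictly ascending timestamps anchored to the system clock."""

    def __init__(self, max_jitter_us: int = 1000, seed=None, clock=time.time):
        self._rng = random.Random(seed)
        self._max_jitter_us = max_jitter_us
        self._clock = clock
        self._last = -1
        self._lock = threading.Lock()  # state shared by all worker threads

    def next(self) -> int:
        with self._lock:
            now_us = int(self._clock() * 1_000_000)
            # Random component on top of the wall-clock time.
            candidate = now_us + self._rng.randint(0, self._max_jitter_us)
            # Never repeat or go backwards, even if the clock stalls or the
            # jitter would place this value below the previous one.
            value = max(candidate, self._last + 1)
            self._last = value
            return value


if __name__ == "__main__":
    ids = LookbackIdGenerator(mean_lookback=10.0, seed=42)
    for _ in range(100):
        ids.next_write_id()
    print([ids.next_read_id() for _ in range(5)])  # ids in [0, 99], skewed towards 99

    ts = AscendingTimestampGenerator(seed=1)
    values = [ts.next() for _ in range(5)]
    print(values == sorted(set(values)))  # True
```

The `max(candidate, last + 1)` step is what makes the timestamps strictly ascending despite the random jitter. The trade-off is that under heavy contention, values can drift slightly ahead of the real clock.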

This message was sent by Atlassian JIRA
