[jira] [Commented] (CASSANDRA-7519) Further stress improvements to generate more realistic workloads

T Jake Luciani (JIRA) Sun, 17 Aug 2014 19:53:09 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100233#comment-14100233
 ]


T Jake Luciani commented on CASSANDRA-7519:
-------------------------------------------

I'm not very keen on the new labels you've chosen for the insert section of the 
yaml file. They should be more verbose.

Batch size - "number of unique partitions to update in a single operation" 
  This should mention partitions in it, no? Maybe partitions_per_batch?

Batch count - "number of batches we aim to split the update up into"
   Does this mean the number of batches to split an operation of N partitions 
into? If so, then perhaps batch_split_count?
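
To make the reading being asked about concrete, here is a minimal Python sketch, 
assuming "batch count" means splitting one operation's N partitions into that 
many near-equal batches. The function name split_into_batches and its parameters 
are hypothetical and are not taken from the stress code.

```python
def split_into_batches(partitions, batch_count):
    """Split partition keys into at most batch_count near-equal batches.

    Hypothetical illustration only; not the cassandra-stress implementation.
    """
    if batch_count < 1:
        raise ValueError("batch_count must be >= 1")
    n = len(partitions)
    if n == 0:
        return []
    # Never create more batches than there are partitions.
    k = min(batch_count, n)
    base, extra = divmod(n, k)
    batches = []
    start = 0
    for i in range(k):
        # The first `extra` batches take one additional partition each.
        size = base + (1 if i < extra else 0)
        batches.append(partitions[start:start + size])
        start += size
    return batches


if __name__ == "__main__":
    # One operation touching 10 partitions, split into 3 batches.
    for batch in split_into_batches(list(range(10)), 3):
        print(batch)
```

Under this reading, the first setting controls how many partitions one operation 
touches and the second controls how many batches those partitions are divided 
into.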

I plan to run some test workloads to double check the logic, but the first cut 
of the code looked good.  I left a couple of comments on the GitHub branch.


> Further stress improvements to generate more realistic workloads
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-7519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7519
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: tools
>             Fix For: 2.1.1
>
>
> We generally believe that the most common workload is for reads to 
> exponentially prefer the most recently written data. However, as stress 
> currently behaves, we have two id generation modes: sequential and random 
> (although random can be distributed). I propose introducing a new mode which 
> is somewhat like sequential, except we essentially 'look back' from the 
> current id by some amount defined by a distribution. I may also make the 
> position increment only as it is first written to, so that this mode can be 
> run from a clean slate with a mixed workload. This should allow us to 
> generate workloads that are more representative.
> At the same time, I will introduce a timestamp value generator for primary 
> key columns that is strictly ascending, i.e. has some random component but is 
> based off of the actual system time (or some shared monotonically increasing 
> state) so that we can again generate a more realistic workload. This may be 
> challenging to tie in with the new procedurally generated partitions, but I'm 
> sure it can be done without too much difficulty.
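
The two ideas in the description above can be sketched roughly as follows. This 
is a Python illustration under stated assumptions, not the stress tool's code: 
the class names, the exponential look-back distribution, the mean_lookback and 
max_jitter_ms parameters, and the injectable clock are all hypothetical choices 
made for the example.

```python
import random
import time


class LookBackIdGenerator:
    """Sequential ids for writes; reads look back from the newest written id.

    Assumption: the look-back distance is drawn from an exponential
    distribution, so recently written ids are read far more often.
    """

    def __init__(self, mean_lookback=10.0, seed=None):
        self._rng = random.Random(seed)
        self._mean = mean_lookback
        self._next_id = 0  # advances only when a new id is written

    def next_write_id(self):
        current = self._next_id
        self._next_id += 1
        return current

    def next_read_id(self):
        if self._next_id == 0:
            raise RuntimeError("nothing has been written yet")
        newest = self._next_id - 1
        offset = int(self._rng.expovariate(1.0 / self._mean))
        # Clamp so we never read an id that was not written.
        return max(0, newest - offset)


class AscendingTimestampGenerator:
    """Strictly ascending millisecond timestamps with a random component."""

    def __init__(self, max_jitter_ms=5, seed=None, clock=None):
        self._rng = random.Random(seed)
        self._max_jitter = max_jitter_ms
        # The clock is injectable so the behaviour can be tested.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last = -1

    def next(self):
        candidate = self._clock() + self._rng.randint(0, self._max_jitter)
        # Force strict monotonicity even if the clock stalls or jitter shrinks.
        self._last = max(candidate, self._last + 1)
        return self._last


if __name__ == "__main__":
    ids = LookBackIdGenerator(mean_lookback=5.0, seed=42)
    for _ in range(100):
        ids.next_write_id()
    print([ids.next_read_id() for _ in range(10)])

    stamps = AscendingTimestampGenerator(seed=42)
    print([stamps.next() for _ in range(5)])
```

Because the write position advances only in next_write_id, a mixed workload 
started from a clean slate never reads an id that has not yet been written, 
which is the property the description is after.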



--
This message was sent by Atlassian JIRA
(v6.2#6252)
