[
https://issues.apache.org/jira/browse/CASSANDRA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108219#comment-14108219
]
T Jake Luciani commented on CASSANDRA-7519:
-------------------------------------------
Ran some tests and tweaked the schema from the blogpost and things look better.
I do have some further questions/suggestions besides the better names.
- What is the point of batchcount? The point of a batch is to group the
inserts into a single statement for the server, so why would you send multiple
of these sequentially? Even though it's possible, I can't think of a realistic
workload that would use it.
- I think it would be helpful to output some information on the partition sizes
and batch sizes for inserts to give people a sense of what their selected
values will do, like:
{code}
Global:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y
Per Batch:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y
{code}
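To make the suggestion concrete, here is a rough sketch of how such a summary could be computed from a sample of generated batches. This is not how stress does it or would do it; the `summarize` function and the `(partition_key, row_id)` batch shape are hypothetical, and the global partition figure here is a simple distinct count from the sample rather than a min/max derived from the configured distributions.

```python
from collections import Counter


def summarize(batches):
    """Summarize partition and row counts for a non-empty sample of batches.

    `batches` is a list of batches; each batch is a list of
    (partition_key, row_id) tuples. Both names are illustrative only.
    """
    def min_max(values):
        values = list(values)
        return (min(values), max(values))

    global_rows = Counter()       # rows seen per partition, across all batches
    per_batch_partitions = []     # distinct partitions touched by each batch
    per_batch_rows = []           # rows per partition within each batch

    for batch in batches:
        rows = Counter(pk for pk, _ in batch)
        global_rows.update(rows)
        per_batch_partitions.append(len(rows))
        per_batch_rows.extend(rows.values())

    return {
        "global_partitions": len(global_rows),
        "global_rows_per_partition": min_max(global_rows.values()),
        "batch_partitions": min_max(per_batch_partitions),
        "batch_rows_per_partition": min_max(per_batch_rows),
    }


def format_summary(s):
    """Render the summary in roughly the layout proposed above."""
    return "\n".join([
        "Global:",
        "  Partitions: %d" % s["global_partitions"],
        "  Rows per partition: Min of %d, Max of %d" % s["global_rows_per_partition"],
        "Per Batch:",
        "  Partitions: Min of %d, Max of %d" % s["batch_partitions"],
        "  Rows per partition: Min of %d, Max of %d" % s["batch_rows_per_partition"],
    ])
```

Calling `print(format_summary(summarize(sample)))` on a few thousand sampled batches before a run would give a quick sense of what the selected values do.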
> Further stress improvements to generate more realistic workloads
> ----------------------------------------------------------------
>
> Key: CASSANDRA-7519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7519
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Benedict
> Assignee: Benedict
> Priority: Minor
> Labels: tools
> Fix For: 2.1.1
>
>
> We generally believe that the most common workload is for reads to
> exponentially prefer most recently written data. However as stress currently
> behaves we have two id generation modes: sequential and random (although
> random can be distributed). I propose introducing a new mode which is
> somewhat like sequential, except we essentially 'look back' from the current
> id by some amount defined by a distribution. I may also make the position
> increment only as each id is first written, so that this mode can be run
> from a clean slate with a mixed workload. This should allow us to generate
> workloads that are more representative.
> At the same time, I will introduce a timestamp value generator for primary
> key columns that is strictly ascending, i.e. has some random component but is
> based off of the actual system time (or some shared monotonically increasing
> state) so that we can again generate a more realistic workload. This may be
> challenging to tie in with the new procedurally generated partitions, but I'm
> sure it can be done without too much difficulty.
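
The two generators proposed in the issue description quoted above could look roughly like the following. This is a minimal sketch under assumptions, not the stress implementation: the class names, the choice of an exponential look-back distribution, and the `mean_lookback` and `max_jitter_ms` parameters are all hypothetical.

```python
import random
import time


class LookbackIdGenerator:
    """Sequential ids for writes; reads 'look back' from the current position.

    The position advances only when an id is first written, so a mixed
    read/write workload can start from a clean slate.
    """

    def __init__(self, mean_lookback=10.0, seed=None):
        self.position = 0                   # highest id written so far
        self.mean_lookback = mean_lookback  # assumed mean look-back distance
        self.rng = random.Random(seed)

    def next_write_id(self):
        self.position += 1
        return self.position

    def next_read_id(self):
        if self.position == 0:
            raise ValueError("nothing has been written yet")
        # Exponentially distributed look-back: recent ids are most likely.
        lookback = int(self.rng.expovariate(1.0 / self.mean_lookback))
        return max(1, self.position - lookback)


class AscendingTimestamps:
    """Strictly ascending timestamps based on system time plus random jitter."""

    def __init__(self, max_jitter_ms=5, seed=None, clock=time.time):
        self.last = 0
        self.max_jitter_ms = max_jitter_ms
        self.rng = random.Random(seed)
        self.clock = clock

    def next(self):
        now_ms = int(self.clock() * 1000)
        candidate = now_ms + self.rng.randint(0, self.max_jitter_ms)
        # Never go backwards, even if the clock or the jitter would.
        self.last = max(self.last + 1, candidate)
        return self.last
```

Sharing one `AscendingTimestamps` instance across threads would need a lock around `next()`; that is left out here for brevity.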
--
This message was sent by Atlassian JIRA
(v6.2#6252)