[ 
https://issues.apache.org/jira/browse/CASSANDRA-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273894#comment-14273894
 ] 

Benedict commented on CASSANDRA-8597:
-------------------------------------

There is a FIXED distribution - if you want exactly 1M, why not use it? With 
a depth of 3, as stated, FIXED(100) for each clustering column would do the 
trick (100 x 100 x 100 = 1M).
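
As a quick check of that arithmetic, here is a minimal sketch (plain Python, 
not stress itself; the function name is made up for illustration). It assumes 
each clustering tier contributes a fixed number of distinct values under every 
parent, so the rows in a partition are the product of the per-tier fan-outs.

```python
from math import prod


def rows_per_partition(fanouts):
    """Rows in one partition when every clustering tier has a fixed fan-out.

    `fanouts` is a hypothetical list with one entry per clustering column,
    e.g. [100, 100, 100] for FIXED(100) on each of three columns.
    """
    return prod(fanouts)


# Depth of 3, FIXED(100) on each clustering column.
print(rows_per_partition([100, 100, 100]))  # 1000000
```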

If we re-envisage the way we define the distribution, as I alluded to in #2, 
you could define the total number of rows you want in the partition. But then 
conceptualising how those rows are distributed amongst the clustering columns 
becomes hard, and a different PITA. You'd need two knobs per clustering column: 
the share of the fan-out it should adopt, and the variance between each value. 
Understanding how these interplay with each other (both intra-tier and 
inter-tier) would be really quite difficult for people to think about, which is 
why I originally chose to let it be configured per clustering column. It does, 
however, also solve your problem #2. It's a more powerful way of specifying 
the distribution, but I'm concerned that stress is already considered 
difficult to understand.
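
To make the "share of fan-out" knob concrete, here is one possible 
interpretation, sketched in plain Python. This is an assumption for 
illustration only, not how stress works or a worked-out proposal: each 
clustering column gets a weight, and the target row count is split across 
tiers in log space so that the per-tier fan-outs multiply back to the total. 
The names `fanouts_from_shares` and `shares` are hypothetical, and the 
variance knob is left out entirely.

```python
def fanouts_from_shares(total_rows, shares):
    """Split a target row count across clustering tiers.

    Hypothetical scheme: tier i gets fan-out total_rows ** (share_i / sum),
    so the product of all fan-outs equals total_rows (up to float error).
    """
    norm = sum(shares)
    return [total_rows ** (s / norm) for s in shares]


def product(values):
    """Multiply the fan-outs back together to get rows per partition."""
    result = 1.0
    for v in values:
        result *= v
    return result


even = fanouts_from_shares(1_000_000, [1, 1, 1])    # roughly 100 per tier
skewed = fanouts_from_shares(1_000_000, [2, 1, 1])  # first tier takes half
print(even, product(even))
print(skewed, product(skewed))

# Real fan-outs must be whole numbers, and rounding breaks the exact total.
rounded = [round(f) for f in skewed]
print(rounded, product(rounded))
```

Even in this simplified form the interplay is not obvious: with shares of 
2:1:1 the ideal fan-outs are about 1000, 31.6 and 31.6, and rounding them to 
1000, 32 and 32 yields 1,024,000 rows rather than 1M. Adding a per-tier 
variance on top would make the result harder still to predict.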

> Stress: make simple things simple
> ---------------------------------
>
>                 Key: CASSANDRA-8597
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8597
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>             Fix For: 2.1.3
>
>
> Some of the trouble people have with stress is a documentation problem, but 
> some is functional.
> Comments from [~iamaleksey]:
> # 3 clustering columns, make a million cells in a single partition, should be 
> simple, but it's not. have to tweak 'clustering' on the three columns just 
> right to make stress work at all. w/ some values it just gets stuck forever 
> computing batches
> # for others, it generates huge, megabyte-size batches, utterly disrespecting 
> 'select' clause in 'insert'
> #  I want a sequential generator too, to be able to predict deterministic 
> result sets. uniform() only gets you so far
> # impossible to simulate a time series workload
> /cc [~jshook] [~aweisberg] [~benedict]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
