[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

Benedict (JIRA) Sun, 09 Oct 2016 10:22:40 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560318#comment-15560318 ]


Benedict commented on CASSANDRA-12490:
--------------------------------------

I'm afraid I think this was a terrible idea, and it should probably be rolled 
back. The example yaml permits its use as a column value seed generator, which 
means the contents of a partition no longer depend on the partition's seed but 
on the order in which partitions are visited.

For partition and clustering columns (as in the example) this breaks queries. 
Stress no longer knows which records exist, because it generates different 
values to query than the ones it originally wrote.

It also completely breaks any possibility of data validation, which is 
currently supported for thrift and was always intended to be extended to CQL 
to improve testing.
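A minimal sketch of why order-derived values break both queries and validation, assuming a simplified model in which each partition holds a single generated value. The class and method names (`SeedVsOrder`, `valueFromSeed`, and so on) are hypothetical and not part of cassandra-stress; the counter stands in for a sequence distribution used as a column value seed. Regenerating values on a second pass succeeds only when they are derived from the partition seed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not cassandra-stress code: compares a value generator
// keyed on the partition seed with one keyed on visitation order.
final class SeedVsOrder {

    // Seed-derived value: depends only on the partition seed, so a later
    // query or validation pass can regenerate exactly what was written.
    static long valueFromSeed(long partitionSeed) {
        return new Random(partitionSeed).nextLong();
    }

    // Order-derived values: a counter advanced once per visited partition,
    // standing in for a sequence used as a column value seed.
    // Assumes readOrder visits only partitions that appear in writeOrder.
    static int orderDependentMismatches(long[] writeOrder, long[] readOrder) {
        Map<Long, Long> written = new HashMap<>();
        long counter = 1;
        for (long seed : writeOrder) {
            written.put(seed, counter++);
        }
        int mismatches = 0;
        counter = 1; // a fresh pass restarts the sequence
        for (long seed : readOrder) {
            if (counter++ != written.get(seed)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Same comparison with seed-derived values (same assumption on readOrder).
    static int seedDependentMismatches(long[] writeOrder, long[] readOrder) {
        Map<Long, Long> written = new HashMap<>();
        for (long seed : writeOrder) {
            written.put(seed, valueFromSeed(seed));
        }
        int mismatches = 0;
        for (long seed : readOrder) {
            if (valueFromSeed(seed) != written.get(seed)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        long[] writeOrder = {5, 2, 9}; // partition seeds in the order they were written
        long[] readOrder = {9, 5, 2};  // the same partitions, visited in a different order
        // prints "order-derived mismatches: 3"
        System.out.println("order-derived mismatches: " + orderDependentMismatches(writeOrder, readOrder));
        // prints "seed-derived mismatches:  0"
        System.out.println("seed-derived mismatches:  " + seedDependentMismatches(writeOrder, readOrder));
    }
}
```

With order-derived values, every partition read back in a different order gets a value that was never written to it. The order-derived pass only agrees with the write pass when both visit partitions in exactly the same order.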

As already mentioned, the -pop seq=1..N mode can be provided on the command 
line for sequentially visiting partitions. For generating *values* that can 
step forwards with this, the most sensible design (and what had been on the 
cards) is to accept a functional specification that depends on the seed of the 
partition, the simplest being to return 1 when the partition's seed is 1.
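A minimal sketch of the functional approach, assuming column values are computed as a pure function of the partition seed. `SeedFunctionSketch`, `valuesFor` and `stepFrom` are hypothetical names; no such API exists in cassandra-stress. Because the value depends only on the seed, sequential visitation (as with `-pop seq=1..N`) yields values that step forwards, while any other order still maps each partition to the same value.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: values as a function of the partition seed.
final class SeedFunctionSketch {

    // Simplest specification: partition seed 1 yields 1, seed 2 yields 2, ...
    static final LongUnaryOperator IDENTITY = LongUnaryOperator.identity();

    // Hypothetical variant: values start at `base` and advance by `step`
    // for each successive partition seed.
    static LongUnaryOperator stepFrom(long base, long step) {
        return seed -> base + (seed - 1) * step;
    }

    // Applies the value function to each partition seed in visitation order.
    static long[] valuesFor(long[] visitOrder, LongUnaryOperator f) {
        long[] out = new long[visitOrder.length];
        for (int i = 0; i < visitOrder.length; i++) {
            out[i] = f.applyAsLong(visitOrder[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] sequential = {1, 2, 3, 4}; // e.g. partitions visited in seed order
        long[] shuffled = {3, 1, 4, 2};   // same partitions, different order
        // prints [1, 2, 3, 4]
        System.out.println(Arrays.toString(valuesFor(sequential, IDENTITY)));
        // prints [3, 1, 4, 2]: each partition keeps its own value
        System.out.println(Arrays.toString(valuesFor(shuffled, IDENTITY)));
        // prints [1000, 1010, 1020, 1030]
        System.out.println(Arrays.toString(valuesFor(sequential, stepFrom(1000, 10))));
    }
}
```

Since the function never consults a shared counter, a later query or validation pass can recompute any partition's values from its seed alone.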

> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>
>                 Key: CASSANDRA-12490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds. 
> This ensures generated values don't overlap (unless the sequence wraps), 
> providing a more predictable number of inserted records (and generating a base 
> set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. 
> I think it would be useful to have this for doing an initial load of data for 
> testing.
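A minimal sketch of the point about predictable record counts, assuming a simplified model in which each write targets the partition named by its seed. `SeedCoverage` and its methods are hypothetical, and the uniform case is only an illustrative contrast, not a claim about any particular cassandra-stress default. Sequential seeds 1..n touch n distinct partitions, while n uniform draws from the same range are expected to touch only about n(1 - (1 - 1/n)^n), roughly 63% of n, with the remaining writes overwriting earlier ones.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: distinct partitions written by n writes.
final class SeedCoverage {

    // Sequential seeds 1..n: every write lands on a new partition.
    static int distinctSequential(int n) {
        Set<Long> seen = new HashSet<>();
        for (long seed = 1; seed <= n; seed++) {
            seen.add(seed);
        }
        return seen.size();
    }

    // Uniform random seeds in 1..n (fixed RNG seed for repeatability):
    // repeated draws overwrite partitions that were already written.
    static int distinctUniform(int n, long rngSeed) {
        Random rng = new Random(rngSeed);
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(1L + rng.nextInt(n));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("sequential: " + distinctSequential(n) + " distinct of " + n + " writes");
        System.out.println("uniform:    " + distinctUniform(n, 42L) + " distinct of " + n + " writes");
    }
}
```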



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
