[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

Daniel Cranford (JIRA) Tue, 03 Oct 2017 11:14:56 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190085#comment-16190085
 ]


Daniel Cranford commented on CASSANDRA-12744:
---------------------------------------------

As I've thought about how to fix the seed multiplier, I've come to the 
conclusion that it is impossible to use an adaptive multiplier without breaking 
existing functionality or changing the command line interface.

One of the key reasons you can specify how the seeds get generated is so that 
you can partition the seed space and run multiple cassandra-stress processes on 
different machines in parallel, so that the cassandra-stress client doesn't 
become the bottleneck. For example, to write 2 million partitions from two 
client machines, you'd run {noformat}cassandra-stress write n=1000000 -pop 
seq=1..1000000{noformat} on one client machine and {noformat}cassandra-stress 
write n=1000000 -pop seq=1000001..2000000{noformat} on the other client machine.
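The manual partitioning above generalizes to any number of clients. The sketch below splits seeds 1..total into contiguous, non-overlapping ranges and prints one command line per client, reusing only the n= and -pop seq= options from the example. The class and method names are hypothetical helpers, not part of cassandra-stress.

```java
import java.util.ArrayList;
import java.util.List;

public class SeedPartitioner {
    // Split seeds 1..total into `clients` contiguous, non-overlapping ranges.
    // Earlier clients absorb the remainder when total is not evenly divisible.
    static List<long[]> partition(long total, int clients) {
        List<long[]> ranges = new ArrayList<>();
        long base = total / clients;
        long remainder = total % clients;
        long start = 1;
        for (int i = 0; i < clients; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Reproduces the two-client example: 2 million partitions, 2 machines.
        for (long[] r : partition(2_000_000, 2)) {
            long n = r[1] - r[0] + 1;
            System.out.println("cassandra-stress write n=" + n + " -pop seq=" + r[0] + ".." + r[1]);
        }
    }
}
```

For two clients this prints the same two command lines as the example above.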

An adaptive multiplier that attempts to scale the seed sequence so that its 
range is 10^22 (or better, Long.MAX_VALUE, since seeds are 64-bit longs) can 
only see each process's local seed range. Both client processes above have a 
range of the same size, so both would compute the same multiplier, resulting 
in overlapping seed sequences.
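To make the overlap concrete, here is a minimal sketch under an assumed scaling rule: each process maps its local range [lo, hi] onto roughly [0, Long.MAX_VALUE] using (seed - lo) * m with m = Long.MAX_VALUE / (hi - lo + 1). This rule and the names below are hypothetical illustrations, not the SeedManager code in the attached patch.

```java
public class AdaptiveMultiplierOverlap {
    // Hypothetical adaptive multiplier computed from the LOCAL range only.
    static long adaptiveMultiplier(long lo, long hi) {
        long size = hi - lo + 1;
        return Long.MAX_VALUE / size;
    }

    // Hypothetical scaling: offset from the local lower bound times the multiplier.
    // (seed - lo) <= size - 1, so the product never exceeds Long.MAX_VALUE.
    static long scaledSeed(long seed, long lo, long hi) {
        return (seed - lo) * adaptiveMultiplier(lo, hi);
    }

    public static void main(String[] args) {
        long mA = adaptiveMultiplier(1, 1_000_000);
        long mB = adaptiveMultiplier(1_000_001, 2_000_000);
        System.out.println("multiplier A = " + mA + ", multiplier B = " + mB);
        // The i-th seed of client A and the i-th seed of client B scale to the same value.
        for (int i = 0; i < 3; i++) {
            long a = scaledSeed(1 + i, 1, 1_000_000);
            long b = scaledSeed(1_000_001 + i, 1_000_001, 2_000_000);
            System.out.println(a + " " + b + " equal=" + (a == b));
        }
    }
}
```

Because both ranges contain 1,000,000 seeds, the two processes derive identical multipliers and, under this rule, identical scaled seeds, so they would write the same partitions.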

To correctly generate an adaptive multiplier, you need global knowledge of the 
entire range of seeds being generated by all cassandra-stress processes. This 
information cannot be supplied via the current command line interface. The 
command line interface would have to be updated in a breaking fashion to 
support an adaptive multiplier.
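For contrast, a minimal sketch of what global knowledge would buy: if every process were told the same overall range (here the hypothetical parameters globalLo and globalHi, which the current command line cannot express), each process would compute the same multiplier but scale relative to the global lower bound, so the two clients' seeds stay disjoint. The scaling rule is the same assumed one as in the previous sketch.

```java
public class GlobalRangeMultiplier {
    // Hypothetical: the multiplier is derived from the GLOBAL range shared by all processes.
    static long multiplier(long globalLo, long globalHi) {
        return Long.MAX_VALUE / (globalHi - globalLo + 1);
    }

    // Scale a locally generated seed relative to the global lower bound.
    static long scaledSeed(long seed, long globalLo, long globalHi) {
        return (seed - globalLo) * multiplier(globalLo, globalHi);
    }

    public static void main(String[] args) {
        long globalLo = 1, globalHi = 2_000_000;
        // Client A owns seeds 1..1000000, client B owns 1000001..2000000.
        long lastA = scaledSeed(1_000_000, globalLo, globalHi);
        long firstB = scaledSeed(1_000_001, globalLo, globalHi);
        System.out.println("client A last = " + lastA + ", client B first = " + firstB);
        System.out.println("overlap = " + (lastA >= firstB)); // prints "overlap = false"
    }
}
```

The only difference from the local version is where the range comes from, which is exactly the information a breaking command line change would have to supply.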

Using a hardcoded static multiplier is safe, but it would reduce the allowable 
range of seed values (and thus the maximum number of distinct partition keys): 
with a multiplier M, any seed larger than Long.MAX_VALUE / M would overflow 
when multiplied. This probably isn't a big deal, since nobody wants to write 
2^64 partitions. But the multiplier would need to be chosen with care so that 
the number of distinct seeds (and thus the number of distinct partitions) 
doesn't become too small.
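The trade-off can be checked with a little arithmetic. The sketch below computes the largest positive seed that a given static multiplier can scale without overflowing a signed 64-bit long, and uses Math.multiplyExact to fail loudly instead of wrapping. The multiplier 2^20 is purely hypothetical; no specific value is proposed in this comment.

```java
public class StaticMultiplierRange {
    // Largest positive seed s such that s * multiplier fits in a signed 64-bit long.
    static long maxSeed(long multiplier) {
        return Long.MAX_VALUE / multiplier;
    }

    // Scale a seed, throwing ArithmeticException on overflow instead of silently wrapping.
    static long scale(long seed, long multiplier) {
        return Math.multiplyExact(seed, multiplier);
    }

    public static void main(String[] args) {
        long multiplier = 1L << 20; // hypothetical value, not taken from cassandra-stress
        long max = maxSeed(multiplier);
        System.out.println("max seed = " + max); // 8796093022207 (2^43 - 1)
        System.out.println("scaled max = " + scale(max, multiplier));
        try {
            scale(max + 1, multiplier);
        } catch (ArithmeticException e) {
            System.out.println("seed " + (max + 1) + " overflows");
        }
    }
}
```

With M = 2^20 the usable seed space shrinks from about 2^63 to about 2^43 (roughly 8.8 trillion) distinct positive seeds, which illustrates why the choice of M bounds the number of distinct partitions.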



> Randomness of stress distributions is not good
> ----------------------------------------------
>
>                 Key: CASSANDRA-12744
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: T Jake Luciani
>            Assignee: Ben Slater
>            Priority: Minor
>              Labels: stress
>             Fix For: 4.0
>
>         Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad.  We are using the 
> JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 
> iterations it's only outputting 3.  If you bump it to 10k it hits all 3 
> values. 
> I made a change to just use the default commons math random generator and now 
> see all 3 values for n=10.
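The check described in the issue (draw n values from uniform(1..3) and see which values appear) can be sketched as a small tally harness. It uses java.util.Random purely as a stand-in source; it is not the JDKRandomGenerator or commons math generator the issue mentions, and it does not reproduce how cassandra-stress seeds its distributions.

```java
import java.util.Random;
import java.util.TreeMap;
import java.util.function.IntSupplier;

public class DistributionTally {
    // Draw `iterations` values from `source` and count how often each value appears.
    static TreeMap<Integer, Integer> tally(IntSupplier source, int iterations) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < iterations; i++) {
            counts.merge(source.getAsInt(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // uniform(1..3): nextInt(3) yields 0, 1 or 2, shifted up by one.
        IntSupplier uniform1to3 = () -> 1 + random.nextInt(3);
        for (int n : new int[] {10, 100, 10_000}) {
            System.out.println("n=" + n + " -> " + tally(uniform1to3, n));
        }
    }
}
```

Swapping a different generator into the IntSupplier is enough to compare how quickly each one covers all three values.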



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
