[
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190085#comment-16190085
]
Daniel Cranford commented on CASSANDRA-12744:
---------------------------------------------
As I've thought about how to fix the seed multiplier, I've come to the
conclusion that it is impossible to use an adaptive multiplier without breaking
existing functionality or changing the command line interface.
One of the key reasons you can specify how the seeds get generated is so that
you can partition the seed space and run multiple cassandra-stress processes on
different machines in parallel, so that the cassandra-stress client doesn't
become the bottleneck. For example, to write 2 million partitions from two
client machines, you'd run
{noformat}cassandra-stress write n=1000000 -pop seq=1..1000000{noformat}
on one client machine and
{noformat}cassandra-stress write n=1000000 -pop seq=1000001..2000000{noformat}
on the other client machine.
An adaptive multiplier that attempts to scale the seed sequence so that its
range is 10^22 (or better, Long.MAX_VALUE since seeds are 64-bit longs) would
generate the same multiplier for both client processes, resulting in seed
sequence overlaps.
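
To illustrate the overlap, here is a minimal sketch of one plausible form of
adaptive scaling. It is not the code from the attached patch: the class, the
method names, and the choice to normalise each seed to the start of its local
range are all assumptions made for illustration. Each process sees only its own
seq range, so both compute the same multiplier and map their seeds onto the same
scaled values.

```java
import java.util.HashSet;
import java.util.Set;

public class AdaptiveSeedOverlap {
    // Hypothetical adaptive multiplier: stretch the local range [min, max]
    // across [0, Long.MAX_VALUE]. Only the local range is known to the process.
    static long adaptiveMultiplier(long min, long max) {
        return Long.MAX_VALUE / (max - min + 1);
    }

    // Assumed scaling rule: offset from the start of the local range,
    // multiplied by the adaptive multiplier.
    static long scale(long seed, long min, long max) {
        return (seed - min) * adaptiveMultiplier(min, max);
    }

    // Counts how many of the first `sample` scaled seeds of range B
    // also appear among the first `sample` scaled seeds of range A.
    static int sharedScaledSeeds(long minA, long maxA, long minB, long maxB, int sample) {
        Set<Long> scaledA = new HashSet<>();
        for (long s = minA; s < minA + sample; s++) {
            scaledA.add(scale(s, minA, maxA));
        }
        int shared = 0;
        for (long s = minB; s < minB + sample; s++) {
            if (scaledA.contains(scale(s, minB, maxB))) {
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // The two disjoint ranges from the example above.
        long m1 = adaptiveMultiplier(1, 1_000_000);
        long m2 = adaptiveMultiplier(1_000_001, 2_000_000);
        System.out.println("same multiplier: " + (m1 == m2));
        System.out.println("shared scaled seeds in a sample of 1000: "
                + sharedScaledSeeds(1, 1_000_000, 1_000_001, 2_000_000, 1000));
    }
}
```

Under these assumptions the two clients, despite being given disjoint seed
ranges, end up writing the same partitions.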
To correctly generate an adaptive multiplier, you need global knowledge of the
entire range of seeds being generated by all cassandra-stress processes. This
information cannot be supplied via the current command line interface. The
command line interface would have to be updated in a breaking fashion to
support an adaptive multiplier.
Using a hardcoded static multiplier is safe, but would reduce the allowable
range of seed values (and thus reduce the maximum number of distinct partition
keys). This probably isn't a big deal since nobody wants to write 2^64
partitions. But it would need to be chosen with care so that the number of
distinct seeds (and thus the number of distinct partitions) doesn't become too
small.
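
The trade-off can be sketched as follows. This is an illustration only: the
class, the method names, and the multiplier value 1,000,000 are hypothetical,
and it assumes seeds are non-negative and that a scaled seed must still fit in
a signed 64-bit long. A static multiplier M then leaves room for roughly
Long.MAX_VALUE / M distinct seeds, while disjoint input ranges stay disjoint
after scaling.

```java
public class StaticSeedMultiplier {
    // Largest seed that can be scaled by `multiplier` without overflowing a long.
    static long maxSeed(long multiplier) {
        return Long.MAX_VALUE / multiplier;
    }

    // Scales a seed by a fixed multiplier; throws ArithmeticException on overflow
    // instead of silently wrapping around.
    static long scale(long seed, long multiplier) {
        return Math.multiplyExact(seed, multiplier);
    }

    public static void main(String[] args) {
        long multiplier = 1_000_000L; // hypothetical value, not a recommendation

        System.out.println("max usable seed: " + maxSeed(multiplier));

        // Seeds from the two client ranges map to distinct scaled values
        // because the same multiplier is applied to the raw seed.
        System.out.println("client 1 last seed:  " + scale(1_000_000L, multiplier));
        System.out.println("client 2 first seed: " + scale(1_000_001L, multiplier));
    }
}
```

A larger multiplier spreads seeds further apart but lowers the ceiling returned
by maxSeed, which is the care needed when picking the constant.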
> Randomness of stress distributions is not good
> ----------------------------------------------
>
> Key: CASSANDRA-12744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: T Jake Luciani
> Assignee: Ben Slater
> Priority: Minor
> Labels: stress
> Fix For: 4.0
>
> Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad. We are using the
> JDKRandomGenerator() but in testing of uniform(1..3) we see for 100
> iterations it's only outputting 3. If you bump it to 10k it hits all 3
> values.
> I made a change to just use the default commons math random generator and now
> see all 3 values for n=10