[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

Daniel Cranford (JIRA) Thu, 28 Sep 2017 09:38:26 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184418#comment-16184418 ]


Daniel Cranford commented on CASSANDRA-12744:
---------------------------------------------

I think the math here is slightly broken. The seed multiplier is intended to 
scale all seeds to the 10^22 magnitude. However, the seeds (and the multiplier) 
are all stored in 64-bit integers, and the arithmetic on them is 64-bit 
arithmetic.

10^22 is not representable as a long, which has the range 
{noformat}[-(2^63) : 2^63 - 1] = [-9,223,372,036,854,775,808 : 9,223,372,036,854,775,807]{noformat}
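
As a quick check of the range claim, here is a minimal sketch (the class and method names are made up for illustration) that compares powers of ten against Long.MAX_VALUE using BigInteger, and shows that Math.round saturates rather than throwing when its argument is too large:

```java
import java.math.BigInteger;

class LongRangeCheck {
    // Returns true if 10^exponent fits in a signed 64-bit long.
    static boolean powerOfTenFitsInLong(int exponent) {
        BigInteger value = BigInteger.TEN.pow(exponent);
        return value.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(powerOfTenFitsInLong(18)); // true
        System.out.println(powerOfTenFitsInLong(19)); // false
        System.out.println(powerOfTenFitsInLong(22)); // false

        // Math.round(double) clamps values at or above Long.MAX_VALUE.
        System.out.println(Math.round(1e22) == Long.MAX_VALUE); // true
    }
}
```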

Consider that for sample sizes up to about 1084, the line that calculates the 
sample multiplier 
{noformat}this.sampleMultiplier = 1 + Math.round(Math.pow(10D, 22 - Math.log10(sampleSize)));{noformat}
will result in a multiplier of Long.MIN_VALUE: Math.round saturates at 
Long.MAX_VALUE, and adding 1 wraps around. Multiplying Long.MIN_VALUE by any 
long yields either 0 or Long.MIN_VALUE, which reduces your seeds to two 
distinct values.
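
The overflow can be reproduced in isolation. The sketch below copies the expression quoted above into a standalone method (the class name and the targetExponent parameter are my own, added for illustration, not taken from the patch) and prints what happens to a few seeds:

```java
class SeedMultiplierOverflow {
    // Same expression as the quoted line, with the exponent made a parameter.
    static long sampleMultiplier(long sampleSize, int targetExponent) {
        return 1 + Math.round(Math.pow(10D, targetExponent - Math.log10(sampleSize)));
    }

    public static void main(String[] args) {
        long multiplier = sampleMultiplier(100, 22);

        // Math.round clamps to Long.MAX_VALUE, then "1 +" wraps to Long.MIN_VALUE.
        System.out.println(multiplier == Long.MIN_VALUE); // true

        // Odd seeds map to Long.MIN_VALUE, even seeds map to 0.
        for (long seed = 1; seed <= 4; seed++) {
            System.out.println(seed + " -> " + (seed * multiplier));
        }
    }
}
```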

I think using 18 instead of 22 as the target exponent should resolve this issue, 
since 10^18 still fits within the range of a long.
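
To illustrate the effect of the proposed exponent, the following sketch counts how many distinct scaled seeds survive for each exponent. It assumes seeds run from 1 to sampleSize, which is my simplification for the demonstration and may not match what SeedManager actually does; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class SeedMultiplierExponent18 {
    // Same expression as in the comment, with the exponent made a parameter.
    static long sampleMultiplier(long sampleSize, int targetExponent) {
        return 1 + Math.round(Math.pow(10D, targetExponent - Math.log10(sampleSize)));
    }

    // Counts distinct values of seed * multiplier for seeds 1..sampleSize
    // (assumed seed range), using ordinary wrapping 64-bit multiplication.
    static int distinctScaledSeeds(long sampleSize, int targetExponent) {
        long multiplier = sampleMultiplier(sampleSize, targetExponent);
        Set<Long> scaled = new HashSet<>();
        for (long seed = 1; seed <= sampleSize; seed++) {
            scaled.add(seed * multiplier);
        }
        return scaled.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctScaledSeeds(100, 22)); // 2
        System.out.println(distinctScaledSeeds(100, 18)); // 100
    }
}
```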

Additionally, I think the seed population size is being incorrectly calculated 
as the range of the revisit distribution (which defaults to uniform(1..1M)). 
However, when running in the default sequential seed mode (without revisits), 
e.g. {noformat}cassandra-stress write n=100{noformat}, the size of the seed 
population is actually the length of the seed sequence (in this case 100).

And when running with seeds generated from a distribution, e.g. 
{noformat}cassandra-stress read -pop dist=gaussian(1..250M){noformat}, the size 
of the seed population is actually the range of the seed distribution (in this 
case 250 million).


> Randomness of stress distributions is not good
> ----------------------------------------------
>
>                 Key: CASSANDRA-12744
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: T Jake Luciani
>            Assignee: Ben Slater
>            Priority: Minor
>              Labels: stress
>             Fix For: 4.0
>
>         Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad.  We are using the 
> JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 
> iterations it's only outputting 3.  If you bump it to 10k it hits all 3 
> values. 
> I made a change to just use the default commons math random generator and now 
> see all 3 values for n=10



