[jira] [Comment Edited] (CASSANDRA-12744) Randomness of stress distributions is not good

Ben Slater (JIRA) Tue, 30 May 2017 00:19:53 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028803#comment-16028803
 ]


Ben Slater edited comment on CASSANDRA-12744 at 5/30/17 7:18 AM:
-----------------------------------------------------------------

After some more digging, I've concluded that the issue is that the 
JDKRandomGenerator produces similar random numbers when seeded with similar 
values. So, when running with a small range of potential seeds (taken from the 
population), you end up with different random doubles that all round to the 
same long value. 
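
To illustrate the effect, here is a minimal sketch using plain java.util.Random, 
which (as far as I know) is what JDKRandomGenerator in Commons Math 3 wraps. The 
class and method names are made up for this example, and the mapping from a 
double to a long is a simplified stand-in for what the stress tool does, not a 
copy of its code.

```java
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

final class CloseSeedDemo {
    /** First draw from a freshly seeded generator, mapped onto the integers min..max. */
    static long firstSample(long seed, long min, long max) {
        double u = new Random(seed).nextDouble();      // u is in [0, 1)
        // Assumed mapping: sample a real uniform on [min, max + 1) and truncate.
        return (long) (min + u * (max + 1 - min));
    }

    /** Distinct values produced when every seed in firstSeed..lastSeed is used once. */
    static Set<Long> distinctSamples(long firstSeed, long lastSeed, long min, long max) {
        Set<Long> seen = new TreeSet<>();
        for (long seed = firstSeed; seed <= lastSeed; seed++) {
            seen.add(firstSample(seed, min, max));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Neighbouring seeds give first doubles that are all close to 0.73.
        for (long seed = 1; seed <= 3; seed++) {
            System.out.println(seed + " -> " + new Random(seed).nextDouble());
        }
        // 100 different seeds, but uniform(1..3) only ever yields 3.
        System.out.println(distinctSamples(1, 100, 1, 3));   // prints [3]
    }
}
```

This matches the symptom in the issue description, where uniform(1..3) only 
outputs 3 for a small number of iterations.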

The attached patch multiplies the generated seed so that max seed values are of 
the order of 10^22. I've tested this against a couple of the failed dtests and 
they pass OK. In addition, I get the following results from a range of YAML 
files ("without multiplier" is unmodified trunk; "with multiplier" is with this 
patch applied):

Example 1:
table: test5
table_definition: |
  CREATE TABLE test5 (
        pk int,
        val text,
        PRIMARY KEY (pk)
  ) 
columnspec:
  - name: pk
    size: fixed(64) 
    population: uniform(1..500) 
    
user profile=... ops(insert=1) n=1000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1
without multiplier - 47 rows
with multiplier - 490 rows

================================

table: test4
table_definition: |
  CREATE TABLE test4 (
        pk int,
        pk2 text,
        val text,
        PRIMARY KEY ((pk,pk2))
  ) 
columnspec:
  - name: pk
    size: fixed(2) 
    population: uniform(1..5) 
  - name: pk2
    size: fixed(2) 
    population: uniform(1..5) 
    
user profile=... ops(insert=1) n=1000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1
without multiplier - 1 row
with multiplier - 25 rows

================================

table: test4
table_definition: |
  CREATE TABLE test4 (
        pk int,
        pk2 text,
        val text,
        PRIMARY KEY ((pk,pk2))
  ) 
columnspec:
  - name: pk
    size: fixed(2) 
    population: uniform(1..500M) 
  - name: pk2
    size: fixed(2) 
    population: uniform(1..5) 
user profile=... ops(insert=1) n=1000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1
without multiplier - 1000 rows
with multiplier - 1000 rows

===================================
table: test7
table_definition: |
  CREATE TABLE test7 (
        pk int,
        pk2 text,
        ck1 text,
        val text,
        PRIMARY KEY ((pk,pk2), ck1)
  ) 
columnspec:
  - name: pk
    size: fixed(2) 
    population: uniform(1..100) 
  - name: pk2
    size: fixed(4) 
    population: uniform(1..10000) 
  - name: ck1
    size: fixed(4) 
    population: uniform(1..1000) 

user profile=... ops(insert=1) n=100000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1
without multiplier - 10342 rows
with multiplier - 63387 rows

=====================================
table: test7
table_definition: |
  CREATE TABLE test7 (
        pk int,
        pk2 text,
        ck1 text,
        val text,
        PRIMARY KEY ((pk,pk2), ck1)
  ) 
columnspec:
  - name: pk
    size: fixed(4) 
    population: seq(1..100) 
  - name: pk2
    size: fixed(10) 
    population: seq(1..10000) 
  - name: ck1
    size: fixed(10) 
    cluster: uniform(1..1000)
    population: seq(1..1000) 

user profile=... ops(insert=1) n=100000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1
without multiplier - 25000 rows
with multiplier - 43304 rows
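
For anyone who wants to experiment with the seed scaling idea outside of the 
stress tool, a rough sketch follows. The MULTIPLIER constant here is purely 
hypothetical and is not the value used in the attached patch; the class and 
method names are also invented for the example, and the double-to-long mapping 
is again a simplified assumption. It compares how many distinct values 
uniform(1..500) produces from seeds 1..500 with and without scaling.

```java
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

final class ScaledSeedDemo {
    // Hypothetical constant for illustration only; NOT the multiplier from the patch.
    static final long MULTIPLIER = 0x9E3779B97F4A7C15L;

    /** First draw from a freshly seeded generator, mapped onto the integers min..max. */
    static long firstSample(long seed, long min, long max) {
        double u = new Random(seed).nextDouble();      // u is in [0, 1)
        // Assumed mapping: sample a real uniform on [min, max + 1) and truncate.
        return (long) (min + u * (max + 1 - min));
    }

    /** Distinct values seen when seeds 1..lastSeed are scaled by multiplier before use. */
    static Set<Long> distinctSamples(long lastSeed, long multiplier, long min, long max) {
        Set<Long> seen = new TreeSet<>();
        for (long seed = 1; seed <= lastSeed; seed++) {
            // The product may overflow and wrap around, which is harmless for seeding.
            seen.add(firstSample(seed * multiplier, min, max));
        }
        return seen;
    }

    public static void main(String[] args) {
        // A multiplier of 1 leaves the seeds unscaled, i.e. the "without multiplier" case.
        System.out.println("raw seeds:    " + distinctSamples(500, 1L, 1, 500).size());
        System.out.println("scaled seeds: " + distinctSamples(500, MULTIPLIER, 1, 500).size());
    }
}
```

With unscaled seeds the first doubles stay inside a narrow band, so only a few 
dozen of the 500 possible values can appear. Scaling should spread the seeds 
across far more of the generator's state, though the exact counts depend on the 
multiplier chosen and will not match the row counts above.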



 


> Randomness of stress distributions is not good
> ----------------------------------------------
>
>                 Key: CASSANDRA-12744
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: T Jake Luciani
>            Assignee: Ben Slater
>            Priority: Minor
>              Labels: stress
>             Fix For: 4.0
>
>         Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad.  We are using the 
> JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 
> iterations it's only outputting 3.  If you bump it to 10k it hits all 3 
> values. 
> I made a change to just use the default commons math random generator and now 
> see all 3 values for n=10



