[
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028803#comment-16028803
]
Ben Slater edited comment on CASSANDRA-12744 at 5/30/17 7:18 AM:
-----------------------------------------------------------------
After some more digging, I've concluded that the root cause is that
JDKRandomGenerator produces similar random numbers when seeded with similar
values. So, when running with a small range of potential seeds (drawn from the
population), you end up with distinct random doubles that all round to the same
long value.
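The effect can be reproduced outside of cassandra-stress. The following is a minimal sketch, not the stress tool's code: it uses java.util.Random directly (which JDKRandomGenerator wraps) so that it needs no dependencies, and the class name, method name and SPREAD constant are all hypothetical. It seeds a fresh generator with each of the seeds 1..500, takes the first double, and maps it onto 1..3. With the raw seeds, every generator should land on 3, which matches the uniform(1..3) behaviour in the issue description. The multiplier parameter is only a knob for experimenting with spread-out seeds; it is not the value used in the attached patch.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class CloseSeedDemo {
    // Hypothetical odd 64-bit constant, chosen for illustration only.
    // It is NOT the multiplier used in the attached patch.
    static final long SPREAD = 0x9E3779B97F4A7C15L;

    /**
     * Seeds a new generator with each of 1..seedCount (times multiplier),
     * maps its first double onto 1..range, and returns the distinct results.
     */
    static Set<Long> distinctFirstDraws(int seedCount, int range, long multiplier) {
        Set<Long> seen = new HashSet<>();
        for (long seed = 1; seed <= seedCount; seed++) {
            // long overflow simply wraps, which is fine for a seed
            double d = new Random(seed * multiplier).nextDouble();
            seen.add(1 + (long) (d * range));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Adjacent seeds: the first doubles cluster tightly together.
        System.out.println("raw seeds:        " + distinctFirstDraws(500, 3, 1L));
        // Spread-out seeds: compare how many distinct values appear.
        System.out.println("multiplied seeds: " + distinctFirstDraws(500, 3, SPREAD));
    }
}
```

Comparing the two printed sets gives a quick check of whether spreading the seeds helps before running a full stress profile.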
The attached patch multiplies the generated seed so that max seed values are of
the order of 10^22. I've tested this against a couple of the failed dtests and
they pass OK. In addition, I get the following results from a range of YAML
files ("without multiplier" is unmodified trunk; "with multiplier" is trunk
with this patch applied):
Example 1:
table: test5
table_definition: |
  CREATE TABLE test5 (
    pk int,
    val text,
    PRIMARY KEY (pk)
  )
columnspec:
  - name: pk
    size: fixed(64)
    population: uniform(1..500)

user profile=... ops(insert=1) n=1000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1

without multiplier - 47 rows
with multiplier - 490 rows
================================
table: test4
table_definition: |
  CREATE TABLE test4 (
    pk int,
    pk2 text,
    val text,
    PRIMARY KEY ((pk,pk2))
  )
columnspec:
  - name: pk
    size: fixed(2)
    population: uniform(1..5)
  - name: pk2
    size: fixed(2)
    population: uniform(1..5)

user profile=... ops(insert=1) n=1000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1

without multiplier - 1 row
with multiplier - 25 rows
================================
table: test4
table_definition: |
  CREATE TABLE test4 (
    pk int,
    pk2 text,
    val text,
    PRIMARY KEY ((pk,pk2))
  )
columnspec:
  - name: pk
    size: fixed(2)
    population: uniform(1..500M)
  - name: pk2
    size: fixed(2)
    population: uniform(1..5)

user profile=... ops(insert=1) n=1000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1

without multiplier - 1000 rows
with multiplier - 1000 rows
===================================
table: test7
table_definition: |
  CREATE TABLE test7 (
    pk int,
    pk2 text,
    ck1 text,
    val text,
    PRIMARY KEY ((pk,pk2), ck1)
  )
columnspec:
  - name: pk
    size: fixed(2)
    population: uniform(1..100)
  - name: pk2
    size: fixed(4)
    population: uniform(1..10000)
  - name: pk2
    size: fixed(4)
    population: uniform(1..1000)

user profile=... ops(insert=1) n=100000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1

without multiplier - 10342 rows
with multiplier - 63387 rows
=====================================
table_definition: |
  CREATE TABLE test7 (
    pk int,
    pk2 text,
    ck1 text,
    val text,
    PRIMARY KEY ((pk,pk2), ck1)
  )
columnspec:
  - name: pk
    size: fixed(4)
    population: seq(1..100)
  - name: pk2
    size: fixed(10)
    population: seq(1..10000)
  - name: pk2
    size: fixed(10)
    cluster: uniform(1..1000)
    population: seq(1..1000)

user profile=... ops(insert=1) n=100000 cl=ALL no-warmup -rate threads=5 -node 127.0.0.1

without multiplier - 25000 rows
with multiplier - 43304 rows
> Randomness of stress distributions is not good
> ----------------------------------------------
>
> Key: CASSANDRA-12744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: T Jake Luciani
> Assignee: Ben Slater
> Priority: Minor
> Labels: stress
> Fix For: 4.0
>
> Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad. We are using the
> JDKRandomGenerator() but in testing of uniform(1..3) we see for 100
> iterations it's only outputting 3. If you bump it to 10k it hits all 3
> values.
> I made a change to just use the default commons math random generator and now
> see all 3 values for n=10