[jira] [Updated] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

Ralf Steppacher (JIRA) Mon, 08 Feb 2016 23:47:39 -0800

     [ https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ralf Steppacher updated CASSANDRA-11138:
----------------------------------------
    Description: 
I am trying to get the stress tool to generate random values for three 
clustering keys, to simulate collecting events per user id (text, partition 
key). Each event has a session type (text), an event type (text), and a 
creation time (timestamp), which are the clustering keys, in that order. For 
testing purposes I ended up with the following column spec:

{noformat}
columnspec:
- name: created_at
  cluster: uniform(10..10)
- name: event_type
  size: uniform(5..10)
  population: uniform(1..30)
  cluster: uniform(1..30)
- name: session_type
  size: fixed(5)
  population: uniform(1..4)
  cluster: uniform(1..4)
- name: user_id
  size: fixed(15)
  population: uniform(1..1000000)
- name: message
  size: uniform(10..100)
  population: uniform(1..100B)
{noformat}

I expected this to create between 10 and 1200 rows per partition key. Instead, 
exactly 10 rows are created per partition key, and {{created_at}} is the only 
clustering column whose value varies within a partition. {{session_type}} and 
{{event_type}} each get a single fixed value per partition key. This happens 
even if I set the cluster distributions of {{event_type}} and {{session_type}} 
to uniform(30..30) and uniform(4..4) respectively. With that setting I expected 
1200 rows per partition key, which is also what the stress tool announces at 
startup, but it still creates 10.
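
The arithmetic behind those bounds can be sketched as follows. This assumes the number of rows per partition is the product of the per-column cluster counts; the function and dictionary names are illustrative only and are not part of cassandra-stress.

```python
from math import prod

# (min, max) of each clustering column's `cluster` distribution,
# copied from the column spec in this report.
cluster_ranges = {
    "session_type": (1, 4),
    "event_type": (1, 30),
    "created_at": (10, 10),
}


def rows_per_partition_bounds(ranges):
    """Return (min, max) rows per partition, assuming counts multiply across clustering columns."""
    lows = [low for low, _ in ranges.values()]
    highs = [high for _, high in ranges.values()]
    return prod(lows), prod(highs)


print(rows_per_partition_bounds(cluster_ranges))  # (10, 1200)
```

With the fixed distributions uniform(4..4), uniform(30..30), and uniform(10..10), both bounds come out as 1200, which matches the figure the tool prints at startup.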

{noformat}
[rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 10.211.55.8
…
Created schema. Sleeping 1s for propagation.
Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] total rows in the partitions)
Improvement over 4 threadCount: 19%
...
{noformat}

Sample of generated data:

{noformat}
cqlsh> select user_id, event_type, session_type, created_at from stresscql.batch_too_large LIMIT 30 ;

user_id                     | event_type       | session_type | created_at
-----------------------------+------------------+--------------+--------------------------
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2012-10-19 08:14:11+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2004-11-08 04:04:56+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2002-10-15 00:39:23+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1999-08-31 19:56:30+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1999-04-02 20:46:26+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1990-10-08 03:27:17+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1984-03-31 23:30:34+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1975-11-16 02:41:28+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1970-04-07 07:23:48+0000
  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1970-03-08 23:23:04+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2015-10-12 17:48:51+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2010-10-28 06:21:13+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2005-06-28 03:34:41+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2005-01-29 05:26:21+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2003-03-27 01:31:24+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2002-03-29 14:22:43+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2000-06-15 14:54:29+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1998-03-08 13:31:54+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1988-01-21 06:38:40+0000
     N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1975-08-03 21:16:47+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2014-11-23 17:05:45+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2012-02-23 23:20:54+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2012-02-19 12:05:15+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2005-10-17 04:22:45+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2003-02-24 19:45:06+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1996-12-18 06:18:31+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1991-06-10 22:07:45+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1983-05-05 12:29:09+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1972-04-17 21:24:52+0000
oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1971-05-09 23:00:02+0000

(30 rows)
cqlsh>
{noformat}
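
The pattern in this sample (one session_type and one event_type per user_id, ten created_at values) can be checked by counting the distinct clustering values per partition key. A minimal sketch, assuming the rows have already been fetched as (user_id, session_type, event_type, created_at) tuples; the helper name and the placeholder rows are hypothetical and are not actual query output.

```python
from collections import defaultdict


def distinct_clustering_values(rows):
    """Map each user_id to the counts of distinct session_type, event_type and created_at values."""
    seen = defaultdict(lambda: (set(), set(), set()))
    for user_id, session_type, event_type, created_at in rows:
        sessions, events, timestamps = seen[user_id]
        sessions.add(session_type)
        events.add(event_type)
        timestamps.add(created_at)
    return {uid: tuple(len(s) for s in sets) for uid, sets in seen.items()}


# Placeholder rows shaped like the reported behaviour: a single session_type
# and event_type per partition, with ten different created_at values.
rows = [("u1", "s1", "e1", t) for t in range(10)]
print(distinct_clustering_values(rows))  # {'u1': (1, 1, 10)}
```

A result of (1, 1, 10) for every partition corresponds to the behaviour reported here; a working distribution would be expected to show more than one distinct session_type or event_type for at least some partitions.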

If I remove the {{created_at}} clustering key, the other two clustering keys 
are assigned varying values per partition key.


> cassandra-stress tool - clustering key values not distributed
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-11138
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>            Reporter: Ralf Steppacher



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
