RE: how to force cassandra-stress to actually generate enough data

2016-06-18 Thread Peter Kovgan
Hi,

I tried using n=... instead of duration=..., as one of the users suggested.
I do have more data on the disks now, indeed.

I succeeded in filling the data disks up to 4%, and then hit an error.

Questions:

1)  Maybe you can hint at what that error might mean?

2)   During the test everything is printed 3 times. Does this mean I actually
have 3 stress "engines" running, each reporting its own numbers?
Or am I seeing reports from the 3 actually connected nodes?


My test was the following:

6 nodes in circle
RF=2

cassandra-stress user profile=./test.yaml ops\(insert=100, 
get300spartaworriers=1\) n=1  no-warmup cl=ONE -rate threads=400 
-node node1, node2, node3, node4, node5, node6 -log file=./stress.log


test.yaml:

---
columnspec:
  - name: SECURITY_ID
    population: uniform(1..300)
    size: gaussian(10..15)
  - name: MARKET_SEGMENT_ID
    population: uniform(50..100)
    size: fixed(10)
  - name: EMS_INSTANCE_ID
    population: fixed(10)
    size: fixed(4)
  - name: PUB_DATE_ONLY
    population: fixed(1)
  - name: LP_DEAL_CODE
    population: fixed(300)
    size: fixed(4)
  - name: PUB_TIMESTAMP
    cluster: UNIFORM(1..100)
    size: fixed(10)
  - name: PUB_TIME_ONLY
    cluster: UNIFORM(1..100)
    size: fixed(10)
  - name: PUB_SEQ
    size: fixed(10)
  - name: PUB_TIME_MICROS
    population: UNIFORM(1..100B)
    size: fixed(10)
  - name: PAYLOAD_TYPE
    population: uniform(1..5)
    size: fixed(10)
  - name: PAYLOAD_SERIALIZED
    population: uniform(1..500)
    size: fixed(256)
  - name: EMS_LOG_TIMESTAMP
    population: uniform(10..1)
    size: fixed(10)
  - name: EMS_LOG_TYPE
    population: uniform(1..5)
    size: fixed(10)
insert:
  batchtype: UNLOGGED
  partitions: fixed(1)
  select: uniform(1..10)/10
keyspace: marketdata_ttl2
keyspace_definition: "CREATE KEYSPACE marketdata_ttl2 with replication = {'class':'NetworkTopologyStrategy','NY':2};\n"
queries:
  get300spartaworriers:
    cql: "select PAYLOAD_SERIALIZED, PUB_TIME_MICROS from ems_md_esp_var01_ttl2 where SECURITY_ID = ? and MARKET_SEGMENT_ID = ? and EMS_INSTANCE_ID = ? and PUB_DATE_ONLY = ? and LP_DEAL_CODE = ? LIMIT 100"
    fields: samerow
table: ems_md_esp_var01_ttl2
table_definition: |
  create table ems_md_esp_var01_ttl2 (
    SECURITY_ID bigint,
    MARKET_SEGMENT_ID int,
    EMS_INSTANCE_ID int,
    LP_DEAL_CODE ascii,
    PUB_TIME_MICROS bigint,
    PUB_SEQ text,
    PUB_TIMESTAMP text,
    PUB_DATE_ONLY date,
    PUB_TIME_ONLY text,
    PAYLOAD_TYPE int,
    PAYLOAD_SERIALIZED blob,
    EMS_LOG_TIMESTAMP timestamp,
    EMS_LOG_TYPE int,
    primary key ((SECURITY_ID, MARKET_SEGMENT_ID, EMS_INSTANCE_ID, PUB_DATE_ONLY, LP_DEAL_CODE), PUB_TIMESTAMP, PUB_TIME_ONLY)
  ) with clustering order by (PUB_TIMESTAMP asc, PUB_TIME_ONLY asc)
    and default_time_to_live = 2419200
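As a rough sanity check, the profile above bounds how much raw data the test can ever generate (a back-of-the-envelope sketch; the byte counts are just the declared columnspec sizes, so actual on-disk volume will differ with encoding, compression and replication):

```python
# Back-of-the-envelope estimate of the raw data volume the profile above
# can generate.  Assumption: distinct partition-key values come from the
# population ranges (fixed(n) always yields the single value n), and
# bytes-per-row is the sum of the declared regular-column sizes.

partition_cardinality = (
    300   # SECURITY_ID: uniform(1..300)
    * 51  # MARKET_SEGMENT_ID: uniform(50..100)
    * 1   # EMS_INSTANCE_ID: fixed(10) -> one distinct value
    * 1   # PUB_DATE_ONLY: fixed(1)
    * 1   # LP_DEAL_CODE: fixed(300) -> one distinct value
)
rows_per_partition = 100 * 100  # two clustering columns, uniform(1..100) each
bytes_per_row = 10 + 10 + 10 + 256 + 10 + 10  # declared regular-column sizes

total_bytes = partition_cardinality * rows_per_partition * bytes_per_row
print(partition_cardinality)   # 15300 distinct partitions at most
print(total_bytes / 1e9)       # ~46.8 GB raw, cluster-wide, before overheads
```

If this upper bound is far below the disk capacity you want to fill, no amount of run time will get you there; the population ranges themselves have to grow.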


Last lines from the test log:


get300spartaworriers,  22611773, 772, 5, 5, 49.3, 3.2, 285.8, 746.7, 1961.9, 2576.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0
get300spartaworriers,  22611773, 772, 5, 5, 49.3, 3.2, 285.8, 746.7, 1961.9, 2576.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0
get300spartaworriers,  22611773, 772, 5, 5, 49.3, 3.2, 285.8, 746.7, 1961.9, 2576.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0

insert, 2259485753, 78396, 78396, 78396, 4.6, 1.6, 7.5, 51.8, 508.0, 709.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0
insert, 2259485753, 78396, 78396, 78396, 4.6, 1.6, 7.5, 51.8, 508.0, 709.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0
insert, 2259485753, 78396, 78396, 78396, 4.6, 1.6, 7.5, 51.8, 508.0, 709.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0

total, 2282097526, 79166, 78399, 78399, 5.0, 1.6, 7.8, 61.7, 577.1, 2576.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0
total, 2282097526, 79166, 78399, 78399, 5.0, 1.6, 7.8, 61.7, 577.1, 2576.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0
total, 2282097526, 79166, 78399, 78399, 5.0, 1.6, 7.8, 61.7, 577.1, 2576.5, 23072.2, 0.00344, 0, 0, 0, 0, 0, 0

java.io.IOException: Operation x10 on key(s) [38127572172|905017789|1736561174|1639597-07-18|u-?]: Error executing: (NoSuchElementException)
java.io.IOException: Operation x10 on key(s) [38127572172|905017789|1736561174|1639597-07-18|u-?]: Error executing: (NoSuchElementException)
java.io.IOException: Operation x10 on key(s) [38127572172|905017789|1736561174|1639597-07-18|u-?]: Error executing: (NoSuchElementException)

at 

Re: how to force cassandra-stress to actually generate enough data

2016-06-17 Thread Giampaolo Trapasso
I do not know if it can really help in your situation,
but from the NGCC notes I discovered the existence of GatlingCql
(https://github.com/gatling-cql/GatlingCql) as an alternative to
cassandra-stress.
In particular, it lets you tweak the data generation part a bit.

giampaolo


2016-06-16 10:33 GMT+02:00 Peter Kovgan :

> [quoted message trimmed; see Peter Kovgan's 2016-06-16 message below]


RE: how to force cassandra-stress to actually generate enough data

2016-06-16 Thread Peter Kovgan
Thank you, guys.
I will try all the proposals.
The limitation mentioned by Benedict is a big one.
But anyway, there is something to work with around it.

From: Peter Kovgan
Sent: Wednesday, June 15, 2016 3:25 PM
To: 'user@cassandra.apache.org'
Subject: how to force cassandra-stress to actually generate enough data

Hi,

cassandra-stress is not really helping me populate the disk sufficiently.

I tried several table structures, providing

cluster: UNIFORM(1..100)  on the clustering parts of the PK.

The partition part of the PK yields about 660,000 partitions.

The hope was to create enough cells in a row, to make the row really WIDE.

No matter what I tried, and no matter how long it runs, I see at most 2-3
SSTables per node and at most 300MB of data per node.

(I have 6 nodes and a very active 400-thread stress run.)

It looks like it is impossible to make the row really wide and the disk really
full.

Is it intentional?

I mean, if there was an intention to avoid really wide rows, why is there no
hint about this in the docs?

Do you have similar experience, and do you know how to resolve this?

Thanks.

**
This communication and all or some of the information contained therein may be 
confidential and is subject to our Terms and Conditions. If you have received 
this
communication in error, please destroy all electronic and paper copies and 
notify the sender immediately. Unless specifically indicated, this 
communication is 
not a confirmation, an offer to sell or solicitation of any offer to buy any 
financial product, or an official statement of ICAP or its affiliates. 
Non-Transactable Pricing Terms and Conditions apply to any non-transactable 
pricing provided. All terms and conditions referenced herein available
at www.icapterms.com. Please notify us by reply message if this link does not 
work.
**


Re: how to force cassandra-stress to actually generate enough data

2016-06-15 Thread Benedict Elliott Smith
cassandra-stress has some (many) limitations that I had planned to
address now that it's seeing wider adoption, but since I no longer work on the
project for my day job I am unlikely to now... so, sorry, but you'll have to
tolerate them :)

In particular, the problem you encounter here is that a given clustering
*tier* must be generated in its entirety before performing any operation
that touches any of its values (read or write), regardless of how many are
actually needed.  So, if you have a single clustering column in your
primary key, the client must generate the entire partition.  And if you
have a million of them, you may just be watching your cassandra-stress
instance enter a GC spiral and die slowly; in all likelihood the data you
see is just the partitions that get randomly assigned a modest size in your
range.

If you need to generate giant partitions, at the moment you need to have
multiple clustering columns, and preferably keep the cardinality of each to
at most a few hundred. The smaller the cardinality, the faster queries that
only touch small portions of the partition will run (such as point or range
queries, or partial insertions).
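As an illustration of that advice, the clustering spec in a profile could be split across several small tiers (a hypothetical sketch with made-up column names, not taken from the thread):

```yaml
# Hypothetical columnspec fragment: three clustering tiers of cardinality
# 100 each can still yield up to 100 * 100 * 100 = 1,000,000 rows per
# partition, but no single tier forces the client to generate a huge
# value set up front.
columnspec:
  - name: bucket_hi
    cluster: uniform(1..100)
  - name: bucket_mid
    cluster: uniform(1..100)
  - name: bucket_lo
    cluster: uniform(1..100)
```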

On 15 June 2016 at 13:24, Peter Kovgan wrote:

> [original message quoted in full; trimmed]


Re: how to force cassandra-stress to actually generate enough data

2016-06-15 Thread Ben Slater
Are you running with n=[number ops] or duration=[xx]? I’ve found you need
to use n= when inserting data. When you use duration, cassandra-stress
defaults to 1,000,000 somethings (to be honest, I’m not entirely sure whether
the 1,000,000 relates to rows, partitions, or something else), and
running for a long time just results in overwriting a lot of data that gets
compacted away. Using n=[number > 1M] will get you n somethings.

Cheers
Ben

On Wed, 15 Jun 2016 at 22:25 Peter Kovgan wrote:

> [original message quoted in full; trimmed]
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: how to force cassandra-stress to actually generate enough data

2016-06-15 Thread Julien Anguenot
I usually do a write-only bench run first. Doing 1B write iterations will
produce 200GB+ of data on disk.  You can then do mixed tests.

For instance, a write bench that would produce such a volume on a 3-node cluster:

./tools/bin/cassandra-stress write cl=LOCAL_QUORUM n=10 \
  -rate threads=1 \
  -node 1.2.3.1,1.2.3.2,1.2.3.4 \
  -schema 'replication(strategy=NetworkTopologyStrategy,dallas=3)' \
  -log file=raid5_ssd_1b_10kt_cl_quorum.log \
  -graph file=raid5_ssd_1B_10kt_cl_quorum.html title=raid5_ssd_1B_10kt_cl_quorum revision=benchmark-0
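The arithmetic behind the 200GB+ figure can be sketched as follows (the per-row byte count here is an assumed round number for illustration, not something measured in the thread):

```python
# Rough arithmetic behind "1B write iterations -> 200GB+ on disk".
# approx_bytes_per_row is a hypothetical average (keys + columns +
# storage overhead); replication then multiplies the cluster-wide total,
# while compression and compaction shrink it.

iterations = 1_000_000_000
approx_bytes_per_row = 200  # assumed average, not measured

raw_written_gb = iterations * approx_bytes_per_row / 1e9
print(raw_written_gb)  # 200.0 GB of raw writes before replication/compression
```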

After that you can do various mixed bench runs, with real data, SSTables, and
compactions kicking in.

Not sure this is the best or recommended way to achieve the goal when starting
with an empty disk and no dataset, though.

   J.


> On Jun 15, 2016, at 7:24 AM, Peter Kovgan wrote:
>
> [original message quoted in full; trimmed]

--
Julien Anguenot (@anguenot)