Re: Composite partition key token

2017-02-09 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Works great, thank you!

On 2/9/17, 6:26 AM, "Michael Burman" <mibur...@redhat.com> wrote:

Hi,

How about taking it from the BoundStatement directly?

ByteBuffer routingKey = 
b.getRoutingKey(ProtocolVersion.NEWEST_SUPPORTED, codecRegistry);
Token token = metadata.newToken(routingKey);

Here b is the BoundStatement. Replace codecRegistry and ProtocolVersion 
with whatever you use. The codecRegistry, for example, can be obtained with 
codecRegistry = session.getCluster().getConfiguration().getCodecRegistry();

   - Micke
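
For anyone who needs to build the buffer by hand instead: as far as I know, the routing key for a composite partition key is not joined with ":" but is the concatenation of the serialized components, each framed as a 2-byte length, the value bytes, and a single zero byte. The sketch below shows that framing using only the JDK. The class and method names are made up for illustration, and it assumes both parentResourceId and timeRT are TEXT columns (the thread does not state the type of timeRT).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch only: hand-building the routing key for a composite partition key. */
final class CompositeRoutingKey {

    /**
     * Concatenates already-serialized partition key components. With more than
     * one component, each is framed as
     * [2-byte big-endian length][value bytes][one 0x00 end-of-component byte].
     * A single component is returned unframed.
     * Assumes every component is shorter than 65536 bytes.
     */
    static ByteBuffer compose(ByteBuffer... components) {
        if (components.length == 1) {
            return components[0].duplicate();
        }
        int total = 0;
        for (ByteBuffer c : components) {
            total += 2 + c.remaining() + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (ByteBuffer c : components) {
            out.putShort((short) c.remaining());
            out.put(c.duplicate()); // duplicate() leaves the caller's position untouched
            out.put((byte) 0);
        }
        out.flip();
        return out;
    }

    /** Assumption: the column is TEXT, which serializes as plain UTF-8 bytes. */
    static ByteBuffer text(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

The result of CompositeRoutingKey.compose(text(parentResourceId), text(timeRT)) would then be passed to cluster.getMetadata().newToken(...). Taking the routing key from the BoundStatement, as suggested above, avoids having to get the per-type serialization right yourself.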

    
On 02/08/2017 08:58 PM, Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) wrote:
>
> Hi,
>
> I would like to ask how to calculate the token for a composite partition 
> key using the Java API?
>
> For a partition key made of one column I use 
> cluster.getMetadata().newToken(newBuffer);
>
> But what if my key looks like this: PRIMARY KEY 
> ((parentResourceId,timeRT), childName)?
>
> I read that “:” is a separator, but that doesn’t seem to be the case.
>
> How can I create a ByteBuffer with multiple values so that the token 
> is actually correct?
>
> Thank you,
>
> Branislav
>





Re: Cluster scaling

2017-02-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi Jan,

Yes, you are right about the batches. I am working on correcting the way 
we use batches, just like you mentioned. I monitored all those stats and it 
seems that hardware is not the bottleneck.
Thank you for the response and advice!

Cheers,
Branislav

From: Jan Kesten <j.kes...@enercast.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, February 8, 2017 at 8:20 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: "j.kes...@enercast.de" <j.kes...@enercast.de>
Subject: Re: Cluster scaling


Hi Branislav,

what is it you would expect?

Some thoughts:

Batches are often misunderstood. They work well only if they contain only one 
partition key - think of a batch of different sensor data to one key. If you 
group batches with many partition keys and/or do large batches, this puts high 
load on the coordinator node, which then itself needs to talk to the nodes 
holding the partitions. This could explain the scaling you see in your second 
try without batches. Keep in mind that the driver supports executeAsync and 
ResultSetFutures.

Second, put commitlog and data directories on separate disks when using 
spindles.

Third, have you monitored iostats and cpustats while running your tests?
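
To illustrate the single-partition batching advice above, here is a minimal sketch that groups pending writes by partition key, so that each group could go into its own batch (or simply be sent as individual async statements). It uses only the JDK; the Write class and its fields are hypothetical stand-ins for whatever the benchmark actually queues, not part of the driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: split a mixed list of writes into one group per partition key. */
final class PartitionBatcher {

    /** Hypothetical pending write: the partition key it targets plus the CQL to run. */
    static final class Write {
        final String partitionKey;
        final String cql;

        Write(String partitionKey, String cql) {
            this.partitionKey = partitionKey;
            this.cql = cql;
        }
    }

    /**
     * Groups writes by partition key, preserving first-seen key order and the
     * order of writes within each key. Each resulting list touches one partition.
     */
    static Map<String, List<Write>> groupByPartition(List<Write> writes) {
        Map<String, List<Write>> groups = new LinkedHashMap<>();
        for (Write w : writes) {
            groups.computeIfAbsent(w.partitionKey, k -> new ArrayList<>()).add(w);
        }
        return groups;
    }
}
```

Each value in the returned map is a candidate for one batch; a batch built from the whole unsorted list would instead span many partitions and load the coordinator as described.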

Cheers,

Jan
On 08.02.2017 at 16:39, Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) wrote:

Hi all,



I have a cluster of three nodes and would like to ask some questions about the 
performance.

I wrote a small benchmarking tool in java that mirrors (read, write) operations 
that we do in the real project.

Problem is that it is not scaling like it should. The program runs two tests: 
one using batch statement and one without using the batch.

The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (the number of threads has no influence on 
the performance), usually creating 100K resources for testing purposes.



The average results (operations per second) with the use of batch statement are:



Replication Factor = 1   with reading   without reading
1-node cluster           37K            46K
2-node cluster           37K            47K
3-node cluster           39K            70K

Replication Factor = 2   with reading   without reading
2-node cluster           21K            40K
3-node cluster           30K            48K



The average results (operations per second) without the use of batch statement 
are:



Replication Factor = 1   with reading   without reading
1-node cluster           31K            20K
2-node cluster           38K            39K
3-node cluster           45K            87K

Replication Factor = 2   with reading   without reading
2-node cluster           19K            22K
3-node cluster           26K            36K



The Cassandra VM specs are: 16 CPUs each; one VM with 16GB and two with 32GB of 
RAM; at least 30GB of disk space for each node. The disks are non-SSD, and each 
VM is on a separate physical server.



The code is available here https://github.com/bjanosik/CassandraBenchTool.git . 
It can be built with Maven and then you can use jar in target directory with 
java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .

Thank you for any help.




--

Jan Kesten, mailto:j.kes...@enercast.de

Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68

enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471

http://www.enercast.de Online forecasts for renewable energy

Management: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)





Composite partition key token

2017-02-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi,

I would like to ask how to calculate the token for a composite partition key 
using the Java API?
For a partition key made of one column I use 
cluster.getMetadata().newToken(newBuffer);
But what if my key looks like this: PRIMARY KEY ((parentResourceId,timeRT), 
childName)?
I read that “:” is a separator, but that doesn’t seem to be the case.
How can I create a ByteBuffer with multiple values so that the token is 
actually correct?

Thank you,
Branislav


Cluster scaling

2017-02-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi all,



I have a cluster of three nodes and would like to ask some questions about the 
performance.

I wrote a small benchmarking tool in java that mirrors (read, write) operations 
that we do in the real project.

Problem is that it is not scaling like it should. The program runs two tests: 
one using batch statement and one without using the batch.

The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (the number of threads has no influence on 
the performance), usually creating 100K resources for testing purposes.



The average results (operations per second) with the use of batch statement are:



Replication Factor = 1   with reading   without reading
1-node cluster           37K            46K
2-node cluster           37K            47K
3-node cluster           39K            70K

Replication Factor = 2   with reading   without reading
2-node cluster           21K            40K
3-node cluster           30K            48K



The average results (operations per second) without the use of batch statement 
are:



Replication Factor = 1   with reading   without reading
1-node cluster           31K            20K
2-node cluster           38K            39K
3-node cluster           45K            87K

Replication Factor = 2   with reading   without reading
2-node cluster           19K            22K
3-node cluster           26K            36K



The Cassandra VM specs are: 16 CPUs each; one VM with 16GB and two with 32GB of 
RAM; at least 30GB of disk space for each node. The disks are non-SSD, and each 
VM is on a separate physical server.



The code is available here https://github.com/bjanosik/CassandraBenchTool.git . 
It can be built with Maven and then you can use jar in target directory with 
java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .

Thank you for any help.
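
To put a number on "not scaling like it should", the figures above can be turned into a speedup relative to the 1-node run and an efficiency relative to perfectly linear scaling. The helper below is only a sketch of that arithmetic; the class and method names are invented for illustration.

```java
/** Sketch only: express benchmark throughput as speedup and scaling efficiency. */
final class ScalingReport {

    /** Throughput of a multi-node run divided by the single-node baseline. */
    static double speedup(double baselineOpsPerSec, double opsPerSec) {
        return opsPerSec / baselineOpsPerSec;
    }

    /** Speedup divided by node count; 1.0 would be perfectly linear scaling. */
    static double efficiency(double baselineOpsPerSec, double opsPerSec, int nodes) {
        return speedup(baselineOpsPerSec, opsPerSec) / nodes;
    }
}
```

With the RF = 1, without-reading numbers from the tables, the batched run goes from 46K to 70K on three nodes, a speedup of about 1.52 and an efficiency of about 0.51, while the non-batched run goes from 20K to 87K, a speedup of 4.35 from a much lower single-node baseline.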



Cluster scaling

2017-02-06 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi all,



I have a cluster of three nodes and would like to ask some questions about the 
performance.

I wrote a small benchmarking tool in java that mirrors (read, write) operations 
that we do in the real project.

Problem is that it is not scaling like it should. The program runs two tests: 
one using batch statement and one without using the batch.

The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (the number of threads has no influence on 
the performance), usually creating 100K resources for testing purposes.



The average results (operations per second) with the use of batch statement are:



Replication Factor = 1   with reading   without reading
1-node cluster           37K            46K
2-node cluster           37K            47K
3-node cluster           39K            70K

Replication Factor = 2   with reading   without reading
2-node cluster           21K            40K
3-node cluster           30K            48K



The average results (operations per second) without the use of batch statement 
are:



Replication Factor = 1   with reading   without reading
1-node cluster           31K            20K
2-node cluster           38K            39K
3-node cluster           45K            87K

Replication Factor = 2   with reading   without reading
2-node cluster           19K            22K
3-node cluster           26K            36K



The Cassandra VM specs are: 16 CPUs each; one VM with 16GB and two with 32GB of 
RAM; at least 30GB of disk space for each node. The disks are non-SSD, and each 
VM is on a separate physical server.



The tool is attached if someone would like to try it themselves. It can be 
built with Maven and then you can use jar in target directory.

Thank you for any help.



cassandrabench.tar.gz
Description: cassandrabench.tar.gz


Re: Cassandra cluster performance

2017-01-09 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi, we have made some changes to our code and benchmarking, and now it seems to 
scale. Async writes plus the other changes made the difference. So for now, 
thank you very much, everyone, for the help. Much appreciated.

Branislav


From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, January 8, 2017 at 8:01 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Subject: Re: Cassandra cluster performance

Can you share your benchmarking code?
On Sun, Jan 8, 2017 at 5:51 PM Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) <bjano...@cisco.com> wrote:

Our test data is just a couple of short Strings; the load on the nodes is just 
382 KiB and 408 KiB.

I read some articles about async writes and switched from execute to 
executeAsync for the writes. The results seem to be the same (not good). Is 
there more that should be done when doing async writes?


From: Kant Kodali <k...@peernova.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, January 6, 2017 at 6:05 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Subject: Re: Cassandra cluster performance

Yeah, you should use async writes. Also, you cannot neglect data size, so you 
might want to let us know what your data size is.



On Thu, Jan 5, 2017 at 2:57 PM, kurt Greaves <k...@instaclustr.com> wrote:
You should try switching to async writes and then rerun the test. With a single 
node it won't make much difference, but with multiple nodes there should be a 
massive difference.

On 4 Jan 2017 10:05, "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" 
<bjano...@cisco.com> wrote:
Hi,

Our column family definition is

"CREATE TABLE onem2m.cse(" +
"name TEXT PRIMARY KEY," +
"resourceId TEXT," +
")";
"CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
"cseBaseCseId TEXT," +
"aeId TEXT," +
"resourceId TEXT," +
"PRIMARY KEY ((cseBaseCseId), aeId)" +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
"CONTENT_INSTANCE_OldestId TEXT," +
"CONTENT_INSTANCE_LatestId TEXT," +
"SUBSCRIPTION_OldestId TEXT," +
"SUBSCRIPTION_LatestId TEXT," +
"resourceId TEXT PRIMARY KEY," +
"resourceType TEXT," +
"resourceName TEXT," +
"jsonContent TEXT," +
"parentId TEXT," +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
"parentResourceId TEXT," +
"childName TEXT," +
"childResourceId TEXT," +
"nextId TEXT," +
"prevId TEXT," +
"PRIMARY KEY ((parentResourceId), childName)" +
")";



From: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Date: Sunday, December 25, 2016 at 8:54 PM
To: "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Cassandra cluster performance

Hi Branislav,


What is your column family definition?


Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

From: Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) 
[mailto:bjano...@cisco.com]
Sent: Thursday, December 22, 2016 6:18 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra cluster performance

Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : '

Re: Cassandra cluster performance

2017-01-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Our test data is just a couple of short Strings; the load on the nodes is just 
382 KiB and 408 KiB.

I read some articles about async writes and switched from execute to 
executeAsync for the writes. The results seem to be the same (not good). Is 
there more that should be done when doing async writes?
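
One thing that may be worth checking with async writes (this is a general pattern, not something confirmed for this benchmark): if the code waits on each future right after submitting it, the writes are effectively synchronous again, and if it never waits, the number of outstanding requests is unbounded. A common middle ground is to cap the number of in-flight requests with a semaphore. The sketch below uses only the JDK; CompletableFuture stands in for the driver's ResultSetFuture, and the asyncWrite function is a hypothetical wrapper around session.executeAsync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Sketch only: keep at most maxInFlight asynchronous writes outstanding. */
final class BoundedAsyncWriter {
    private final Semaphore permits;

    BoundedAsyncWriter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Submits every item through asyncWrite (a stand-in for the real async call),
     * never allowing more than maxInFlight unfinished requests, and returns once
     * all of them have completed.
     */
    <T> void writeAll(List<T> items, Function<T, CompletableFuture<Void>> asyncWrite) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (T item : items) {
            permits.acquireUninterruptibly(); // blocks the producer while the window is full
            CompletableFuture<Void> write;
            try {
                write = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // submission itself failed, so free the slot
                throw e;
            }
            // Free the slot whether the write succeeded or failed.
            pending.add(write.whenComplete((result, error) -> permits.release()));
        }
        // Wait for the tail of the window; join() rethrows if any write failed.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```

The window size (maxInFlight) is a tuning knob; a reasonable starting value depends on the driver's connection pool settings and the cluster, so treat any particular number as something to measure rather than assume.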


From: Kant Kodali <k...@peernova.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, January 6, 2017 at 6:05 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Subject: Re: Cassandra cluster performance

Yeah, you should use async writes. Also, you cannot neglect data size, so you 
might want to let us know what your data size is.



On Thu, Jan 5, 2017 at 2:57 PM, kurt Greaves <k...@instaclustr.com> wrote:
You should try switching to async writes and then rerun the test. With a single 
node it won't make much difference, but with multiple nodes there should be a 
massive difference.

On 4 Jan 2017 10:05, "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" 
<bjano...@cisco.com> wrote:
Hi,

Our column family definition is

"CREATE TABLE onem2m.cse(" +
"name TEXT PRIMARY KEY," +
"resourceId TEXT," +
")";
"CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
"cseBaseCseId TEXT," +
"aeId TEXT," +
"resourceId TEXT," +
"PRIMARY KEY ((cseBaseCseId), aeId)" +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
"CONTENT_INSTANCE_OldestId TEXT," +
"CONTENT_INSTANCE_LatestId TEXT," +
"SUBSCRIPTION_OldestId TEXT," +
"SUBSCRIPTION_LatestId TEXT," +
"resourceId TEXT PRIMARY KEY," +
"resourceType TEXT," +
"resourceName TEXT," +
"jsonContent TEXT," +
"parentId TEXT," +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
"parentResourceId TEXT," +
"childName TEXT," +
"childResourceId TEXT," +
"nextId TEXT," +
"prevId TEXT," +
"PRIMARY KEY ((parentResourceId), childName)" +
")";



From: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Date: Sunday, December 25, 2016 at 8:54 PM
To: "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Cassandra cluster performance

Hi Branislav,


What is your column family definition?


Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

From: Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) 
[mailto:bjano...@cisco.com]
Sent: Thursday, December 22, 2016 6:18 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra cluster performance

Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";



- yes, the client is on separate VM

- In our project we use Cassandra API version 3.0.2 but the database (cluster) 
is version 3.9

- for 2node cluster:

 first VM: 25 GB RAM, 16 CPUs

 second VM: 16 GB RAM, 16 CPUs




From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 2:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

You would expect some drop when moving from a single node to multiple nodes, but 
on the face of it that feels extreme to me (although I’ve never personally 
tested the difference). Some questions that might help provide an answer:
- what consistency level are you using for t

Re: Cassandra cluster performance

2017-01-03 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi,

Our column family definition is

"CREATE TABLE onem2m.cse(" +
"name TEXT PRIMARY KEY," +
"resourceId TEXT," +
")";

"CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
"cseBaseCseId TEXT," +
"aeId TEXT," +
"resourceId TEXT," +
"PRIMARY KEY ((cseBaseCseId), aeId)" +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
"CONTENT_INSTANCE_OldestId TEXT," +
"CONTENT_INSTANCE_LatestId TEXT," +
"SUBSCRIPTION_OldestId TEXT," +
"SUBSCRIPTION_LatestId TEXT," +
"resourceId TEXT PRIMARY KEY," +
"resourceType TEXT," +
"resourceName TEXT," +
"jsonContent TEXT," +
"parentId TEXT," +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
"parentResourceId TEXT," +
"childName TEXT," +
"childResourceId TEXT," +
"nextId TEXT," +
"prevId TEXT," +
"PRIMARY KEY ((parentResourceId), childName)" +
")";



From: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Date: Sunday, December 25, 2016 at 8:54 PM
To: "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Cassandra cluster performance

Hi Branislav,


What is your column family definition?


Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

From: Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) 
[mailto:bjano...@cisco.com]
Sent: Thursday, December 22, 2016 6:18 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra cluster performance

Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";



- yes, the client is on separate VM

- In our project we use Cassandra API version 3.0.2 but the database (cluster) 
is version 3.9

- for 2node cluster:

 first VM: 25 GB RAM, 16 CPUs

 second VM: 16 GB RAM, 16 CPUs




From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 2:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

You would expect some drop when moving from a single node to multiple nodes, but 
on the face of it that feels extreme to me (although I’ve never personally 
tested the difference). Some questions that might help provide an answer:
- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) <bjano...@cisco.com> wrote:
Hi all,

I’m working on a project, and we have a Java benchmark for testing performance 
with a Cassandra database. The create operation on a single-node Cassandra 
cluster runs at about 15K operations per second. The problem is that when I set 
up a cluster with 2 or more nodes (each on a separate virtual machine and 
server), the performance goes down to 1K ops/sec. I followed the official 
instructions on how to set up a multi-node cluster; the only things I changed 
in the cassandra.yaml file are: the seeds, set to the IP address of one node; 
the listen and rpc addresses, set to the IP address of the node; and the 
endpoint snitch, set to GossipingPropertyFileSnitch. The replication factor is 
set to 1 for the 2-node cluster. I use only one datacenter. The cluster seems 
to be doing fine (I can see the nodes communicating), and so is the CPU and RAM 
usage on the machines.

Does anybody have any ideas? Any help would be very appreciated.

Thanks!



Re: Cassandra cluster performance

2017-01-03 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi,

No, we are not using async writes.

From: kurt Greaves 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, December 23, 2016 at 12:17 AM
To: "user@cassandra.apache.org" 
Subject: Re: Cassandra cluster performance

Branislav, are you doing async writes?


Re: Cassandra cluster performance

2016-12-22 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Yes, there is definitely something wrong but I’m struggling to figure out what 
exactly. To answer your questions.

-  There are no errors in client or Cassandra

-  I tried manual inserts and there are no errors either, I set the 
tracing on so I can see that the data is distributed to different partitions. 
Even when I use nodetool status, the Owns is 48.4% to 51.6%.
Regards,
Branislav

From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 6:20 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

Given you’re using replication factor 1 (so each piece of data is only going to 
get written to one node) something definitely seems wrong. Some questions/ideas:
- are there any errors in the Cassandra logs or are you seeing any errors at 
the client?
- is your test data distributed across your partition key or is it possible all 
your test data is going to a single partition?
- have you tried manually running a few inserts to see if you get any errors?

Cheers
Ben


On Thu, 22 Dec 2016 at 11:48 Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) <bjano...@cisco.com> wrote:
Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";



- yes, the client is on separate VM

- In our project we use Cassandra API version 3.0.2 but the database (cluster) 
is version 3.9

- for 2node cluster:

 first VM: 25 GB RAM, 16 CPUs

 second VM: 16 GB RAM, 16 CPUs




From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 2:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

You would expect some drop when moving from a single node to multiple nodes, but 
on the face of it that feels extreme to me (although I’ve never personally 
tested the difference). Some questions that might help provide an answer:
- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) <bjano...@cisco.com> wrote:
Hi all,

I’m working on a project, and we have a Java benchmark for testing performance 
with a Cassandra database. The create operation on a single-node Cassandra 
cluster runs at about 15K operations per second. The problem is that when I set 
up a cluster with 2 or more nodes (each on a separate virtual machine and 
server), the performance goes down to 1K ops/sec. I followed the official 
instructions on how to set up a multi-node cluster; the only things I changed 
in the cassandra.yaml file are: the seeds, set to the IP address of one node; 
the listen and rpc addresses, set to the IP address of the node; and the 
endpoint snitch, set to GossipingPropertyFileSnitch. The replication factor is 
set to 1 for the 2-node cluster. I use only one datacenter. The cluster seems 
to be doing fine (I can see the nodes communicating), and so is the CPU and RAM 
usage on the machines.

Does anybody have any ideas? Any help would be very appreciated.

Thanks!



Re: Cassandra cluster performance

2016-12-21 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";



- yes, the client is on separate VM

- In our project we use Cassandra API version 3.0.2 but the database (cluster) 
is version 3.9

- for 2node cluster:

 first VM: 25 GB RAM, 16 CPUs

 second VM: 16 GB RAM, 16 CPUs




From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 2:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

You would expect some drop when moving from a single node to multiple nodes, but 
on the face of it that feels extreme to me (although I’ve never personally 
tested the difference). Some questions that might help provide an answer:
- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) <bjano...@cisco.com> wrote:
Hi all,

I’m working on a project, and we have a Java benchmark for testing performance 
with a Cassandra database. The create operation on a single-node Cassandra 
cluster runs at about 15K operations per second. The problem is that when I set 
up a cluster with 2 or more nodes (each on a separate virtual machine and 
server), the performance goes down to 1K ops/sec. I followed the official 
instructions on how to set up a multi-node cluster; the only things I changed 
in the cassandra.yaml file are: the seeds, set to the IP address of one node; 
the listen and rpc addresses, set to the IP address of the node; and the 
endpoint snitch, set to GossipingPropertyFileSnitch. The replication factor is 
set to 1 for the 2-node cluster. I use only one datacenter. The cluster seems 
to be doing fine (I can see the nodes communicating), and so is the CPU and RAM 
usage on the machines.

Does anybody have any ideas? Any help would be very appreciated.

Thanks!



Cassandra cluster performance

2016-12-21 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi all,

I’m working on a project, and we have a Java benchmark for testing performance 
with a Cassandra database. The create operation on a single-node Cassandra 
cluster runs at about 15K operations per second. The problem is that when I set 
up a cluster with 2 or more nodes (each on a separate virtual machine and 
server), the performance goes down to 1K ops/sec. I followed the official 
instructions on how to set up a multi-node cluster; the only things I changed 
in the cassandra.yaml file are: the seeds, set to the IP address of one node; 
the listen and rpc addresses, set to the IP address of the node; and the 
endpoint snitch, set to GossipingPropertyFileSnitch. The replication factor is 
set to 1 for the 2-node cluster. I use only one datacenter. The cluster seems 
to be doing fine (I can see the nodes communicating), and so is the CPU and RAM 
usage on the machines.

Does anybody have any ideas? Any help would be very appreciated.

Thanks!