Re: Cassandra cluster performance
Hi,

We have made some changes to our code and benchmarking, and it now seems to scale. Async writes plus those changes made the difference. So for now, thank you very much, everyone, for your help. Much appreciated.

Branislav

From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, January 8, 2017 at 8:01 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Subject: Re: Cassandra cluster performance

Can you share your benchmarking code?
Re: Cassandra cluster performance
Can you share your benchmarking code?

On Sun, Jan 8, 2017 at 5:51 PM Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) <bjano...@cisco.com> wrote:

> Our test data is just a couple of short Strings; load on the nodes is just 382 KiB and 408 KiB.
>
> I read some articles about async writes and switched from execute to executeAsync for the writes. The results seem to be the same (not good). Is there more that should be done when doing async writes?
Re: Cassandra cluster performance
Our test data is just a couple of short Strings; load on the nodes is just 382 KiB and 408 KiB.

I read some articles about async writes and switched from execute to executeAsync for the writes. The results seem to be the same (not good). Is there more that should be done when doing async writes?

From: Kant Kodali <k...@peernova.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, January 6, 2017 at 6:05 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Subject: Re: Cassandra cluster performance

Yeah, you should use async writes. Also, you cannot neglect data size, so you might want to let us know what your data size is.
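On the question of whether anything more is needed beyond calling executeAsync: the thread never shows the benchmark code, so what follows is only a minimal sketch of one common pattern, not what was actually done here. If each future is waited on right after it is created, the writes are still effectively synchronous; if nothing is ever waited on, the client can flood the cluster. A middle path is to cap the number of writes in flight. In the sketch, writeAsync is a hypothetical stand-in for the driver's asynchronous call, and the pool size of 4 and the in-flight limit of 128 are arbitrary assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedAsyncWrites {

    // Hypothetical stand-in for an asynchronous driver call: it returns
    // immediately and completes the future on another thread.
    static CompletableFuture<Void> writeAsync(ExecutorService io, String row) {
        return CompletableFuture.runAsync(() -> {
            // A real client would send the INSERT for `row` here.
        }, io);
    }

    // Issues `total` writes while keeping at most `maxInFlight` outstanding.
    // Returns the number of writes that completed without an error.
    static int writeAll(int total, int maxInFlight) {
        ExecutorService io = Executors.newFixedThreadPool(4);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger succeeded = new AtomicInteger();
        try {
            for (int i = 0; i < total; i++) {
                // Blocks only when the in-flight window is full.
                permits.acquireUninterruptibly();
                writeAsync(io, "row-" + i).whenComplete((ok, err) -> {
                    if (err == null) {
                        succeeded.incrementAndGet();
                    }
                    // Free the slot whether the write succeeded or failed.
                    permits.release();
                });
            }
            // Taking every permit back means no write is still outstanding.
            permits.acquireUninterruptibly(maxInFlight);
            return succeeded.get();
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed writes: " + writeAll(10_000, 128));
    }
}
```

A Semaphore keeps the producer loop simple: the loop never blocks on an individual write, only on the window being full, so throughput is no longer tied to the round-trip time of each request.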
Re: Cassandra cluster performance
Yeah, you should use async writes. Also, you cannot neglect data size, so you might want to let us know what your data size is.

On Thu, Jan 5, 2017 at 2:57 PM, kurt Greaves <k...@instaclustr.com> wrote:

> You should try switching to async writes and then perform the test. Sync writes won't make much difference from a single node, but with multiple nodes there should be a massive difference.
Re: Cassandra cluster performance
You should try switching to async writes and then perform the test. Sync writes won't make much difference from a single node, but with multiple nodes there should be a massive difference.

On 4 Jan 2017 10:05, "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com> wrote:

> Hi,
>
> Our column family definition is
>
> "CREATE TABLE onem2m.cse(" +
>     "name TEXT PRIMARY KEY," +
>     "resourceId TEXT," +
>     ")";
>
> "CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
>     "cseBaseCseId TEXT," +
>     "aeId TEXT," +
>     "resourceId TEXT," +
>     "PRIMARY KEY ((cseBaseCseId), aeId)" +
>     ")";
>
> "CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
>     "CONTENT_INSTANCE_OldestId TEXT," +
>     "CONTENT_INSTANCE_LatestId TEXT," +
>     "SUBSCRIPTION_OldestId TEXT," +
>     "SUBSCRIPTION_LatestId TEXT," +
>     "resourceId TEXT PRIMARY KEY," +
>     "resourceType TEXT," +
>     "resourceName TEXT," +
>     "jsonContent TEXT," +
>     "parentId TEXT," +
>     ")";
>
> "CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
>     "parentResourceId TEXT," +
>     "childName TEXT," +
>     "childResourceId TEXT," +
>     "nextId TEXT," +
>     "prevId TEXT," +
>     "PRIMARY KEY ((parentResourceId), childName)" +
>     ")";
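kurt's point about synchronous writes can be illustrated with some back-of-the-envelope arithmetic. With synchronous writes each client thread has at most one request outstanding, so throughput is capped at roughly (number of threads) / (latency per write). The sketch below only evaluates that bound; the thread count and the two latencies are invented for illustration, since the thread reports neither.

```java
class SyncThroughputEstimate {

    // Upper bound on throughput when each of `clientThreads` waits for one
    // write to finish before sending the next one.
    static double maxOpsPerSecond(int clientThreads, double latencyMillis) {
        return clientThreads * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        int threads = 4;            // assumed, not taken from the thread
        double fastMillis = 0.25;   // assumed per-write latency, low-latency case
        double slowMillis = 4.0;    // assumed per-write latency, high-latency case

        System.out.println("low latency bound:  " + maxOpsPerSecond(threads, fastMillis) + " ops/s");
        System.out.println("high latency bound: " + maxOpsPerSecond(threads, slowMillis) + " ops/s");
    }
}
```

Under these assumed numbers the bound drops sixteenfold purely because each write takes longer to acknowledge, without either node being busy. Asynchronous writes lift the bound by allowing many requests in flight per thread.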
Re: Cassandra cluster performance
Hi,

Our column family definition is

"CREATE TABLE onem2m.cse(" +
    "name TEXT PRIMARY KEY," +
    "resourceId TEXT," +
    ")";

"CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
    "cseBaseCseId TEXT," +
    "aeId TEXT," +
    "resourceId TEXT," +
    "PRIMARY KEY ((cseBaseCseId), aeId)" +
    ")";

"CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
    "CONTENT_INSTANCE_OldestId TEXT," +
    "CONTENT_INSTANCE_LatestId TEXT," +
    "SUBSCRIPTION_OldestId TEXT," +
    "SUBSCRIPTION_LatestId TEXT," +
    "resourceId TEXT PRIMARY KEY," +
    "resourceType TEXT," +
    "resourceName TEXT," +
    "jsonContent TEXT," +
    "parentId TEXT," +
    ")";

"CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
    "parentResourceId TEXT," +
    "childName TEXT," +
    "childResourceId TEXT," +
    "nextId TEXT," +
    "prevId TEXT," +
    "PRIMARY KEY ((parentResourceId), childName)" +
    ")";

From: Abhishek Kumar Maheshwari <abhishek.maheshw...@timesinternet.in>
Date: Sunday, December 25, 2016 at 8:54 PM
To: "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <bjano...@cisco.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Cassandra cluster performance

Hi Branislav,

What is your column family definition?

Thanks & Regards,
Abhishek Kumar Maheshwari
Re: Cassandra cluster performance
Hi,

No, we are not using async writes.

From: kurt Greaves <k...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 23, 2016 at 12:17 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

Branislav, are you doing async writes?
RE: Cassandra cluster performance
Hi Branislav,

What is your column family definition?

Thanks & Regards,
Abhishek Kumar Maheshwari
Times Internet Ltd. | A Times of India Group Company

From: Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) [mailto:bjano...@cisco.com]
Sent: Thursday, December 22, 2016 6:18 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra cluster performance

Hi,

- Consistency level is set to ONE
- Keyspace definition:

  "CREATE KEYSPACE IF NOT EXISTS onem2m " +
      "WITH replication = " +
      "{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";

- yes, the client is on separate VM
- In our project we use Cassandra API version 3.0.2 but the database (cluster) is version 3.9
- for 2node cluster:
  first VM: 25 GB RAM, 16 CPUs
  second VM: 16 GB RAM, 16 CPUs
Re: Cassandra cluster performance
Branislav, are you doing async writes?
Re: Cassandra cluster performance
Yes, there is definitely something wrong, but I’m struggling to figure out what exactly. To answer your questions:

- There are no errors in the client or in Cassandra.
- I tried manual inserts and there are no errors either. I turned tracing on, so I can see that the data is distributed to different partitions. Even when I use nodetool status, the Owns column shows 48.4% and 51.6%.

Regards,
Branislav

From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 6:20 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

Given you’re using replication factor 1 (so each piece of data is only going to get written to one node), something definitely seems wrong. Some questions/ideas:

- are there any errors in the Cassandra logs or are you seeing any errors at the client?
- is your test data distributed across your partition key or is it possible all your test data is going to a single partition?
- have you tried manually running a few inserts to see if you get any errors?

Cheers
Ben
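Ben's question about whether all the test data might be going to a single partition can also be checked on the client side, before anything is written. The sketch below is an assumption-laden illustration: it takes the list of partition key values a benchmark is about to insert (here invented "resource-N" strings, not data from this thread) and reports what share of rows falls on the most common key. It says nothing about how Cassandra's partitioner maps keys to nodes; it only detects the case where the generator keeps reusing the same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionKeySkew {

    // Fraction of rows that share the most common partition key.
    // 1.0 means every row targets the same partition; an empty list gives 0.0.
    static double hottestKeyShare(List<String> partitionKeys) {
        if (partitionKeys.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String key : partitionKeys) {
            int count = counts.merge(key, 1, Integer::sum);
            max = Math.max(max, count);
        }
        return (double) max / partitionKeys.size();
    }

    public static void main(String[] args) {
        // Hypothetical benchmark keys: one distinct key per row.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("resource-" + i);
        }
        System.out.println("hottest key share: " + hottestKeyShare(keys));
    }
}
```

A share close to 1.0 would point at the test data generator rather than at the cluster setup; a small share is consistent with the even Owns percentages reported by nodetool status.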
Re: Cassandra cluster performance
Given you’re using replication factor 1 (so each piece of data is only going to get written to one node) something definitely seems wrong. Some questions/ideas:

- are there any errors in the Cassandra logs or are you seeing any errors at the client?
- is your test data distributed across your partition key or is it possible all your test data is going to a single partition?
- have you tried manually running a few inserts to see if you get any errors?

Cheers
Ben

On Thu, 22 Dec 2016 at 11:48 Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) <bjano...@cisco.com> wrote:
> [...]
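To make the partition-key question concrete, here is a minimal sketch of how a benchmark could check its own key distribution before sending anything to the cluster. It only counts the partition key values the test would generate; the class name `PartitionSkew`, the method `hottestKeyShare`, and the sample keys are all made up for illustration and are not taken from the benchmark discussed in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionSkew {
    /**
     * Returns the fraction of writes that target the most frequently used
     * partition key (1.0 means every write goes to a single partition).
     */
    static double hottestKeyShare(List<String> partitionKeys) {
        if (partitionKeys.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String key : partitionKeys) {
            // merge() returns the updated count for this key
            int count = counts.merge(key, 1, Integer::sum);
            max = Math.max(max, count);
        }
        return (double) max / partitionKeys.size();
    }

    public static void main(String[] args) {
        // Hypothetical key samples, not real benchmark data
        List<String> skewed = Arrays.asList("cse1", "cse1", "cse1", "cse2");
        List<String> spread = Arrays.asList("r1", "r2", "r3", "r4");
        System.out.println(hottestKeyShare(skewed)); // prints 0.75
        System.out.println(hottestKeyShare(spread)); // prints 0.25
    }
}
```

A share close to 1.0 would suggest that most writes land on one partition, and therefore on one node when the replication factor is 1.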
Re: Cassandra cluster performance
Hi,

- Consistency level is set to ONE
- Keyspace definition:

  "CREATE KEYSPACE IF NOT EXISTS onem2m " +
  "WITH replication = " +
  "{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";

- yes, the client is on separate VM
- In our project we use Cassandra API version 3.0.2 but the database (cluster) is version 3.9
- for 2node cluster:
  first VM: 25 GB RAM, 16 CPUs
  second VM: 16 GB RAM, 16 CPUs

From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 21, 2016 at 2:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra cluster performance

[...]
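For anyone who wants to repeat the test with a different replication factor, the keyspace statement from this message could be built by a small helper like the one below. This is only a sketch: the class `KeyspaceCql` and the method `createKeyspace` are hypothetical names, and only the statement text itself (SimpleStrategy, keyspace `onem2m`, replication factor 1) comes from the message.

```java
final class KeyspaceCql {
    /**
     * Builds a CREATE KEYSPACE statement with SimpleStrategy and the given
     * replication factor. Only produces the CQL text; it does not execute it.
     */
    static String createKeyspace(String keyspace, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " "
            + "WITH replication = "
            + "{ 'class' : 'SimpleStrategy', 'replication_factor' : " + replicationFactor + "}";
    }

    public static void main(String[] args) {
        // Same keyspace name and replication factor as in the message
        System.out.println(createKeyspace("onem2m", 1));
    }
}
```

With a replication factor of 1 on a 2-node cluster, each row is stored on exactly one of the two nodes.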
Re: Cassandra cluster performance
You would expect some drop when moving from a single node to multiple nodes, but on the face of it that feels extreme to me (although I’ve never personally tested the difference). Some questions that might help provide an answer:

- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) wrote:
> [...]
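One rough way to reason about how large a drop to expect: if the benchmark sends each write and waits for the reply before sending the next, throughput is bounded by the number of requests in flight divided by the round-trip latency. The sketch below illustrates that arithmetic only. The class and method names are made up, and the latencies are hypothetical values, not measurements from this cluster.

```java
final class SyncThroughput {
    /**
     * Upper bound on operations per second when at most `inFlight` requests
     * are outstanding and each takes `latencyMillis` to complete.
     */
    static double maxOpsPerSecond(int inFlight, double latencyMillis) {
        return inFlight * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        // Hypothetical: one blocking client thread, 1 ms per round trip
        System.out.println(maxOpsPerSecond(1, 1.0));  // prints 1000.0
        // Hypothetical: 64 requests in flight at the same latency
        System.out.println(maxOpsPerSecond(64, 1.0)); // prints 64000.0
    }
}
```

Under these assumptions, any extra network hop added per write lowers the bound for a blocking client, while keeping more requests in flight raises it.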
Cassandra cluster performance
Hi all,

I’m working on a project and we have a Java benchmark test for measuring performance when using the Cassandra database. The create operation on a single-node Cassandra cluster runs at about 15K operations per second. The problem is that when I set up a cluster with 2 or more nodes (each of them on separate virtual machines and servers), the performance goes down to 1K ops/sec.

I followed the official instructions on how to set up a multi-node cluster. The only things I changed in the cassandra.yaml file are: the seeds, set to the IP address of one node; the listen and rpc addresses, set to the IP address of the node; and the endpoint snitch, changed to GossipingPropertyFileSnitch. The replication factor is set to 1 for the 2-node cluster. I use only one datacenter. The cluster seems to be doing fine (I can see the nodes communicating), and so are the CPU and RAM usage on the machines.

Does anybody have any ideas? Any help would be much appreciated.

Thanks!