Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Duncan Sands

Hi Diane,

On 17/07/14 06:19, Diane Griffith wrote:

We have been struggling to prove out linear read performance with our Cassandra
configuration, i.e. that it scales horizontally.  Wondering if anyone has any
suggestions for what minimal configuration and approach to use to demonstrate
this.

We were trying to go for a simple setup, so on the keyspace and/or column
families we went with the following settings, thinking they were the minimum
needed to prove scaling:

replication_factor set to 1,


an RF of 1 means that any particular bit of data exists on exactly one node.  So
if you are testing read speed by reading the same data item again and again as
fast as you can, then all the reads will be coming from the same one node, the
one that has that data item on it.  In this situation adding more nodes won't
help.  Maybe this isn't exactly how you are testing read speed, but perhaps you
are doing something analogous?  I suggest you explain exactly how you are
measuring read speed.
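
To see this concretely, here is a minimal sketch that asks the cluster metadata
which node owns a given key.  It assumes the DataStax Java driver 2.x, and the
keyspace name and key are made up for illustration:

```java
// Minimal sketch: ask the cluster which node(s) own a given partition key.
// "scale_test" and "key-42" are illustrative, not from the original post.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Set;

public class WhoOwnsKey {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            // A text partition key serializes as its UTF-8 bytes.
            ByteBuffer key = ByteBuffer.wrap("key-42".getBytes(Charset.forName("UTF-8")));
            Set<Host> replicas = cluster.getMetadata().getReplicas("scale_test", key);
            // With RF=1 this prints exactly one host: every read of this key
            // goes to that node, regardless of how many nodes are in the cluster.
            System.out.println("Replicas for key-42: " + replicas);
        } finally {
            cluster.close();
        }
    }
}
```

(nodetool getendpoints <keyspace> <cf> <key> gives the same answer from the
command line.)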


Ciao, Duncan.


SimpleStrategy,
default consistency level,
default compaction strategy (size tiered),
but compacted down to 1 sstable per cf on each node (versus using leveled
compaction for read performance)

*Read Performance Results:*
1 client thread - 2 nodes > 1 node was seen, but we couldn't show increased
performance adding more nodes, i.e. 4 nodes !> 2 nodes
2 client threads - 2 nodes > 1 node still held, but again we couldn't show
increased performance adding more nodes, i.e. 4 nodes !> 2 nodes
10 client threads - this time 2 nodes < 1 node on performance numbers.  2 nodes
suffered a larger reduction in throughput than 1 node did.

Where are we going wrong?

How have others shown horizontal scaling for reads?

Thanks,
Diane




Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Diane Griffith
Duncan,

Thanks for that feedback.  I'll give a bit more info and then ask some more
questions.

*Our Goal*:  Not to produce the fastest read but to show horizontal scaling.

*Test procedure*:
* Inserted 54M rows, of which one third represent unique keys, i.e. 18M
keys.  The end result, given our schema, is that the 54M rows become 72M rows
in the column family that serves as the control query load.
* Have a client that queries 100k records in configurable batches, set to
1k, and then does 100 reps of queries (see the sketch after this list).  It
doesn't query the same keys each rep; it uses an offset and then increases
the keys to query.
* We can adjust the hit rate, i.e. how many of the keys will be found, but
have been focused on a 100% hit rate.
* We run the query where multiple clients can be spawned to do the same
query cycle of 100k keys, but the offset is not different, so each client
will query the same keys.
* We thought we should manually compact the tables down to 1 sstable on a
given node for consistent results across different cluster sizes.
* We had set the replication factor to 1 originally so as not to complicate
things or impact initial write times; our thought was to assess RF later.
Since we changed the keys getting queried, it would have to hit additional
nodes to get row data, but even with just 1 client thread (the simplest path
to show horizontal scaling) we saw a slight decrease in performance going
from 2 nodes to 4 nodes.
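
As referenced above, here is a rough sketch of what the client loop does.  This
is illustrative Java against a hypothetical scale_test.kv (key text PRIMARY
KEY, value text) table via the DataStax 2.x driver, not our actual harness; the
real client fetches keys in 1k batches, while the sketch issues simple point
reads.

```java
// Rough sketch of the read-test client described above -- NOT the actual harness.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReadLoadClient {
    public static void main(String[] args) throws Exception {
        final int threads = 10, keysPerRep = 100000, reps = 100;
        final Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        final Session session = cluster.connect("scale_test");
        // Prepare the statement once, bind it per key.
        final PreparedStatement read = session.prepare("SELECT value FROM kv WHERE key = ?");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int rep = 0; rep < reps; rep++) {
                        // Each rep advances the offset so different keys are read;
                        // every thread reads the SAME keys, matching our setup
                        // (varying the offset per thread would spread the load).
                        long offset = (long) rep * keysPerRep;
                        for (int i = 0; i < keysPerRep; i++) {
                            session.execute(read.bind("key-" + (offset + i)));
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
        long ms = (System.nanoTime() - start) / 1000000L;
        long total = (long) threads * reps * keysPerRep;
        System.out.println(total + " reads in " + ms + " ms");
        cluster.close();
    }
}
```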

Things seen with the given procedure and setup:


   1. 1 client thread:  2 nodes do better than 1 node on the query test,
   but 4 nodes did not do better than 2.
   2. 2 client threads:  2 nodes were still doing better than 1 node.
   3. 10 client threads:  the times drastically suffered, and 2 nodes were
   doing 1/2 the speed of 1 node, whereas with 1 to 2 threads, 2 nodes had
   performed better than 1 node.  There was a huge decrease in performance
   on 2 nodes and just a mild decrease on 1 node.

Note: 50+ threads also fell apart drastically.

*Observations*:

   - compacting each node to 1 sstable did not seem to help: running 10
   client threads against the exploded sstables on 2 nodes was 2x better than
   the last 2-node 10-client test, but there was still a decrease in
   performance from 1 to 2 threads when querying the compacted tables
   - I would see upwards of 10 read requests pending at times, while 8 to 10
   were processing, when I ran nodetool tpstats
   - having the key cache on or disabled did not seem to noticeably impact
   things with our current configuration (toggled as sketched below)
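
For reference, toggling the key cache per table was along the lines of the
following sketch; the table name is illustrative, and this assumes the string
form of the caching property used by Cassandra 2.0.

```java
// Sketch of toggling the key cache per table (table name illustrative).
// In Cassandra 2.0 the caching property is a string:
// 'all', 'keys_only', 'rows_only' or 'none'.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ToggleKeyCache {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        session.execute("ALTER TABLE scale_test.kv WITH caching = 'none'");          // key cache off
        // session.execute("ALTER TABLE scale_test.kv WITH caching = 'keys_only'");  // key cache back on
        cluster.close();
    }
}
```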


*Questions:*

   1. Can multiple threads read the same sstable at the same time?  Does
   compacting down to 1 sstable (to get a given row into one sstable) add any
   benefit, or does it actually hurt, as our limited testing has indicated so
   far?
   2. Given the above testing process, does it still make sense to adjust the
   replication factor to the cluster size (i.e. 1 for a 1-node cluster, 2 for
   a 2-node cluster, 3 for an n-node cluster)?  We assumed it was just the
   ability for threads to connect to a coordinator that would help, but it
   sounds like reads can still block.


I'm going to try a limited test with a changed replication factor (sketched
below).  But if anyone has input on whether compacting to 1 sstable is a
benefit or a detriment for a simple scalability test, on how (if at all)
Cassandra blocks when reading sstables, and on whether higher replication
factors do indeed help produce reliable results, it would be appreciated.  I
know part of our charter was to keep it simple to produce the scalability
proof, but it does sound like the replication factor is hurting us, given
that we are not using different offsets for each client thread and the delay
between clients hitting the same keys may not be long enough.
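
The RF change I plan to test would look something like this sketch (keyspace
name illustrative; note that existing data only reaches the new replicas after
nodetool repair runs on each node):

```java
// Sketch of raising the replication factor on an existing keyspace.
// After this, run "nodetool repair scale_test" on every node so that
// existing rows are copied to the new replicas.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RaiseRf {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        session.execute("ALTER KEYSPACE scale_test WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 2}");  // e.g. 2 for a 2-node cluster
        cluster.close();
    }
}
```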

Thanks,
Diane


Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Jack Krupansky
It sounds as if you are actually testing “vertical scalability” (load on a 
single node) rather than Cassandra’s sweet spot of “horizontal scalability” 
(adding more nodes to handle higher load).  Maybe you could clarify your 
intentions and specific use case.

Also, it sounds like you are trying to focus on large queries, but Cassandra’s 
sweet spot is lots of smaller queries. With larger queries you can end up 
measuring things like the capabilities of your hardware, cpu cores, memory, I/O 
bandwidth, network latency, JVM configuration, etc. rather than measuring 
Cassandra per se. So, again, maybe you could clarify your intended use case.

It might be that you need to add more “vertical scale” (bigger box, more cores, 
more memory, beefier I/O and networking) to handle large queries, or maybe 
simple, Cassandra-style “horizontal scaling” (adding nodes) will be sufficient. 
Sure, you can tune Cassandra for single-node performance, but that seems like a 
lot of extra work to me, compared to adding more cheap nodes.

-- Jack Krupansky

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Diane Griffith
Definitely not trying to show vertical scaling.  We have a query use case
we are trying to show will scale as we add more nodes, should performance
fall below adequate.  But to show the scaling, we run the test on a 1-node
cluster, then a 2-node cluster, then a 4-node cluster, with the goal that
query throughput increases as more nodes are added.

Basically we do not want to tune for single-node performance; we wanted to
prove that adding nodes works, but for our query use case it hasn't yet.
Our query size is a valid use case for our needs, though.

Earlier it may not have been clear, but we are not querying the same key
over and over in one thread; we are continuously querying random,
non-duplicating keys.  Bringing up the threading was not our main path or
desired goal, so I re-posted with (hopefully) clearer intent about our goal
and about what we experienced in the past against Thrift and an older
version of Cassandra, which we have not been able to duplicate via CQL and
Cassandra 2.0.6.

So I am just hoping someone has suggestions for the minimum one must do to
prove horizontal scaling, or suggestions of what to look at in our current
data size/query use case that may be keeping us from achieving horizontal
scaling.

Thanks,
Diane




Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Timo Ahokas
Hi Diane,

Sounds a bit like the client might be the limiting factor in your test, not
the server.  Especially if you're using one single-threaded client, you
might not be loading the backend in any significant way.  Have you done any
vertical scaling tests (identical client, bigger server)?  If the client is
indeed the limiting factor, then adding server capacity probably doesn't
gain you much.  What sort of CPU/IO load do you see on the client/server
during your tests?

I might be barking up the wrong tree (we haven't done any load tests yet on
Cassandra), but when we load tested our clustered app, we used 3-10 client
machines (with multithreaded clients) against 3 app server nodes.

I would definitely first try to add more client load (multiple clients,
multithreading, and/or more client machines, e.g. as sketched below) and,
once you're actually hitting the server properly, then add more server
nodes.
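
If it helps, here is a sketch of one way to push more load from a single
client process: issue the reads asynchronously with a bounded number in
flight.  It assumes the DataStax Java driver 2.x and the same illustrative
scale_test.kv table sketched earlier in the thread:

```java
// Sketch: drive load with async reads instead of many blocking threads.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.Semaphore;

public class AsyncReadClient {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("scale_test");
        PreparedStatement read = session.prepare("SELECT value FROM kv WHERE key = ?");
        final Semaphore inFlight = new Semaphore(256);   // cap on concurrent requests
        for (int i = 0; i < 1000000; i++) {
            inFlight.acquire();                          // block once 256 are outstanding
            ResultSetFuture f = session.executeAsync(read.bind("key-" + i));
            f.addListener(new Runnable() {
                public void run() { inFlight.release(); }
            }, MoreExecutors.sameThreadExecutor());
        }
        inFlight.acquire(256);                           // drain: wait for the tail
        cluster.close();
    }
}
```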

Best regards,
Timo


trouble showing cluster scalability for read performance

2014-07-16 Thread Diane Griffith
We have been struggling to prove out linear read performance with our
Cassandra configuration, i.e. that it scales horizontally.  Wondering if
anyone has any suggestions for what minimal configuration and approach to
use to demonstrate this.

We were trying to go for a simple setup, so on the keyspace and/or column
families we went with the following settings, thinking they were the minimum
needed to prove scaling:

replication_factor set to 1,
SimpleStrategy,
default consistency level,
default compaction strategy (size tiered),
but compacted down to 1 sstable per cf on each node (versus using leveled
compaction for read performance)
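
For concreteness, the setup above amounts to something like this sketch
(assuming the DataStax Java driver 2.x; the keyspace and table names are made
up for illustration):

```java
// Sketch of the keyspace/table setup described above (names illustrative).
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CreateTestSchema {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // SimpleStrategy + RF=1: each partition lives on exactly one node.
        session.execute("CREATE KEYSPACE IF NOT EXISTS scale_test WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        session.execute("CREATE TABLE IF NOT EXISTS scale_test.kv "
                + "(key text PRIMARY KEY, value text)");
        cluster.close();
    }
}
```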

*Read Performance Results:*
1 client thread - 2 nodes > 1 node was seen, but we couldn't show increased
performance adding more nodes, i.e. 4 nodes !> 2 nodes
2 client threads - 2 nodes > 1 node still held, but again we couldn't
show increased performance adding more nodes, i.e. 4 nodes !> 2 nodes
10 client threads - this time 2 nodes < 1 node on performance numbers.  2
nodes suffered a larger reduction in throughput than 1 node did.

Where are we going wrong?

How have others shown horizontal scaling for reads?

Thanks,
Diane