Re: horizontal query scaling issues follow on

2014-07-23 Thread Diane Griffith
I posted the wrong query; I gave the query for a single key rather than the
large batch of ids I was actually testing.

What the large batch test was actually using was IN, so (with the key list
elided):

SELECT * FROM foo WHERE key IN (...) AND col_name = 'LATEST'

So after breaking it down and reading as much as I can with regard to our:

- schema: dynamic wide rows (but this should not mean more columns per row
than the documentation warns about)
- general configuration and recommended settings

From that I then read up on the anti-patterns, and SELECT ... IN was
mentioned.  It sounds like it could impact the numbers.  So for our query
test pattern and simple test cluster, it may explain why there was a
throughput increase going from 1 node to 2 nodes and yet things decreased
going from 2 nodes to 4 nodes.  Does that seem the likely culprit?

Is there an alternative for batching or selecting a large key set in a
clustered environment?
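
For example, instead of the single IN statement above, the per-key form would
look like the sketch below (using keys from the sample data quoted later in
this message); the idea, as I understand it, is to issue one single-partition
SELECT per key from the client, ideally as prepared statements executed
asynchronously in parallel, rather than one statement spanning many
partitions:

SELECT col_value FROM foo WHERE key = 'Type1:1109dccb-169b-40ef-b7f8-d072f04d8139' AND col_name = 'LATEST';
SELECT col_value FROM foo WHERE key = 'Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d' AND col_name = 'LATEST';
-- ... one such statement per key in the batch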

Thanks,
Diane



On Fri, Jul 18, 2014 at 2:43 PM, Diane Griffith dfgriff...@gmail.com
wrote:

 Okay here are the data samples.

 Column Family Schema again:
 CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
 PRIMARY KEY(key, col_name))

 CQL Write:

 INSERT INTO foo (key, col_name, col_value) VALUES
 ('Type1:1109dccb-169b-40ef-b7f8-d072f04d8139',
  'HISTORY:2011-04-20T09:19:13.072-0400',
  '{key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state1,timestamp:1303305553072,eventId:40902,executionId:31082}')



 CQL Read:



 SELECT col_value FROM foo WHERE
 key = 'Type1:1109dccb-169b-40ef-b7f8-d072f04d8139' AND col_name = 'LATEST'



 Read result from above query:



 {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:
 Type1,state:state3,timestamp:1303446284614,eventId:7688,executionId:40847}





 CQL snippet example of select * from foo limit 8:



 key                                         | col_name                              | col_value
 --------------------------------------------+---------------------------------------+----------
 Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | HISTORY:2011-04-20T09:19:13.072-0400  |
   {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state1,timestamp:1303305553072,eventId:40902,executionId:31082}
 Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | HISTORY:2011-04-20T13:47:33.512-0400  |
   {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state2,timestamp:1303321653512,eventId:32660,executionId:33510}
 Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | HISTORY:2011-04-22T00:24:44.614-0400  |
   {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state3,timestamp:1303446284614,eventId:7688,executionId:40847}
 Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | LATEST                                |
   {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state3,timestamp:1303446284614,eventId:7688,executionId:40847}
 Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | HISTORY:2010-08-26T03:45:43.366-0400  |
   {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state1,timestamp:1282808743366,eventId:2,executionId:6214}
 Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | HISTORY:2010-08-26T04:58:46.810-0400  |
   {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state2,timestamp:1282813126810,eventId:48575,executionId:22318}
 Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | HISTORY:2010-08-27T22:39:51.036-0400  |
   {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state2,timestamp:1282963191036,eventId:21960,executionId:5067}
 Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | LATEST                                |
   {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state2,timestamp:1282963191036,eventId:21960,executionId:5067}


 For the above select * example, given how I have the primary key defined for
 the schema to support dynamic wide rows, it was my understanding that it
 really equates to data for 2 physical (partition) rows, each with 4 cells.
 So I should have 18 million physical rows, but given the number of entries I
 inserted for each key it equates to the 72 million rows that a select
 count(*) from foo will report if I add a high enough limit to let it scan
 all rows.

 Does anything seem like it is hurting our chances to horizontally scale
 with the data/schema?

 Thanks,
 Diane


 On Fri, Jul 18, 2014 at 6:46 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 How many columns are you inserting/querying per key? Could we see some
 example CQL statements for the insert/read workload?

 If you are maxing out at 10 clients, something fishy is going on. In
 general, though, if you find that adding nodes causes performance to
 degrade I would suspect that you are querying data in one CQL statement
 that is spread over multiple partitions, and so extra work needs to be done
 cross-cluster to service your requests as more nodes are added.

 I would also consider what effect the file cache may be having on your
 workload, as it sounds small enough to fit in memory, so is likely a major
 determining factor for performance of your benchmark. As you try

Re: horizontal query scaling issues follow on

2014-07-21 Thread Diane Griffith
So I appreciate all the help so far.  Upfront, it is possible the schema
and data query pattern could be contributing to the problem.  The schema
was born out of certain design requirements.  If it proves to be part of
what makes the scalability crumble, then I hope it will help shape the
design requirements.

Anyway, the premise of the question was my struggle where scalability
metrics fell apart going from 2 nodes to 4 nodes for the current schema and
query access pattern being modeled:
- 1 node was producing acceptable response times seemed to be the consensus
- 2 nodes showed marked improvement to the response times for the query
scenario being modeled which was welcomed news
- 4 nodes showed a decrease in performance and it was not clear why going 2
to 4 nodes triggered the decrease

Also, two more items contributed to the question:
- cassandra-env.sh - the comments around the HEAP_NEWSIZE example state that
it assumes a modern 8-core machine for pause times
- a wiki article I had found (and am trying to relocate) where a person set
up very small nodes for the developers on that team and talked through all
the parameters that had to be changed from the defaults to get good
throughput.  It sort of implied the defaults may have been based on a
certain-sized VM.

That was the main driver for those questions.  I agree it does not seem
correct to boost the values, let alone boost them so high, just to minimize
impact in some respects (i.e. to keep reads from timing out and starting over
given the retry policy).

So the question really was: are the defaults sized with the assumption of a
certain minimal VM size (i.e. the comment in cassandra-env.sh)?
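
For concreteness, this is the kind of cassandra-env.sh override I have in
mind for one of our 8 GB RAM / 4-CPU VMs; the numbers are only my rough
reading of what the script's auto-calculation would pick, not values anyone
in this thread has recommended:

MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"    # roughly 100 MB per core, per the comment mentioned above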

Does that explain where I am coming from better?

My question, despite being naive and ignoring other impacts, still stands: is
there a minimal VM size that is more of a sweet spot for Cassandra and its
defaults?  I get the point that the column family schema, as it relates to
the desired queries, can and does impact that answer.  I guess what bothered
me was that it didn't impact that answer going from 1 node to 2 nodes but
started showing up going from 2 nodes to 4 nodes.

I'm building whatever facts I can to show whether the schema and query
pattern scale or not.  If they do not, then I am trying to pull information
from metrics output by nodetool, or from statements in the Cassandra log
files, to support a case for changing the design requirements.

Thanks,
Diane


On Mon, Jul 21, 2014 at 8:15 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sun, Jul 20, 2014 at 6:12 PM, Diane Griffith dfgriff...@gmail.com
 wrote:

 I am running tests again across different number of client threads and
 number of nodes but this time I tweaked some of the timeouts configured for
 the nodes in the cluster.  I was able to get better performance on the
 nodes at 10 client threads by upping 4 timeout values in cassandra.yaml to
 24:


 If you have to tune these timeout values, you have probably modeled data
 in such a way that each of your requests is quite large or quite slow.

 This is usually, but not always, an indicator that you are Doing It Wrong.
 Massively multithreaded things don't generally like their threads to be
 long-lived, for what should hopefully be obvious reasons.


 I did this because of my interpretation of the cfhistograms output on one
 of the nodes.


 Could you be more specific?


 So 3 questions that come to mind:


1. Did I interpret the histogram information correctly in the Cassandra 2.0.6
nodetool output?  That is, in the 2-column read latency output, is the offset
(left) column the time in milliseconds and the right column the number of
requests that fell into that bucket range?
2. Was it reasonable for me to boost those 4 timeouts and just those?

 Not really. In 5 years of operating Cassandra, I've never had a problem
 whose solution was to increase these timeouts from their default.


1. What are reasonable timeout values for smaller vm sizes (i.e. 8GB
RAM, 4 CPUs)?

 As above, I question the premise of this question.

 =Rob




Re: horizontal query scaling issues follow on

2014-07-20 Thread Diane Griffith
I am running tests again across different numbers of client threads and
numbers of nodes, but this time I tweaked some of the timeouts configured for
the nodes in the cluster.  I was able to get better performance on the
nodes at 10 client threads by upping 4 timeout values in cassandra.yaml to
24:


   - read_request_timeout_in_ms
   - range_request_timeout_in_ms
   - write_request_timeout_in_ms
   - request_timeout_in_ms
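
For reference, in cassandra.yaml that looks like the following; the exact
figure I used got truncated above, so the values here are placeholders
(assuming the unit is milliseconds):

read_request_timeout_in_ms: 24000
range_request_timeout_in_ms: 24000
write_request_timeout_in_ms: 24000
request_timeout_in_ms: 24000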


I did this because of my interpretation of the cfhistograms output on one
of the nodes.

So 3 questions that come to mind:


   1. Did I interpret the histogram information correctly in the Cassandra
   2.0.6 nodetool output?  That is, in the 2-column read latency output, is
   the offset (left) column the time in milliseconds and the right column the
   number of requests that fell into that bucket range?
   2. Was it reasonable for me to boost those 4 timeouts and just those?
   3. What are reasonable timeout values for smaller vm sizes (i.e. 8GB
   RAM, 4 CPUs)?

If anyone has any  insight it would be appreciated.

Thanks,
Diane


On Fri, Jul 18, 2014 at 2:23 PM, Tyler Hobbs ty...@datastax.com wrote:


 On Fri, Jul 18, 2014 at 8:01 AM, Diane Griffith dfgriff...@gmail.com
 wrote:


 Partition Size (bytes)
 1109 bytes: 1800

 Cell Count per Partition
 8 cells: 1800

 meaning I can't glean anything from this about how it partitioned or whether
 it broke a key across partitions, right?  Does it mean that for the 1800
 (the number of unique keys), each has 8 cells?


 Yes, your interpretation is correct.  Each of your 1800 partitions has
 8 cells (taking up 1109 bytes).


 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
The column family schema is:

CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
PRIMARY KEY(key, col_name))

where the key is a generated uuid and all keys were inserted in random
order but in the end we were compacting down to one sstable per node.

So we were doing it this way to achieve dynamic columns.
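
To spell out how that maps onto the partition-key/clustering-column question
Jack asks below (a sketch; foo_alt is a hypothetical name): with PRIMARY
KEY(key, col_name), key is the partition key and col_name is a clustering
column, so each key becomes one wide partition, whereas a design with no
clustering columns would make every (key, col_name) pair its own partition.

-- current wide-row design: one partition per key, many clustering rows
CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
  PRIMARY KEY (key, col_name));

-- alternative skinny design: composite partition key, no clustering columns
CREATE TABLE IF NOT EXISTS foo_alt (key text, col_name text, col_value text,
  PRIMARY KEY ((key, col_name)));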

Thanks,
Diane

On Fri, Jul 18, 2014 at 12:19 AM, Jack Krupansky j...@basetechnology.com
wrote:

   Sorry I may have confused the discussion by mentioning tokens – I
 wasn’t intending to refer to vnodes or the num_tokens property, but merely
 referring to the token range of a node and that the partition key hashes to
 a token value.

 The main question is what you use for your primary key and whether you are
 using a small number of partition keys and a large number of clustering
 columns, or does each row have a unique partition key and no clustering
 columns.

 -- Jack Krupansky

  *From:* Diane Griffith dfgriff...@gmail.com
 *Sent:* Thursday, July 17, 2014 6:21 PM
 *To:* user user@cassandra.apache.org
 *Subject:* Re: horizontal query scaling issues follow on

  So do partitions equate to tokens/vnodes?

 If so we had configured all cluster nodes/vms with num_tokens: 256 instead
 of setting init_token and assigning ranges.  I am still not getting why in
 Cassandra 2.0, I would assign my own ranges via init_token and this was
 based on the documentation and even this blog item
 http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 that
 made it seem right for us to always configure our cluster vms with
 num_tokens: 256 in the cassandra.yaml file.

 Also in all testing, all vms were of equal sizing so one was not more
 powerful than another.

 I didn't think I was hitting an i/o wall on the client vm (separate vm)
 where we command line scripted our query call to the cassandra cluster.
 I can break the client call load across vms which I tried early on.  Happy
 to verify that again though.

 So given that I was assuming the partitions were such that it wasn't a
 problem.  Is that an incorrect assumption and something to dig into more?

 Thanks,
 Diane


 On Thu, Jul 17, 2014 at 3:01 PM, Jack Krupansky j...@basetechnology.com
 wrote:

   How many partitions are you spreading those 18 million rows over? That
 many rows in a single partition will not be a sweet spot for Cassandra.
 It’s not exceeding any hard limit (2 billion), but some internal operations
 may cache the partition rather than the logical row.

 And all those rows in a single partition would certainly not be a test of
 “horizontal scaling” (adding nodes to handle more data – more token values
 or partitions.)

 -- Jack Krupansky

  *From:* Diane Griffith dfgriff...@gmail.com
 *Sent:* Thursday, July 17, 2014 1:33 PM
 *To:* user user@cassandra.apache.org
 *Subject:* horizontal query scaling issues follow on


 This is a follow on re-post to clarify what we are trying to do,
 providing information that was missing or not clear.



 Goal:  Verify horizontal scaling for random non duplicating key reads
 using the simplest configuration (or minimal configuration) possible.



 Background:

 A couple years ago we did similar performance testing with Cassandra for
 both read and write performance and found excellent (essentially linear)
 horizontal scalability.  That project got put on hold.  We are now moving
 forward with an operational system and are having scaling problems.



 During the prior testing (3 years ago) we were using a much older version
 of Cassandra (0.8 or older), the THRIFT API, and Amazon AWS rather than
 OpenStack VMs.  We are now using the latest Cassandra and the CQL
 interface.  We did try moving from OpenStack to AWS/EC2 but that did not
 materially change our (poor) results.



 Test Procedure:

- Inserted 54 million cells in 18 million rows (so 3 cells per row),
using randomly generated row keys. That was to be our data control for the
test.
- Spawn a client on a different VM to query 100k rows and do that for
100 reps.  Each row key queried is drawn randomly from the set of existing
row keys, and then not re-used, so all 10 million row queries use a
different (valid) row key.  This test is a specific use case of our system
we are trying to show will scale

 Result:

- 2 nodes performed better than 1 node test but 4 nodes showed
decreased performance over 2 nodes.  So that did not show horizontal 
 scaling



 Notes:

- We have replication factor set to 1 as we were trying to keep the
control test simple to prove out horizontal scaling.
- When we tried to add threading to see if it would help it had
interesting side behavior which did not prove out horizontal scaling.
- We are using CQL versus THRIFT API for Cassandra 2.0.6





 Does anyone have any feedback that either threading or replication factor
 is necessary to show horizontal scaling of Cassandra versus the minimal way
 of just continue to add nodes to help throughput

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
Working on getting some samples but grabbed the last part of the nodetool
cfhistograms for one of the column families on one of the nodes.  What does
it mean for the partition information:

Partition Size (bytes)
1109 bytes: 1800

Cell Count per Partition
8 cells: 1800

meaning I can't glean anything from this about how it partitioned or whether
it broke a key across partitions, right?  Does it mean that for the 1800 (the
number of unique keys), each has 8 cells?
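
For reference, the excerpt above is just the tail of the output from roughly
the following (keyspace name is whatever ours actually is):

nodetool cfhistograms <keyspace> foo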

Thanks,
Diane


On Fri, Jul 18, 2014 at 6:46 AM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 How many columns are you inserting/querying per key? Could we see some
 example CQL statements for the insert/read workload?

 If you are maxing out at 10 clients, something fishy is going on. In
 general, though, if you find that adding nodes causes performance to
 degrade I would suspect that you are querying data in one CQL statement
 that is spread over multiple partitions, and so extra work needs to be done
 cross-cluster to service your requests as more nodes are added.

 I would also consider what effect the file cache may be having on your
 workload, as it sounds small enough to fit in memory, so is likely a major
 determining factor for performance of your benchmark. As you try different
 client levels for the smaller cluster you may see improved performance as
 the data is pulled into file cache across test runs, and then when you
 build your larger cluster this is lost so performance appears to degrade
 (for instance).


 On Fri, Jul 18, 2014 at 12:25 PM, Diane Griffith dfgriff...@gmail.com
 wrote:

 The column family schema is:

 CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
 PRIMARY KEY(key, col_name))

 where the key is a generated uuid and all keys were inserted in random
 order but in the end we were compacting down to one sstable per node.

 So we were doing it this way to achieve dynamic columns.

 Thanks,
 Diane

 On Fri, Jul 18, 2014 at 12:19 AM, Jack Krupansky j...@basetechnology.com
  wrote:

   Sorry I may have confused the discussion by mentioning tokens – I
 wasn’t intending to refer to vnodes or the num_tokens property, but merely
 referring to the token range of a node and that the partition key hashes to
 a token value.

 The main question is what you use for your primary key and whether you
 are using a small number of partition keys and a large number of clustering
 columns, or does each row have a unique partition key and no clustering
 columns.

 -- Jack Krupansky

  *From:* Diane Griffith dfgriff...@gmail.com
 *Sent:* Thursday, July 17, 2014 6:21 PM
 *To:* user user@cassandra.apache.org
 *Subject:* Re: horizontal query scaling issues follow on

  So do partitions equate to tokens/vnodes?

 If so we had configured all cluster nodes/vms with num_tokens: 256
 instead of setting init_token and assigning ranges.  I am still not getting
 why in Cassandra 2.0, I would assign my own ranges via init_token and this
 was based on the documentation and even this blog item
 http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 that
 made it seem right for us to always configure our cluster vms with
 num_tokens: 256 in the cassandra.yaml file.

 Also in all testing, all vms were of equal sizing so one was not more
 powerful than another.

 I didn't think I was hitting an i/o wall on the client vm (separate vm)
 where we command line scripted our query call to the cassandra cluster.
 I can break the client call load across vms which I tried early on.  Happy
 to verify that again though.

 So given that I was assuming the partitions were such that it wasn't a
 problem.  Is that an incorrect assumption and something to dig into more?

 Thanks,
 Diane


 On Thu, Jul 17, 2014 at 3:01 PM, Jack Krupansky j...@basetechnology.com
  wrote:

   How many partitions are you spreading those 18 million rows over?
 That many rows in a single partition will not be a sweet spot for
 Cassandra. It’s not exceeding any hard limit (2 billion), but some internal
 operations may cache the partition rather than the logical row.

 And all those rows in a single partition would certainly not be a test
 of “horizontal scaling” (adding nodes to handle more data – more token
 values or partitions.)

 -- Jack Krupansky

  *From:* Diane Griffith dfgriff...@gmail.com
 *Sent:* Thursday, July 17, 2014 1:33 PM
 *To:* user user@cassandra.apache.org
 *Subject:* horizontal query scaling issues follow on


 This is a follow on re-post to clarify what we are trying to do,
 providing information that was missing or not clear.



 Goal:  Verify horizontal scaling for random non duplicating key reads
 using the simplest configuration (or minimal configuration) possible.



 Background:

 A couple years ago we did similar performance testing with Cassandra
 for both read and write performance and found excellent (essentially
 linear) horizontal scalability.  That project got put on hold.  We are now
 moving forward

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
Okay here are the data samples.

Column Family Schema again:
CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
PRIMARY KEY(key, col_name))

CQL Write:

INSERT INTO foo (key, col_name, col_value) VALUES
('Type1:1109dccb-169b-40ef-b7f8-d072f04d8139',
 'HISTORY:2011-04-20T09:19:13.072-0400',
 '{key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state1,timestamp:1303305553072,eventId:40902,executionId:31082}')



CQL Read:



SELECT col_value FROM foo WHERE
key = 'Type1:1109dccb-169b-40ef-b7f8-d072f04d8139' AND col_name = 'LATEST'



Read result from above query:



{key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:
Type1,state:state3,timestamp:1303446284614,eventId:7688,executionId:40847}





CQL snippet example of select * from foo limit 8:



key                                         | col_name                              | col_value
--------------------------------------------+---------------------------------------+----------
Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | HISTORY:2011-04-20T09:19:13.072-0400  |
  {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state1,timestamp:1303305553072,eventId:40902,executionId:31082}
Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | HISTORY:2011-04-20T13:47:33.512-0400  |
  {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state2,timestamp:1303321653512,eventId:32660,executionId:33510}
Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | HISTORY:2011-04-22T00:24:44.614-0400  |
  {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state3,timestamp:1303446284614,eventId:7688,executionId:40847}
Type1:1109dccb-169b-40ef-b7f8-d072f04d8139  | LATEST                                |
  {key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType:Type1,state:state3,timestamp:1303446284614,eventId:7688,executionId:40847}
Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | HISTORY:2010-08-26T03:45:43.366-0400  |
  {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state1,timestamp:1282808743366,eventId:2,executionId:6214}
Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | HISTORY:2010-08-26T04:58:46.810-0400  |
  {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state2,timestamp:1282813126810,eventId:48575,executionId:22318}
Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | HISTORY:2010-08-27T22:39:51.036-0400  |
  {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state2,timestamp:1282963191036,eventId:21960,executionId:5067}
Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d  | LATEST                                |
  {key:e876d44d-246f-40c5-b5a3-4d0eb31db00d,keyType:Type2,state:state2,timestamp:1282963191036,eventId:21960,executionId:5067}


For the above select * example, given how I have the primary key defined for
the schema to support dynamic wide rows, it was my understanding that it
really equates to data for 2 physical (partition) rows, each with 4 cells.
So I should have 18 million physical rows, but given the number of entries I
inserted for each key it equates to the 72 million rows that a select
count(*) from foo will report if I add a high enough limit to let it scan all
rows.
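
In other words (numbers are the ones described above; the count query is just
a sketch of the check I ran, with the limit only needing to exceed the total
row count):

-- 18 million partition keys x 4 cells (3 HISTORY + 1 LATEST) per key
-- = 72 million CQL rows reported by:
SELECT COUNT(*) FROM foo LIMIT 100000000;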

Does anything seem like it is hurting our chances to horizontally scale
with the data/schema?

Thanks,
Diane


On Fri, Jul 18, 2014 at 6:46 AM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 How many columns are you inserting/querying per key? Could we see some
 example CQL statements for the insert/read workload?

 If you are maxing out at 10 clients, something fishy is going on. In
 general, though, if you find that adding nodes causes performance to
 degrade I would suspect that you are querying data in one CQL statement
 that is spread over multiple partitions, and so extra work needs to be done
 cross-cluster to service your requests as more nodes are added.

 I would also consider what effect the file cache may be having on your
 workload, as it sounds small enough to fit in memory, so is likely a major
 determining factor for performance of your benchmark. As you try different
 client levels for the smaller cluster you may see improved performance as
 the data is pulled into file cache across test runs, and then when you
 build your larger cluster this is lost so performance appears to degrade
 (for instance).


 On Fri, Jul 18, 2014 at 12:25 PM, Diane Griffith dfgriff...@gmail.com
 wrote:

 The column family schema is:

 CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
 PRIMARY KEY(key, col_name))

 where the key is a generated uuid and all keys were inserted in random
 order but in the end we were compacting down to one sstable per node.

 So we were doing it this way to achieve dynamic columns.

 Thanks,
 Diane

 On Fri, Jul 18, 2014 at 12:19 AM, Jack Krupansky j...@basetechnology.com
  wrote:

   Sorry I may have confused the discussion by mentioning tokens – I
 wasn’t intending to refer to vnodes or the num_tokens property, but merely
 referring to the token range of a node and that the partition key hashes to
 a token value.

 The main question is what you use for your

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Diane Griffith
Duncan,

Thanks for that feedback.  I'll give a bit more info and then ask some more
questions.

*Our Goal*:  Not to produce the fastest read but show horizontal scaling.

*Test procedure*:
* Inserted 54M rows where one third of that represents a unique key, 18M
keys.  End result given our schema is the 54M rows becomes 72M rows in the
column family as the control query load to use.
* have a client that queries 100k records in configurable batches, set to
1k.  It then does 100 reps of queries.  It doesn't use the same keys for
each rep; it uses an offset and then advances through the keys to query.
* We can adjust the hit rate, i.e. how many of the keys will be found, but
we have been focused on a 100% hit rate
* we run the query such that multiple clients can be spawned to do the same
100k-key query cycle, but the offset is not different, so each client will
query the same keys.
* We thought we should manually compact the tables down to 1 sstable on a
given node for consistent results across different cluster sizes
* We had set replication factor to 1 originally so as not to complicate
things or even impact initial write times.  Our thought was to assess RF
later.  Since we changed the keys being queried, requests would have to hit
additional nodes to get row data, but even with just 1 client thread (the
simplest path to show horizontal scaling) we saw a slight decrease in
performance going from 2 nodes to 4 nodes.

Things seen with the given procedure and setup:


   1. 1 client thread:  2 nodes do better than 1 node on the query test.
But 4 nodes did not do better than 2.
   2. 2 client threads: 2 nodes were still doing better than 1 node
   3. 10 client threads: the times drastically suffered and 2 nodes were
   doing 1/2 the speed of 1 node, whereas with 1 to 2 threads 2 nodes had
   performed better than 1 node.  There was a huge decrease in performance on
   2 nodes and just a mild decrease on 1 node.

Note: 50+ threads was also drastically falling apart.

*Observations*:

   - compacting each node down to 1 sstable did not seem to help: running 10
   client threads against the exploded (un-compacted) sstables on 2 nodes was
   2x better than the last 2-node, 10-client test, but still a decrease in
   performance compared with the 1- and 2-thread queries against compacted
   tables
   - I would see upwards of 10 read requests pending at times, while 8 to 10
   were processing, when I did nodetool tpstats
   - having key cache on or disabled did not seem to impact things
   noticeably with our current configuration

*Questions:*

   1. can multiple threads read the same sstable at the same time?  Does
   compacting down to 1 sstable (to get a given row into one sstable) add any
   benefit, or does it actually hurt, as our limited testing currently
   indicates?
   2. given the above testing process, does it still make sense to adjust
   replication factor appropriately for cluster size (i.e. 1 for a 1-node
   cluster, 2 for a 2-node cluster, 3 for an n-node cluster)?  We assumed it
   was just the ability for threads to connect to a coordinator that would
   help, but it sounds like it can still block


I'm going to try a limited test with a changed replication factor.  But if
anyone has input on whether compacting to 1 sstable is a benefit or a
detriment for a simple scalability test, on how (if at all) Cassandra blocks
when reading sstables, and on whether higher replication factors do indeed
help produce reliable results, it would be appreciated.  I know part of our
charter was to keep it simple to produce the scalability proof, but it does
sound like replication factor is hurting us if the delay between clients
querying the same keys is not long enough, given that we are not using
different offsets for each client thread.
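
For the replication factor experiment, the change itself is just the
following (keyspace name is a placeholder; existing data would need a repair
afterwards so the new replicas actually get built):

ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
-- then on each node: nodetool repair my_keyspace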

Thanks,
Diane

On Thu, Jul 17, 2014 at 3:53 AM, Duncan Sands duncan.sa...@gmail.com
wrote:

 Hi Diane,


 On 17/07/14 06:19, Diane Griffith wrote:

 We have been struggling proving out linear read performance with our
 cassandra
 configuration, that it is horizontally scaling.  Wondering if anyone has
 any
 suggestions for what minimal configuration and approach to use to
 demonstrate this.

 We were trying to go for a simple set up, so on the keyspace and/or column
 families we went with the following settings thinking it was the minimal
 to
 prove scaling:

 replication_factor set to 1,


 a RF of 1 means that any particular bit of data exists on exactly one
 node.  So if you are testing read speed by reading the same data item again
 and again as fast as you can, then all the reads will be coming from the
 same one node, the one that has that data item on it.  In this situation
 adding more nodes won't help.  Maybe this isn't exactly how you are testing
 read speed, but perhaps you are doing something analogous?  I suggest you
 explain how you are measuring read speed exactly.

 Ciao, Duncan.

  SimpleStrategy,
 default consistency level,
 default compaction strategy (size tiered),
 but compacted down to 1 sstable per cf on each node (versus using leveled
 compaction for read performance

horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
This is a follow on re-post to clarify what we are trying to do, providing
information that was missing or not clear.



Goal:  Verify horizontal scaling for random non duplicating key reads using
the simplest configuration (or minimal configuration) possible.



Background:

A couple years ago we did similar performance testing with Cassandra for
both read and write performance and found excellent (essentially linear)
horizontal scalability.  That project got put on hold.  We are now moving
forward with an operational system and are having scaling problems.



During the prior testing (3 years ago) we were using a much older version
of Cassandra (0.8 or older), the THRIFT API, and Amazon AWS rather than
OpenStack VMs.  We are now using the latest Cassandra and the CQL
interface.  We did try moving from OpenStack to AWS/EC2 but that did not
materially change our (poor) results.



Test Procedure:

   - Inserted 54 million cells in 18 million rows (so 3 cells per row),
   using randomly generated row keys. That was to be our data control for the
   test.
   - Spawn a client on a different VM to query 100k rows and do that for
   100 reps.  Each row key queried is drawn randomly from the set of existing
   row keys, and then not re-used, so all 10 million row queries use a
   different (valid) row key.  This test is a specific use case of our system
   we are trying to show will scale

Result:

   - 2 nodes performed better than 1 node test but 4 nodes showed decreased
   performance over 2 nodes.  So that did not show horizontal scaling



Notes:

   - We have replication factor set to 1 as we were trying to keep the
   control test simple to prove out horizontal scaling.
   - When we tried to add threading to see if it would help it had
   interesting side behavior which did not prove out horizontal scaling.
   - We are using CQL versus THRIFT API for Cassandra 2.0.6





Does anyone have any feedback that either threading or replication factor
is necessary to show horizontal scaling of Cassandra versus the minimal way
of just continue to add nodes to help throughput?



Any suggestions of minimal configuration necessary to show scaling for our
query use case: 100k requests for random, non-repeating keys constantly
coming in over a period of time?


Thanks,

Diane


Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Diane Griffith
Definitely not trying to show vertical scaling.  We have a query use case
we are trying to show will scale as we add more nodes should performance
fall below adequate.   But to show the scaling we do the test on a 1 node
cluster, then 2 node cluster, then 4 node cluster with a goal that query
throughput increases when adding more nodes.

Basically we do not want to tune for single-node performance; we did want to
prove out that adding nodes works, but for our query use case it hasn't yet.
 Our query size is a valid use case for our needs, though.

Earlier it may not have been clear but we are not querying the same key
over and over in one thread but continuously querying random non
duplicating keys.  Bringing up the threading was not our main path or
desired goal so I re-posted with clearer intent hopefully of our goal, what
we experienced in the past against THRIFT and an older version of Cassandra
which we have not been able to duplicate via CQL and Cassandra 2.0.6.

So just hoping someone has suggestions of what one must do at a minimum to
prove horizontal scaling or have suggestions of what to look at in our
current datasize/query use case that may be causing us to not achieve
horizontal scaling.

Thanks,
Diane




On Thu, Jul 17, 2014 at 10:03 AM, Jack Krupansky j...@basetechnology.com
wrote:

   It sounds as if you are actually testing “vertical scalability” (load
 on a single node) rather than Cassandra’s sweet spot of “horizontal
 scalability” (add more nodes to handle higher load.) Maybe you could
 clarify your intentions and specific use case.

 Also, it sounds like you are trying to focus on large queries, but
 Cassandra’s sweet spot is lots of smaller queries. With larger queries you
 can end up measuring things like the capabilities of your hardware, cpu
 cores, memory, I/O bandwidth, network latency, JVM configuration, etc.
 rather than measuring Cassandra per se. So, again, maybe you could clarify
 your intended use case.

 It might be that you need to add more “vertical scale” (bigger box, more
 cores, more memory, beefier I/O and networking) to handle large queries, or
 maybe simple, Cassandra-style “horizontal scaling” (adding nodes) will be
 sufficient. Sure, you can tune Cassandra for single-node performance, but
 that seems like a lot of extra work, to me, compared to adding more cheap
 nodes.

 -- Jack Krupansky

  *From:* Diane Griffith dfgriff...@gmail.com
 *Sent:* Thursday, July 17, 2014 9:31 AM
 *To:* user user@cassandra.apache.org
 *Subject:* Re: trouble showing cluster scalability for read performance

  Duncan,

 Thanks for that feedback.  I'll give a bit more info and then ask some
 more questions.

 *Our Goal*:  Not to produce the fastest read but show horizontal scaling.

  *Test procedure*:
 * Inserted 54M rows where one third of that represents a unique key, 18M
 keys.  End result given our schema is the 54M rows becomes 72M rows in the
 column family as the control query load to use.
 * have a client that queries 100k records in configurable batches, set to
 1k.  And then it does 100 reps of queries.  It doesn't do the same keys for
 each rep, it uses an offset and then it increases the keys to query.
 * We can adjust the hit rate, i.e. how many of the keys will be found but
 have been focused on 100% hit rate
 * we run the query where multiple clients can be spawned to do the same
 query cycle 100k keys but the offset is not different so each client will
 query the same keys.
 * We thought we should manually compact the tables down to 1 sstable on a
 given node for consistent results across different cluster sizes
 * We had set replication factor to 1 originally to not complicate things
 or impact initial write times even.  We would assess rf later was our
 thought.  Since we changed the keys getting queried it would have to hit
 additional nodes to get row data but for just 1 client thread (to get
 simplest path to show horizontal scaling, had a slight decrease of
 performance when going to 4 nodes from 2 nodes)

 Things seen off of given procedure and set up:


1. 1 client thread:  2 nodes do better than 1 node on the query test.
But 4 nodes did not do better than 2.
2. 2 client threads: 2 nodes were still doing better than 1 node
3. 10 client threads: the times drastically suffered and 2 nodes were
doing 1/2 the speed of 1 node but before 1 to 2 threads performed better on
2 nodes vs 1 node.  There was a huge decrease in performance on 2 nodes and
just a mild decrease on 1 node.

 Note: 50+ threads was also drastically falling apart.

 *Observations*:

- compacting each node to 1 table did not seem to help as running 10
client threads on exploded sstables and 2 nodes was 2x better than the last
2 node 10 client test but still decreased performance from 1 to 2 threads
query against compacted tables
- I would see upwards to 10 read requests pending at times while 8 to
10 were processing when I did nodetool tpstats

Re: horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
So do partitions equate to tokens/vnodes?

If so, we had configured all cluster nodes/VMs with num_tokens: 256 instead
of setting initial_token and assigning ranges.  I am still not getting why,
in Cassandra 2.0, I would assign my own ranges via initial_token; our choice
was based on the documentation and even this blog item
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 that made
it seem right for us to always configure our cluster VMs with num_tokens:
256 in the cassandra.yaml file.
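
Concretely, the relevant lines in every node's cassandra.yaml are just
(a sketch of our setting as described above):

num_tokens: 256
# initial_token is not set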

Also in all testing, all vms were of equal sizing so one was not more
powerful than another.

I didn't think I was hitting an i/o wall on the client vm (separate vm)
where we command line scripted our query call to the cassandra cluster.
 I can break the client call load across vms which I tried early on.  Happy
to verify that again though.

So given that I was assuming the partitions were such that it wasn't a
problem.  Is that an incorrect assumption and something to dig into more?

Thanks,
Diane


On Thu, Jul 17, 2014 at 3:01 PM, Jack Krupansky j...@basetechnology.com
wrote:

   How many partitions are you spreading those 18 million rows over? That
 many rows in a single partition will not be a sweet spot for Cassandra.
 It’s not exceeding any hard limit (2 billion), but some internal operations
 may cache the partition rather than the logical row.

 And all those rows in a single partition would certainly not be a test of
 “horizontal scaling” (adding nodes to handle more data – more token values
 or partitions.)

 -- Jack Krupansky

  *From:* Diane Griffith dfgriff...@gmail.com
 *Sent:* Thursday, July 17, 2014 1:33 PM
 *To:* user user@cassandra.apache.org
 *Subject:* horizontal query scaling issues follow on


 This is a follow on re-post to clarify what we are trying to do, providing
 information that was missing or not clear.



 Goal:  Verify horizontal scaling for random non duplicating key reads
 using the simplest configuration (or minimal configuration) possible.



 Background:

 A couple years ago we did similar performance testing with Cassandra for
 both read and write performance and found excellent (essentially linear)
 horizontal scalability.  That project got put on hold.  We are now moving
 forward with an operational system and are having scaling problems.



 During the prior testing (3 years ago) we were using a much older version
 of Cassandra (0.8 or older), the THRIFT API, and Amazon AWS rather than
 OpenStack VMs.  We are now using the latest Cassandra and the CQL
 interface.  We did try moving from OpenStack to AWS/EC2 but that did not
 materially change our (poor) results.



 Test Procedure:

- Inserted 54 million cells in 18 million rows (so 3 cells per row),
using randomly generated row keys. That was to be our data control for the
test.
- Spawn a client on a different VM to query 100k rows and do that for
100 reps.  Each row key queried is drawn randomly from the set of existing
row keys, and then not re-used, so all 10 million row queries use a
different (valid) row key.  This test is a specific use case of our system
we are trying to show will scale

 Result:

- 2 nodes performed better than 1 node test but 4 nodes showed
decreased performance over 2 nodes.  So that did not show horizontal 
 scaling



 Notes:

- We have replication factor set to 1 as we were trying to keep the
control test simple to prove out horizontal scaling.
- When we tried to add threading to see if it would help it had
interesting side behavior which did not prove out horizontal scaling.
- We are using CQL versus THRIFT API for Cassandra 2.0.6





 Does anyone have any feedback that either threading or replication factor
 is necessary to show horizontal scaling of Cassandra versus the minimal way
 of just continue to add nodes to help throughput?



 Any suggestions of minimal configuration necessary to show scaling of our
 query use case 100k requests for random non repeating keys constantly
 coming in over a period of time?


 Thanks,

 Diane



Re: horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
So I stripped out the number of clients experiment path information.  It is
unclear if I can only show horizontal scaling by also spawning many client
requests all working at once.  So that is why I stripped that information
out to distill what our original attempt was at how to show horizontal
scaling.

I did tests comparing 1, 2, 10, 20, 50, 100 clients spawned all querying.
 Performance on 2 nodes starts to degrade from 10 clients on.  I saw
similar behavior on 4 nodes but haven't done the official runs on that yet.


When I tried to grab the list of tokens assigned and populate it in the
cassandra.yaml I never got it right.

I basically did the command and it output 256 comma-separated tokens on each
node.  So I tried taking that string and setting it as the value of
initial_token, but the node wouldn't start up.

Not sure if I maybe had a carriage return in there and that was the problem.

And if I do that do I need to do more than comment out num_tokens?
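
For what it's worth, my understanding of what step 3 of Rob's procedure below
should end up looking like is roughly this (token values are placeholders;
whether num_tokens then needs to be commented out is exactly what I'm unsure
about):

# all 256 tokens on a single line, comma separated, no spaces or line breaks
initial_token: <token1>,<token2>,...,<token256>
num_tokens: 256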

Thanks,
Diane




On Thu, Jul 17, 2014 at 6:58 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jul 17, 2014 at 3:21 PM, Diane Griffith dfgriff...@gmail.com
 wrote:

 So do partitions equate to tokens/vnodes?


 A partition is what used to be called a row.

 Each individual token in the token ring can contain a partition, which you
 request using the token as the key.

 A token range is the space between two tokens.


 If so we had configured all cluster nodes/vms with num_tokens: 256
 instead of setting init_token and assigning ranges.  I am still not getting
 why in Cassandra 2.0, I would assign my own ranges via init_token and this
 was based on the documentation and even this blog item
 http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 that
 made it seem right for us to always configure our cluster vms with
 num_tokens: 256 in the cassandra.yaml file.


 If you are using vnodes and don't want to try to figure out what the ideal
 random token ranges for them are, you should, generally:

 1) start the node with num_tokens set to a value greater than 1
 2) once successfully bootstrapped, dump all node tokens with:

 nodetool info -T | grep Token | awk '{print $3}' | paste -s -d,

 3) put list from 2) in initial_token list in cassandra.yaml
 4) (optional) restart and verify that your node has the tokens you expect

 So given that I was assuming the partitions were such that it wasn't a
 problem.  Is that an incorrect assumption and something to dig into more?


 How many client threads do you have? Your OP suggested a low number, which
 will not have good results in terms of throughput.

 =Rob




Re: adding more nodes into the cluster

2014-07-16 Thread Diane Griffith
Being a newbie, can you point out where in the documentation it talks of
waiting 2 minutes between starts of each node?

I ask this because I had looked at what was documented for clustering and
even for backup and restore and did not feel I saw anything that mentioned
this.  Remember I also posted about how the restore did not seem to be working
as expected (where I had num_tokens: 256 versus specifying tokens).  Even
if I tried to wait 2 minutes before restoring each node, it seemed I had to
give all nodes all the sstable files after we stood up a new cluster.  I
figured it had to have been due to the token mismatch and giving each node
all the sstables and cleaning them all up allowed each to get the data in
its token range.

So I just wanted to see where this concept is covered beyond the jira you
just referenced that I can brush up on.

Thanks,
Diane


On Wed, Jul 16, 2014 at 2:20 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 16, 2014 at 9:16 AM, Parag Patel ppa...@clearpoolgroup.com
 wrote:

  We have a 12 node cluster with replication factor of 3 in 1
 datacenter.  We want to add 6 more nodes into the cluster.  I’m trying to
 see what’s better: bootstrapping all 6 at the same time or doing it one node
 at a time.


 I should really write a blog post on this.

 For safety, operators should generally bootstrap one node at a time. There
 are rare cases in non-vnode operation where one can safely bootstrap more
 than one node, but in general one should not do so.

 In the future in Cassandra, you will hopefully be prohibited from
 bootstrapping more than one at a time, because it's a natural thing to do
 and Bad Stuff Can Happen.

 https://issues.apache.org/jira/browse/CASSANDRA-7069

 =Rob



Re: adding more nodes into the cluster

2014-07-16 Thread Diane Griffith
So for specifically adding nodes to a cluster then?

Under initializing a cluster (so bringing up a full cluster either in a
single data center or multiple data centers) it didn't talk of 2 minutes
between nodes.  That is what I'm trying to figure out: when does the 2-minute
rule apply and when does it not?

Thanks,
Diane


On Wed, Jul 16, 2014 at 3:02 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 16, 2014 at 11:31 AM, Diane Griffith dfgriff...@gmail.com
 wrote:

 Being a newbie, can you point out where in the documentation it talks of
 waiting 2 minutes between starts of each node?



 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
 
 Start Cassandra on each new node. Allow two minutes between node
 initializations. You can monitor the startup and data streaming process
 using nodetool netstats.
 

 I'm not sure where in Apache Cassandra documentation that might appear.

 =Rob




trouble showing cluster scalability for read performance

2014-07-16 Thread Diane Griffith
We have been struggling proving out linear read performance with our
cassandra configuration, that it is horizontally scaling.  Wondering if
anyone has any suggestions for what minimal configuration and approach to
use to demonstrate this.

We were trying to go for a simple set up, so on the keyspace and/or column
families we went with the following settings thinking it was the minimal to
prove scaling:

replication_factor set to 1,
SimpleStrategy,
default consistency level,
default compaction strategy (size tiered),
but compacted down to 1 sstable per cf on each node (versus using leveled
compaction for read performance)

*Read Performance Results:*
1 client thread - 2 nodes > 1 node was seen, but we couldn't show increased
performance adding more nodes, i.e. 4 nodes was not > 2 nodes
2 client threads - 2 nodes > 1 node still held true, but again we couldn't
show increased performance adding more nodes, i.e. 4 nodes was not > 2 nodes
10 client threads - this time 2 nodes < 1 node on performance numbers.  2
nodes suffered a larger reduction in throughput than 1 node did.

Where are we going wrong?

How have others shown horizontal scaling for reads?

Thanks,
Diane


restore a cassandra cluster from snapshot failed

2014-07-09 Thread Diane Griffith
Hope someone can help. We are having issues restoring all nodes of a
cassandra 2.0 cluster from a snapshot. I have reviewed the instructions
[Restoring from a snapshot][1]

Specific steps done include:

   1. All data had been flushed from the memtables.
   2. All nodes were compacted down to 1 sstable
   3. Snapshots were taken on all nodes and saved off elsewhere
   4. New cluster stood up, install from sratch of identical cluster (less
   data)
   5. keyspace and column families were created
   6. All nodes were stopped
   7. commitlogs were cleared on all nodes and verified no sstable files
   existed
   8. snapshot sstables were copied to each corresponding node under the
   base table folder
   9. All nodes were restarted
   10. Nodetool repair was run on all nodes
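
For reference, a rough command-level sketch of the snapshot-and-restore side
of those steps (keyspace/table names, snapshot tag, and paths are
placeholders, not our real ones):

# on each source node (steps 1-3)
nodetool flush
nodetool compact my_keyspace
nodetool snapshot -t restore_test my_keyspace

# on each corresponding new node, with Cassandra stopped and commitlogs
# cleared (steps 6-8): copy the snapshot sstables under the table directory
cp /backups/<node>/restore_test/* /var/lib/cassandra/data/my_keyspace/my_table/

# after restarting all nodes (steps 9-10)
nodetool repair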

Result of these steps that appear to match the documentation is:

   - For a 2-node cluster, nodetool cfstats on each node seems to report the
   approximate number of keys each node would have.  nodetool status shows the
   correct division of data by host.

   - Logging into cqlsh and doing a select count(*) on one of the
   column families, with a limit high enough to return all rows, does not
   report back the correct/original number of rows.  It appears to report
   just the results of one node.

Is there a step missing from the documentation? Why doesn't a select
count(*) show all the rows?

Thanks,

Diane


Re: restore a cassandra cluster from snapshot failed

2014-07-09 Thread Diane Griffith
Yes the link was to the documentation:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html

So when you say restore the system column family, do you mean that keyspace?
Or do you mean the desired target column family?  Via CQL, the target
keyspace and column families to be restored are created.

The only files copied over were the sstable files for each applicable
column family.

In cassandra.yaml, the num_tokens parameter is set to 256.  initial_token is
not set at all, and there are no other vnode configurations set either.  The
seed list of the cluster is the same as before; everything is identical to
the first time.

Is there a problem that happens if only num_tokens is set?  I did not
remember seeing anything that I needed to set initial_token on clusters in
2.0.

Thanks,
Diane

On Wed, Jul 9, 2014 at 6:54 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 9, 2014 at 3:36 PM, Diane Griffith dfgriff...@gmail.com
 wrote:

 Hope someone can help. We are having issues restoring all nodes of a
 cassandra 2.0 cluster from a snapshot. I have reviewed the instructions
 [Restoring from a snapshot][1]

 Was there supposed to be a link here?

 Briefly, did you restore the system column family? Are you using vnodes?
  Did you set initial_token on the target cluster nodes to be the same as on
 source nodes?

 Use :

 nodetool info -T | grep Token | awk '{print $3}' | paste -s -d,

 To generate a comma delimited list of tokens per node and populate
 initial_token in cassandra.yaml before the first time you start any target
 Cassandra node.

 =Rob