Re: Row cache vs. OS buffer cache

2014-01-23 Thread Janne Jalkanen

Our experience is that you want to have all your very hot data fit in the row 
cache (assuming you don’t have very large rows), and leave the rest for the OS. 
Unfortunately, the right size for the cache depends entirely on your access 
patterns and data - zero makes sense in a lot of cases.

Try out different sizes, and watch for row cache hit ratio and read latency. 
Ditto for heap sizes, btw - if your nodes are short on RAM, you may get better 
performance by running at lower heap sizes because OS caches will get more 
memory and your gc pauses will be shorter (though more numerous).
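
For reference, the row cache capacity itself is a global setting 
(row_cache_size_in_mb in cassandra.yaml on 1.1/1.2), while turning it on is a 
per-column-family property. A minimal CQL sketch, assuming a hypothetical hot 
table ks.users:

-- Hypothetical names; enable row caching for one hot table.
-- Valid caching values on 1.1/1.2: 'all', 'keys_only', 'rows_only', 'none'.
ALTER TABLE ks.users WITH caching = 'rows_only';

Watch the hit ratio after each change (e.g. via nodetool info) before growing 
the cache further.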

/Janne

On 23 Jan 2014, at 09:13, Katriel Traum katr...@google.com wrote:

 Hello list,
 
 I was wondering if anyone has any pointers or advice regarding using the row cache vs 
 leaving it up to the OS buffer cache.
 
 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.
 
 Any input appreciated.
 Katriel



Re: Datamodel for a highscore list

2014-01-23 Thread Kasper Middelboe Petersen
What would the consequence be of having this updated highscore table (using
friendId as part of the clustering key to avoid name collisions):

CREATE TABLE highscore (
  userId uuid,
  score int,
  friendId uuid,
  name varchar,
  PRIMARY KEY(userId, score, friendId)
) WITH CLUSTERING ORDER BY (score DESC);

And then create an index:

CREATE INDEX friendId_idx ON highscore ( friendId );

The table will have many millions of entries (I expect 100+ million).
Each friendId would appear as many times as the user has friends. It sounds
like a scenario where I should be careful about using a custom index.

I haven't worked with custom indexes in Cassandra before, but I assume this
would allow me to query the table based on (userId, friendId) for updating
highscores.

But what would happen in this case? What queries would be affected and
roughly to what degree?

Would this be a viable option?
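
As a sketch, the queries this schema plus index are meant to support might 
look like the following (hedged; depending on the Cassandra version, the 
second form may require ALLOW FILTERING):

-- Read a user's highscore list, sorted by score (the hot path):
SELECT score, friendId, name FROM highscore WHERE userId = ?;

-- Locate one friend's current entry via the friendId index,
-- e.g. before replacing it with a new score:
SELECT score FROM highscore WHERE userId = ? AND friendId = ?;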



On Wed, Jan 22, 2014 at 6:44 PM, Kasper Middelboe Petersen 
kas...@sybogames.com wrote:

 Hi!

 I'm a little worried about the data model I have come up with for handling
 highscores.

 I have a lot of users. Each user has a number of friends. I need a
 highscore list per friend list.

 I would like to have it optimized for reading the highscores as opposed to
 setting a new highscore, since the use case suggests I will read the list a
 lot more often than I will write new highscores.

 Currently I have the following tables:
 CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
 CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
 ... and a table for friends - for the purpose of this mail, assume
 everyone is friends with everyone else.

 Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.

 The problem is setting a new highscore:
 1. I need to read-before-write to get the old score.
 2. I'm screwed if something goes wrong and the old score gets overwritten
 before all the friends' highscore lists get updated - and it is a highly
 visible error, because the same user then appears on the highscore list
 multiple times.

 I would very much appreciate some feedback and/or alternatives to how to
 solve this with Cassandra.


 Thanks,
 Kasper
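
One hedged idea for concern 2, assuming the friendId-keyed variant of the 
table proposed in the follow-up mail and Cassandra 1.2+: because score is part 
of the clustering key, a new score means deleting the old row and inserting a 
new one, and wrapping the per-friend pairs in a single logged batch makes the 
group atomic (all-or-nothing), so a failure mid-update cannot strand a user on 
a list twice:

BEGIN BATCH
  -- remove the friend's old entry...
  DELETE FROM highscore WHERE userId = ? AND score = ? AND friendId = ?;
  -- ...and write the new one
  INSERT INTO highscore (userId, score, friendId, name) VALUES (?, ?, ?, ?);
  -- repeat the delete/insert pair for each friend's partition
APPLY BATCH;

The read-before-write of point 1 remains, but the batch at least removes the 
window in which some lists hold the old score and others the new one.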



Re: Datamodel for a highscore list

2014-01-23 Thread Colin Clark
Most of the work I've done like this has used sparse table definitions and
the empty column trick.  I didn't explain that very well in my last
response.

I think that by using the userId as the row key and the friendId as the
column name holding the score, I would put an entire user's friend list
on one row. The row would look like this:

ROWID    COLUMNS
UserID   FriendID:score, FriendID:score, ...

Colin
+1 320 221 9531



On Thu, Jan 23, 2014 at 2:34 AM, Kasper Middelboe Petersen 
kas...@sybogames.com wrote:

 What would the consequence be of having this updated highscore table
 (using friendId as part of the clustering key to avoid name collisions):

 CREATE TABLE highscore (
   userId uuid,
   score int,
   friendId uuid,
   name varchar,
   PRIMARY KEY(userId, score, friendId)
 ) WITH CLUSTERING ORDER BY (score DESC);

 And then create an index:

 CREATE INDEX friendId_idx ON highscore ( friendId );

 The table will have many millions of entries (I expect 100+ million).
 Each friendId would appear as many times as the user has friends. It sounds
 like a scenario where I should be careful about using a custom index.

 I haven't worked with custom indexes in Cassandra before, but I assume
 this would allow me to query the table based on (userId, friendId) for
 updating highscores.

 But what would happen in this case? What queries would be affected and
 roughly to what degree?

 Would this be a viable option?



 On Wed, Jan 22, 2014 at 6:44 PM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:

 Hi!

 I'm a little worried about the data model I have come up with for
 handling highscores.

 I have a lot of users. Each user has a number of friends. I need a
 highscore list per friend list.

 I would like to have it optimized for reading the highscores as opposed
 to setting a new highscore, since the use case suggests I will read the
 list a lot more often than I will write new highscores.

 Currently I have the following tables:
 CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
 CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
 ... and a table for friends - for the purpose of this mail, assume
 everyone is friends with everyone else.

 Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.

 The problem is setting a new highscore:
 1. I need to read-before-write to get the old score.
 2. I'm screwed if something goes wrong and the old score gets overwritten
 before all the friends' highscore lists get updated - and it is a highly
 visible error, because the same user then appears on the highscore list
 multiple times.

 I would very much appreciate some feedback and/or alternatives to how to
 solve this with Cassandra.


 Thanks,
 Kasper





Re: Datamodel for a highscore list

2014-01-23 Thread Colin Clark
One of the tricks I've used a lot with Cassandra is a sparse CF definition,
inserting columns programmatically that aren't in the definition.

I'd be tempted to look at putting a user's friend list on one row; the row
would look like this:

ROWID    COLUMNS
UserID   UserID, UserScore:Score, FriendID:score, FriendID:score, ...

The UserID and UserScore columns are literal; the FriendIDs are either
literal or keys into the user CF.

When a user gets a new score, you update that user's row and issue a general
update that writes the new score to all rows containing that userId.

That way, all friends are on the same row, which makes the query easy.  And
you can still issue a query to find the top score across the entire user
base by querying the UserID and UserScore columns.

Is this a better explanation than my previous, lame one?

Colin
+1 320 221 9531
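
In CQL3 terms, this dynamic-column layout corresponds roughly to a table with 
a clustering column; a hedged sketch using hypothetical names:

-- Hypothetical CQL3 equivalent of the wide row described above.
CREATE TABLE friend_scores (
  userId   uuid,   -- row key: the owner of the highscore list
  friendId uuid,   -- one entry per friend (the user's own id included)
  score    int,
  PRIMARY KEY (userId, friendId)
);

-- A new score is one idempotent write per list that contains the user:
UPDATE friend_scores SET score = ? WHERE userId = ? AND friendId = ?;

The trade-off versus the score-clustered table: updates become blind 
single-column writes with no read-before-write and no duplicate-entry risk, 
but the list is no longer stored sorted by score, so ordering happens 
client-side at read time.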



On Thu, Jan 23, 2014 at 2:34 AM, Kasper Middelboe Petersen 
kas...@sybogames.com wrote:

 What would the consequence be of having this updated highscore table
 (using friendId as part of the clustering key to avoid name collisions):

 CREATE TABLE highscore (
   userId uuid,
   score int,
   friendId uuid,
   name varchar,
   PRIMARY KEY(userId, score, friendId)
 ) WITH CLUSTERING ORDER BY (score DESC);

 And then create an index:

 CREATE INDEX friendId_idx ON highscore ( friendId );

 The table will have many millions of entries (I expect 100+ million).
 Each friendId would appear as many times as the user has friends. It sounds
 like a scenario where I should be careful about using a custom index.

 I haven't worked with custom indexes in Cassandra before, but I assume
 this would allow me to query the table based on (userId, friendId) for
 updating highscores.

 But what would happen in this case? What queries would be affected and
 roughly to what degree?

 Would this be a viable option?



 On Wed, Jan 22, 2014 at 6:44 PM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:

 Hi!

 I'm a little worried about the data model I have come up with for
 handling highscores.

 I have a lot of users. Each user has a number of friends. I need a
 highscore list per friend list.

 I would like to have it optimized for reading the highscores as opposed
 to setting a new highscore, since the use case suggests I will read the
 list a lot more often than I will write new highscores.

 Currently I have the following tables:
 CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
 CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
 ... and a table for friends - for the purpose of this mail, assume
 everyone is friends with everyone else.

 Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.

 The problem is setting a new highscore:
 1. I need to read-before-write to get the old score.
 2. I'm screwed if something goes wrong and the old score gets overwritten
 before all the friends' highscore lists get updated - and it is a highly
 visible error, because the same user then appears on the highscore list
 multiple times.

 I would very much appreciate some feedback and/or alternatives to how to
 solve this with Cassandra.


 Thanks,
 Kasper





Cassandra timeout on node failure

2014-01-23 Thread Ankit Patel


We are seeing a weird issue with our Cassandra cluster (version 1.0.10). We
have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas of
each other. All reads and writes are LOCAL_QUORUM. We see that when one of
the nodes in DC1 fails, we see timeout errors for reads on the second node.
When we turned on DEBUG level logs, we see the following error in the
Cassandra logs:


DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .


Considering that for LOCAL_QUORUM we only need 2 nodes out of the 3 in the
DC, I am surprised we are seeing this issue. The log clearly says it has
received 2 responses. Interestingly, when we connect to the third node after
the second node returned a timeout error, it works as expected. Has anyone
else faced this issue?

  

Re: Row cache vs. OS buffer cache

2014-01-23 Thread Chris Burroughs

My experience has been that the row cache is much more effective.
However, reasonable row cache sizes are so small relative to RAM that I 
don't see it as a significant trade-off unless it's in a very memory 
constrained environment.  If you want to enable the row cache (a big if) 
you probably want it to be as big as it can be until you have reached 
the point of diminishing returns on the hit rate.


The off-heap cache still has many on-heap objects, so it doesn't really
change that much conceptually; you will just end up with a different number
for the size.


On 01/23/2014 02:13 AM, Katriel Traum wrote:

Hello list,

I was wondering if anyone has any pointers or advice regarding using the row
cache vs leaving it up to the OS buffer cache.

I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.

Any input appreciated.
Katriel





Re: Opscenter tabs

2014-01-23 Thread Ken Hancock
Multiple DCs are still a single cluster in OpsCenter.  If you go to
Physical View, you should see one column for each data center.

Also, the Community edition of OpsCenter, last I saw, only supported a
single cluster.



On Thu, Jan 23, 2014 at 12:06 PM, Daniel Curry daniel.cu...@arrayent.com wrote:

   I am unable to find any references on whether the tabs that monitor
 multiple DCs can be configured to read the DC location. I do not want to
 change the cluster name itself. Right now I see three tabs, all with the
 same name (cluster_name: test). I'd like to keep the current cluster name
 test, but change the OpsCenter tabs to DC1, DC2, and DC3.

Is this documented somewhere?

 --
 Daniel Curry
 Sr Linux Systems Administrator
 Arrayent, Inc.
 2317 Broadway Street, Suite 20
 Redwood City, CA 94063
 dan...@arrayent.com




-- 
*Ken Hancock* | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx

Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
Skype: hancockks | Yahoo IM: hancockks | LinkedIn:
http://www.linkedin.com/in/kenhancock

SeaChange International http://www.schange.com/
This e-mail and any attachments may contain information which is SeaChange
International confidential. The information enclosed is intended only for
the addressees herein and may not be copied or forwarded without permission
from SeaChange International.


Re: Row cache vs. OS buffer cache

2014-01-23 Thread Robert Coli
On Wed, Jan 22, 2014 at 11:13 PM, Katriel Traum katr...@google.com wrote:

 I was wondering if anyone has any pointers or advice regarding using the
 row cache vs leaving it up to the OS buffer cache.

 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.


Many people have had bad experiences with the row cache; I assert more than
have had a good experience.

https://issues.apache.org/jira/browse/CASSANDRA-5357

is the 2.1-era re-design of the row cache into something more conceptually
appropriate.
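
For context, that redesign caches a configurable head of each partition 
rather than whole partitions of arbitrary size; a sketch of the 2.1-era 
syntax, with a hypothetical table name:

-- 2.1+ style (post-CASSANDRA-5357): cache only the first 100 rows
-- of each partition of the hypothetical table ks.tbl.
ALTER TABLE ks.tbl WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};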

The rule of thumb for the row cache is that if your data is:

1) very hot
2) very small
3) very uniform in size

You may win with it. IMO if you meet all of those criteria you should A/B
test the on-heap cache vs. the off-heap cache in 1.1/1.2, especially if your
cached rows are frequently updated.

https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

=Rob


Re: Cassandra timeout on node failure

2014-01-23 Thread Robert Coli
On Thu, Jan 23, 2014 at 8:52 AM, Ankit Patel patel7...@hotmail.com wrote:

  We are seeing a weird issue with our Cassandra cluster (version 1.0.10).
 We have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas
 of each other. All reads and writes are LOCAL_QUORUM.


Frankly I'm surprised that 1.0.10 includes LOCAL_QUORUM.

My first advice would be to upgrade, current trunk is 3 major versions
above 1.0.10.


 We see that when one of the nodes in DC1 fails, we see timeout errors for
 reads on the second node. When we turned on DEBUG level logs, we see the
 following error in the Cassandra logs:
 DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .
 Considering that for LOCAL_QUORUM we only need 2 nodes out of the 3 in
 the DC, I am surprised we are seeing this issue. The log clearly says it
 has received 2 responses. Interestingly, when we connect to the third node
 after the second node returned a timeout error, it works as expected. Has
 anyone else faced this issue?


Have you searched the Apache JIRA? If you can replicate on more modern
(1.2.13 / 2.0.4) Cassandra, file a JIRA!

=Rob


Re: Any Limits on number of items in a collection column type

2014-01-23 Thread DuyHai Doan
Alternatively, you can use clustering columns to store very big collections.

Beware of making a row too wide, though (use bucketing); a sketch follows
below.
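
A hedged sketch of that idea for the day-count case quoted below, with 
hypothetical names: each day is split into fixed buckets so no partition 
grows unbounded, and every appended count becomes a clustering row rather 
than a collection item, so the 64K collection limit no longer applies:

-- Hypothetical schema: composite partition key (day, bucket) bounds row width.
CREATE TABLE day_counts (
  day    text,      -- e.g. '2014-01-23'
  bucket int,       -- e.g. hour of day (0-23)
  seq    timeuuid,  -- one clustering row per appended count
  cnt    bigint,    -- the appended count value
  PRIMARY KEY ((day, bucket), seq)
);

-- Append a count:
INSERT INTO day_counts (day, bucket, seq, cnt)
VALUES ('2014-01-23', 14, now(), 42);

The day total is then the sum over all buckets, aggregated client-side.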
On 23 Jan 2014 04:29, Manoj Khangaonkar khangaon...@gmail.com wrote:

 Thanks. I guess I can work around by maintaining hour_counts (which will
 have fewer items) and adding the hour counts to
 get day counts.

 regards


 On Wed, Jan 22, 2014 at 7:15 PM, Robert Wille rwi...@fold3.com wrote:

 I didn’t read your question properly. Collections are limited to 64K
 items, not 64K bytes per item.

  From: Manoj Khangaonkar khangaon...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, January 22, 2014 at 7:17 PM
 To: user@cassandra.apache.org
 Subject: Any Limits on number of items in a collection column type

 Hi,

 On C* 2.0.0. 3 Node cluster.

 I have a column daycount list<bigint>. The column is storing a count.
 Every few secs a new count is appended. The total count for the day is the
 sum of all items in the list.

 My application logs indicate I wrote about 11 items to the column for
 a particular  row. Assume row key is day_timestamp.

 But when I do a read on the column I get back a list with only 43000
 items. Checked with both java driver and CQL.

 There are no errors or exceptions anywhere.

 There is this statement in the wiki: "Collection values may not be larger
 than 64K." I assume this refers to one item in a collection.

 Has anyone else seen an issue like this ?

 regards

 MJ


 --
 http://khangaonkar.blogspot.com/




 --
 http://khangaonkar.blogspot.com/



Re: Opscenter tabs

2014-01-23 Thread Brian Tarbox
A vaguely related question...my OpsCenter now has two separate tabs for the
same cluster...one tab shows all six nodes and has their agents...the other
tab has the same six nodes but no agents.

I see no way to get rid of the spurious tab.


On Thu, Jan 23, 2014 at 12:47 PM, Ken Hancock ken.hanc...@schange.com wrote:

 Multiple DCs are still a single cluster in OpsCenter.  If you go to
 Physical View, you should see one column for each data center.

 Also, the Community edition of OpsCenter, last I saw, only supported a
 single cluster.



  On Thu, Jan 23, 2014 at 12:06 PM, Daniel Curry
  daniel.cu...@arrayent.com wrote:

    I am unable to find any references on whether the tabs that monitor
  multiple DCs can be configured to read the DC location. I do not want to
  change the cluster name itself. Right now I see three tabs, all with the
  same name (cluster_name: test). I'd like to keep the current cluster name
  test, but change the OpsCenter tabs to DC1, DC2, and DC3.

Is this documented somewhere?

 --
 Daniel Curry
 Sr Linux Systems Administrator
 Arrayent, Inc.
 2317 Broadway Street, Suite 20
 Redwood City, CA 94063
 dan...@arrayent.com




  --
  *Ken Hancock* | System Architect, Advanced Advertising
  SeaChange International
  50 Nagog Park
  Acton, Massachusetts 01720
  ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
  http://www.schange.com/en-US/Company/InvestorRelations.aspx

  Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
  Skype: hancockks | Yahoo IM: hancockks | LinkedIn:
  http://www.linkedin.com/in/kenhancock

  SeaChange International http://www.schange.com/
  This e-mail and any attachments may contain information which is SeaChange
  International confidential. The information enclosed is intended only for
  the addressees herein and may not be copied or forwarded without permission
  from SeaChange International.



Re: Row cache vs. OS buffer cache

2014-01-23 Thread Katriel Traum
Thank you everyone for your input.
My dataset is ~100G in size, with 1 or 2 read-intensive column families. The
cluster has plenty of RAM. I'll start off small with 4G of row cache and
monitor the hit rate.

Katriel


On Thu, Jan 23, 2014 at 9:17 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jan 22, 2014 at 11:13 PM, Katriel Traum katr...@google.com wrote:

 I was wondering if anyone has any pointers or advice regarding using the
 row cache vs leaving it up to the OS buffer cache.

 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.


 Many people have had bad experiences with the row cache; I assert more
 than have had a good experience.

 https://issues.apache.org/jira/browse/CASSANDRA-5357

 is the 2.1-era re-design of the row cache into something more conceptually
 appropriate.

 The rule of thumb for the row cache is that if your data is:

 1) very hot
 2) very small
 3) very uniform in size

 You may win with it. IMO if you meet all of those criteria you should A/B
 test the on-heap cache vs. the off-heap cache in 1.1/1.2, especially if
 your cached rows are frequently updated.


 https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

 =Rob




Re: How to add a new DC to cluster in Cassandra 2.x

2014-01-23 Thread Robert Coli
On Tue, Jan 21, 2014 at 7:16 AM, Tupshin Harper tups...@tupshin.com wrote:

 This should be the doc you are looking for.


 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html


Rebuild (like bootstrap) only streams data from a single source replica per
range. IMO, therefore, the above process should end with a "repair all nodes
in both data centers with -pr" step. Otherwise, requests to the new DC with
LOCAL_X ConsistencyLevels or CL.ONE may violate consistency.

I have bcc:ed d...@datastax.com, in case they agree and want to modify the
above doc. :D

=Rob


RE: How to add a new DC to cluster in Cassandra 2.x

2014-01-23 Thread Lu, Boying
Thanks a lot :)

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 24 January 2014 4:54
To: user@cassandra.apache.org
Subject: Re: How to add a new DC to cluster in Cassandra 2.x

On Tue, Jan 21, 2014 at 7:16 AM, Tupshin Harper tups...@tupshin.com wrote:

This should be the doc you are looking for.

http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

Rebuild (like bootstrap) only streams data from a single source replica per
range. IMO, therefore, the above process should end with a "repair all nodes
in both data centers with -pr" step. Otherwise, requests to the new DC with
LOCAL_X ConsistencyLevels or CL.ONE may violate consistency.

I have bcc:ed d...@datastax.com, in case they agree and want to modify the
above doc. :D

=Rob



Re: Extremely long GC

2014-01-23 Thread Yogi Nerella
Hi Joel,

A single log record like the one above will not help. Please provide the
entire command line the process is started with (including JVM options) and
the log file showing all GC messages.
Is it possible for you to collect verbose GC output, which would print the
before and after statistics of the memory?

Also useful: the version of Java, the version of Cassandra, the availability
of swap space, disk space, etc., and the CPU usage around the timeframe when
the garbage collection happened.

What is the Java -Xms (minimum heap) setting? Try reducing it and see if it
helps.


Yogi




On Wed, Jan 22, 2014 at 11:12 PM, Joel Samuelsson samuelsson.j...@gmail.com
 wrote:

 Here is one example. 12GB data, no load besides OpsCenter and perhaps 1-2
 requests per minute.
 INFO [ScheduledTasks:1] 2013-12-29 01:03:25,381 GCInspector.java (line
 119) GC for ParNew: 426400 ms for 1 collections, 2253360864 used; max is
 4114612224


 2014/1/22 Yogi Nerella ynerella...@gmail.com

 Hi,

 Can you share the GC logs for the systems you are running into problems with?

 Yogi


 On Wed, Jan 22, 2014 at 6:50 AM, Joel Samuelsson 
 samuelsson.j...@gmail.com wrote:

 Hello,

 We've been having problems with long GC pauses and can't seem to get rid
 of them.

 Our latest test is on a clean machine with Ubuntu 12.04 LTS, Java
 1.7.0_45 and JNA installed.
 It is a single node cluster with most settings being default, the only
 things changed are ip-addresses, cluster name and partitioner (to
 RandomPartitioner).
 We are running Cassandra 2.0.4.
 We are running on a virtual machine with Xen.
 We have 16GB of RAM and default memory settings for C* (i.e. heap size
 of 4GB). The CPU is specified as 8 cores by our provider.

 Right now, we have no data on the machine and no requests to it at all.
 Still we get ParNew GCs like the following:
 INFO [ScheduledTasks:1] 2014-01-18 10:54:42,286 GCInspector.java (line
 116) GC for ParNew: 464 ms for 1 collections, 102838776 used; max is
 4106223616

 While this may not be extremely long, on other machines with the same
 setup but some data (around 12GB) and around 10 read requests/s (i.e.
 basically no load) we have seen ParNew GC pauses of 20 minutes or more. During
 this time, the machine goes down completely (I can't even ssh to it). The
 requests are mostly from OpsCenter and the rows requested are not extremely
 large (typically less than 1KB).

 We have tried a lot of different things to solve these issues since
 we've been having them for a long time including:
 - Upgrading Cassandra to new versions
 - Upgrading Java to new versions
 - Printing promotion failures in GC-log (no failures found!)
 - Different sizes of heap and heap space for different GC spaces (Eden
 etc.)
 - Different versions of Ubuntu
 - Running on Amazon EC2 instead of the provider we are using now (not
 with Datastax AMI)

 Something that may be a clue is that when running the DataStax Community
 AMI on Amazon we haven't seen the GC yet (it's been running for a week or
 so). Just to be clear, another test on Amazon EC2 mentioned above (without
 the Datastax AMI) shows the GC freezes.

 If any other information is needed, just let me know.

 Best regards,
 Joel Samuelsson