Re: Row cache vs. OS buffer cache
Our experience is that you want all your very hot data to fit in the row cache (assuming you don't have very large rows), and leave the rest to the OS. Unfortunately, the right cache size depends entirely on your data and access patterns - zero makes sense in a lot of cases. Try out different sizes, and watch the row cache hit ratio and read latency. Ditto for heap sizes, btw - if your nodes are short on RAM, you may get better performance by running with a smaller heap, because the OS caches get more memory and your GC pauses will be shorter (though more numerous).

/Janne

On 23 Jan 2014, at 09:13, Katriel Traum katr...@google.com wrote:

Hello list,

I was wondering if anyone has any pointers or advice regarding using the row cache vs. leaving it up to the OS buffer cache. I run Cassandra 1.1 and 1.2 with JNA, so the off-heap row cache is an option.

Any input appreciated.
Katriel
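For concreteness, a minimal sketch of trying this out on a 1.1/1.2-era cluster (the table name is hypothetical; the global budget is row_cache_size_in_mb in cassandra.yaml, and the cache is typically off-heap when JNA is available):

    -- enable the row cache for one hot table only
    ALTER TABLE hot_events WITH caching = 'rows_only';

    -- fall back to key caching if the hit ratio stays low
    ALTER TABLE hot_events WITH caching = 'keys_only';

Then watch the hit ratio and read latency per table while varying row_cache_size_in_mb.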
Re: Datamodel for a highscore list
What would the consequence be of having this updated highscore table (using friendId as part of the clustering key to avoid name collisions)?

CREATE TABLE highscore (
    userId uuid,
    score int,
    friendId uuid,
    name varchar,
    PRIMARY KEY (userId, score, friendId)
) WITH CLUSTERING ORDER BY (score DESC);

And then create an index:

CREATE INDEX friendId_idx ON highscore (friendId);

The table will have many millions of entries (I should expect 100+ million). Each friendId would appear as many times as that user has friends. It sounds like a scenario where I should be careful about using a secondary index. I haven't worked with secondary indexes in Cassandra before, but I assume this would allow me to query the table on (userId, friendId) for updating highscores.

But what would happen in this case? Which queries would be affected, and roughly to what degree? Would this be a viable option?

On Wed, Jan 22, 2014 at 6:44 PM, Kasper Middelboe Petersen kas...@sybogames.com wrote:

Hi!

I'm a little worried about the data model I have come up with for handling highscores.

I have a lot of users. Each user has a number of friends. I need a highscore list per friend list. I would like it optimized for reading the highscores rather than writing new ones, as the use case suggests I will read the lists far more often than I write new highscores.

Currently I have the following tables:

CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo int, PRIMARY KEY(userId));
CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);

... and a table for friends - for the purpose of this mail, assume everyone is friends with everyone else.

Reading the highscore list for a given user is easy: SELECT * FROM highscore WHERE userId = id.

The problem is setting a new highscore:
1. I need to read-before-write to get the old score.
2. I'm screwed if something goes wrong and the old score gets overwritten before all the friends' highscore lists get updated - and it is a highly visible error, because the same user appears on the highscore list multiple times.

I would very much appreciate some feedback and/or alternatives on how to solve this with Cassandra.

Thanks,
Kasper
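For reference, a rough sketch of the update path such an index is meant to enable (hedged: whether the SELECT is accepted as written depends on the Cassandra version and may need ALLOW FILTERING; all values are placeholders):

    -- find the friend's old score inside one user's list
    SELECT score FROM highscore WHERE userId = ? AND friendId = ?;

    -- replace it; the old clustering row must be deleted explicitly,
    -- since score is part of the primary key
    BEGIN BATCH
      DELETE FROM highscore WHERE userId = ? AND score = ? AND friendId = ?;
      INSERT INTO highscore (userId, score, friendId, name) VALUES (?, ?, ?, ?);
    APPLY BATCH;

Note this is still read-before-write, so it inherits the consistency worry from the original mail.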
Re: Datamodel for a highscore list
Most of the work I've done like this has used sparse table definitions and the empty-column trick. I didn't explain that very well in my last response. By using the userId as the row ID, and the friendId as the column name with the score as the value, I would put an entire user's friend list on one row. The row would look like this:

ROWID: userId
COLUMNS: friendId -> score, friendId -> score, ...

Colin
+1 320 221 9531

On Thu, Jan 23, 2014 at 2:34 AM, Kasper Middelboe Petersen kas...@sybogames.com wrote:
<snip>
Re: Datamodel for a highscore list
One of the tricks I've used a lot with Cassandra is a sparse CF definition, inserting columns programmatically that weren't in the definition. I'd be tempted to look at putting a user's friend list on one row. The row would look like this:

ROWID: UserID
COLUMNS: UserID, UserScore:Score, FriendID:score, FriendID:score, ...

The UserID and UserScore columns are literal; the FriendIDs are either literal or keys into the user CF. When a user gets a new score, you update that user's own row, and a general update updates all rows containing that userId with the new score.

That way, all friends are on the same row, which makes querying easy. And you can still query for the top score across the entire user base via the UserID and UserScore columns.

Is this a better explanation than my previous, lame one?

Colin
+1 320 221 9531

On Thu, Jan 23, 2014 at 2:34 AM, Kasper Middelboe Petersen kas...@sybogames.com wrote:
<snip>
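In CQL3 terms, Colin's wide row would look roughly like the sketch below (table and column names are illustrative, not from his mail):

    CREATE TABLE friend_scores (
        userId   uuid,   -- row key: the owner of the highscore list
        friendId uuid,   -- one column per friend; use the user's own id for their score
        score    int,
        PRIMARY KEY (userId, friendId)
    );

    -- a new score for friend F is a blind overwrite in every list containing F
    UPDATE friend_scores SET score = ? WHERE userId = ? AND friendId = ?;

    -- reading a full list is a single-partition slice
    SELECT friendId, score FROM friend_scores WHERE userId = ?;

The upside is that there is no read-before-write; the trade-off is that rows come back ordered by friendId, so sorting by score happens client-side.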
Cassandra timeout on node failure
We are seeing a weird issue with our Cassandra cluster (version 1.0.10). We have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas of each other. All reads and writes are LOCAL_QUORUM.

We see that when one of the nodes in DC1 fails, we see timeout errors on the second node for reads. When we turned on DEBUG-level logging, we see the following error in the Cassandra logs:

DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 2 responses from /xxx.xxx.xxx.IP1, /xxx.xxx.xxx.IP2

Considering that LOCAL_QUORUM only needs 2 nodes out of the 3 in the DC, I am surprised we are seeing this issue - the log clearly says it received 2 responses. Interestingly, when we connect to the third node after the second node returned the timeout error, it works as expected.

Has anyone else faced this issue?
Re: Row cache vs. OS buffer cache
My experience has been that the row cache is much more effective. However, reasonable row cache sizes are so small relative to RAM that I don't see it as a significant trade-off unless you're in a very memory-constrained environment.

If you want to enable the row cache (a big if), you probably want it to be as big as it can be, up to the point of diminishing returns on the hit rate. The off-heap cache still keeps many objects on-heap, so it doesn't really change that much conceptually; you will just end up with a different number for the size.

On 01/23/2014 02:13 AM, Katriel Traum wrote:

Hello list,
I was wondering if anyone has any pointers or advice regarding using the row cache vs. leaving it up to the OS buffer cache. I run Cassandra 1.1 and 1.2 with JNA, so the off-heap row cache is an option.
Any input appreciated.
Katriel
Re: Opscenter tabs
Multiple DCs are still a single cluster in OpsCenter. If you go to Physical View, you should see one column for each data center. Also, the Community edition of OpsCenter, last I saw, only supported a single cluster.

On Thu, Jan 23, 2014 at 12:06 PM, Daniel Curry daniel.cu...@arrayent.com wrote:

I am unable to find any references on whether the tabs for monitoring multiple DCs can be configured to show the DC location. I do not want to change the cluster name itself. Right now I see three tabs, all with the same name (cluster_name: test). I'd like to keep the current cluster name "test", but change the OpsCenter tabs to DC1, DC2, and DC3. Is this documented somewhere?

--
Daniel Curry
Sr Linux Systems Administrator
Arrayent, Inc.
2317 Broadway Street, Suite 20
Redwood City, CA 94063
dan...@arrayent.com

--
Ken Hancock | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park, Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
Office: +1 (978) 889-3329
Re: Row cache vs. OS buffer cache
On Wed, Jan 22, 2014 at 11:13 PM, Katriel Traum katr...@google.com wrote:

I was wondering if anyone has any pointers or advice regarding using the row cache vs. leaving it up to the OS buffer cache. I run Cassandra 1.1 and 1.2 with JNA, so the off-heap row cache is an option.

Many people have had bad experiences with the row cache - more, I assert, than have had good ones.

https://issues.apache.org/jira/browse/CASSANDRA-5357 is the 2.1-era redesign of the row cache into something more conceptually appropriate.

The rule of thumb for the row cache is that you may win with it if your data is:
1) very hot
2) very small
3) very uniform in size

IMO if you meet all of those criteria, you should A/B test the on-heap cache vs. off-heap in 1.1/1.2, especially if your cached rows are frequently updated.

https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

=Rob
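For the A/B test, the relevant knobs in 1.1/1.2 live in cassandra.yaml - a sketch from memory, so verify the keys against your version's bundled yaml:

    row_cache_size_in_mb: 512
    # off-heap (requires JNA); serializes rows, cheap on heap but CPU-heavy on hot updates
    row_cache_provider: SerializingCacheProvider
    # on-heap alternative:
    # row_cache_provider: ConcurrentLinkedHashCacheProvider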
Re: Cassandra timeout on node failure
On Thu, Jan 23, 2014 at 8:52 AM, Ankit Patel patel7...@hotmail.com wrote:

We are seeing a weird issue with our Cassandra cluster (version 1.0.10). We have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas of each other. All reads and writes are LOCAL_QUORUM.

Frankly, I'm surprised that 1.0.10 includes LOCAL_QUORUM. My first advice would be to upgrade; current trunk is 3 major versions above 1.0.10.

We see that when one of the nodes in DC1 fails, we see timeout errors on the second node for reads.
<snip>
Has anyone else faced this issue?

Have you searched the Apache JIRA? If you can replicate it on a more modern release (1.2.13 / 2.0.4), file a JIRA!

=Rob
Re: Any Limits on number of items in a collection column type
Alternatively, you can use clustering columns to store very big collections. Beware of making a row too wide, though (use bucketing).

On 23 Jan 2014 04:29, Manoj Khangaonkar khangaon...@gmail.com wrote:

Thanks. I guess I can work around it by maintaining hour counts (which will have fewer items) and adding the hour counts to get day counts.

regards

On Wed, Jan 22, 2014 at 7:15 PM, Robert Wille rwi...@fold3.com wrote:

I didn't read your question properly. Collections are limited to 64K items, not 64K bytes per item.

From: Manoj Khangaonkar khangaon...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, January 22, 2014 at 7:17 PM
To: user@cassandra.apache.org
Subject: Any Limits on number of items in a collection column type

Hi,

On C* 2.0.0, 3-node cluster. I have a column daycount list<bigint>. The column stores a count; every few secs a new count is appended. The total count for the day is the sum of all items in the list.

My application logs indicate I wrote about 11 items to the column for a particular row (assume the row key is day_timestamp). But when I do a read on the column, I get back a list with only 43000 items. Checked with both the java driver and CQL. There are no errors or exceptions anywhere.

There is this statement in the wiki: "Collection values may not be larger than 64K." I assume this refers to one item in a collection.

Has anyone else seen an issue like this?

regards
MJ

--
http://khangaonkar.blogspot.com/
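A sketch of the clustering-column alternative with day bucketing (names are illustrative; shrink the bucket to an hour if a day's partition grows too wide):

    CREATE TABLE day_counts (
        day   text,      -- the bucket, e.g. '2014-01-23'
        ts    timeuuid,  -- one clustering row per appended count
        count bigint,
        PRIMARY KEY (day, ts)
    );

    INSERT INTO day_counts (day, ts, count) VALUES ('2014-01-23', now(), 42);

    -- the day total is the sum over the partition, computed client-side
    SELECT count FROM day_counts WHERE day = '2014-01-23';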
Re: Opscenter tabs
A vaguely related question... my OpsCenter now has two separate tabs for the same cluster. One tab shows all six nodes and has their agents; the other tab has the same six nodes but no agents. I see no way to get rid of the spurious tab.

On Thu, Jan 23, 2014 at 12:47 PM, Ken Hancock ken.hanc...@schange.com wrote:

Multiple DCs are still a single cluster in OpsCenter. If you go to Physical View, you should see one column for each data center. Also, the Community edition of OpsCenter, last I saw, only supported a single cluster.

<snip>
Re: Row cache vs. OS buffer cache
Thank you everyone for your input. My dataset is ~100G in size, with 1 or 2 read-intensive column families. The cluster has plenty of RAM. I'll start off small with 4G of row cache and monitor the hit rate.

Katriel

On Thu, Jan 23, 2014 at 9:17 PM, Robert Coli rc...@eventbrite.com wrote:
<snip>
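For the monitoring part, the per-node row cache statistics (size, capacity, hits, requests, recent hit rate) are visible via nodetool in 1.1/1.2 - exact output varies by version:

    nodetool info
    nodetool cfstats   # per-CF read latency, to correlate with cache changes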
Re: How to add a new DC to cluster in Cassandra 2.x
On Tue, Jan 21, 2014 at 7:16 AM, Tupshin Harper tups...@tupshin.com wrote:

This should be the doc you are looking for: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

Rebuild (like bootstrap) only streams data from a single source replica per range. IMO, therefore, the above process should end with a step that repairs all nodes in both data centers with -pr. Otherwise, requests to the new DC at LOCAL_X consistency levels or CL.ONE may violate consistency.

I have bcc:ed d...@datastax.com, in case they agree and want to modify the above doc. :D

=Rob
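In command form, the tail of that process would look roughly like this (DC names are illustrative; rebuild runs on each node in the new DC, naming the existing DC as the stream source):

    nodetool rebuild DC1

    # then, per the suggestion above, on every node in both data centers:
    nodetool repair -pr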
RE: How to add a new DC to cluster in Cassandra 2.x
Thanks a lot :)

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 24 January 2014 4:54
To: user@cassandra.apache.org
Subject: Re: How to add a new DC to cluster in Cassandra 2.x

<snip>
Re: Extremely long GC
Hi Joel,

A single log record like the one above will not help. Please provide the entire command line the process is started with (including JVM options) and the log file showing all GC messages. Is it possible for you to collect verbose GC output, which would show the before and after memory statistics? Also useful: the Java version, Cassandra version, availability of swap space, disk space, and CPU usage around the time the garbage collection happened.

What is the -Xms (initial heap) setting? Try reducing it and see if it helps.

Yogi

On Wed, Jan 22, 2014 at 11:12 PM, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Here is one example. 12GB of data, no load besides OpsCenter and perhaps 1-2 requests per minute.

INFO [ScheduledTasks:1] 2013-12-29 01:03:25,381 GCInspector.java (line 119) GC for ParNew: 426400 ms for 1 collections, 2253360864 used; max is 4114612224

2014/1/22 Yogi Nerella ynerella...@gmail.com:

Hi,
Can you share the GC logs for the systems you are running into problems with?
Yogi

On Wed, Jan 22, 2014 at 6:50 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Hello,

We've been having problems with long GC pauses and can't seem to get rid of them. Our latest test is on a clean machine with Ubuntu 12.04 LTS, Java 1.7.0_45 and JNA installed. It is a single-node cluster with mostly default settings; the only things changed are IP addresses, cluster name and partitioner (to RandomPartitioner). We are running Cassandra 2.0.4 on a virtual machine with Xen. We have 16GB of RAM and default memory settings for C* (i.e. a heap size of 4GB). CPU is specified as 8 cores by our provider.

Right now, we have no data on the machine and no requests to it at all. Still we get ParNew GCs like the following:

INFO [ScheduledTasks:1] 2014-01-18 10:54:42,286 GCInspector.java (line 116) GC for ParNew: 464 ms for 1 collections, 102838776 used; max is 4106223616

While this may not be extremely long, on other machines with the same setup but some data (around 12GB) and around 10 read requests/s (i.e. basically no load), we have seen ParNew GC for 20 minutes or more. During this time, the machine goes down completely (I can't even ssh to it). The requests are mostly from OpsCenter, and the rows requested are not extremely large (typically less than 1KB).

We have tried a lot of different things to solve these issues, since we've been having them for a long time, including:
- Upgrading Cassandra to new versions
- Upgrading Java to new versions
- Printing promotion failures in the GC log (no failures found!)
- Different heap sizes and different sizes for the GC spaces (Eden etc.)
- Different versions of Ubuntu
- Running on Amazon EC2 instead of our current provider (not with the DataStax AMI)

Something that may be a clue: when running the DataStax Community AMI on Amazon, we haven't seen the GC pauses yet (it's been running for a week or so). Just to be clear, the other test on Amazon EC2 mentioned above (without the DataStax AMI) does show the GC freezes.

If any other information is needed, just let me know.

Best regards,
Joel Samuelsson
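For the verbose GC output requested above, the usual approach in this era is to enable the GC-logging options in conf/cassandra-env.sh - a sketch; flag availability depends on the exact JVM build:

    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"

Comparing the "before" and "after" heap numbers around a long ParNew should show whether the pause is real collection work or the JVM being starved (e.g. by the hypervisor) mid-collection.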