[replying to list, with permission]

On Mon, Feb 22, 2010 at 12:05 AM, <jeremey.barr...@nokia.com> wrote:
> I'm looking for a very scalable primary data store for a large web/API
> application. Our data consists largely of lists of things, per user. So a
> user has a bunch (dozens to hundreds) of thing A, some of thing B, a few of
> thing C, etc. There are social elements to the app w/ shared data, so that
> could be modeled with each user having a list of pointers, but with writes
> being super cheap I'm more inclined to write everything everywhere (that's a
> side issue, but it's in the back of my mind). Users number in the millions.
>
> So basically I'm looking for something scalable, available, fast, and with
> native support for range scans (given that almost every query is fetching
> some list of things). This is where my questions lie... I'm pretty familiar
> with the Bigtable model and it suits my needs quite well, I would store thing
> A under a row key of "userid.thingid" (or similar) and then a range scan over
> "userid." will pick them all up at once.
>
> HBase has been top of my list in terms of data model, but I ran across a
> performance study which suggested it's questionable and the complexity of
> components gives me some pause. So Cassandra seems the other obvious choice.
> However, the data model isn't as clear to me (at least, not yet, which is
> probably just a terminology problem).
>
> My questions:
>
> 1) would you consider Cassandra (0.5+) "safe enough" for a primary data
> store?

Yes. Several companies are deploying 0.5 in production. It's pretty
solid. (We'll have a 0.5.1 fixing some minor issues RSN, and a 0.6
beta.) And I agree that it's significantly simpler to deploy (and keep
running) than HBase.

> 2) is the row key model I suggested above the best approach in Cassandra, or
> is there something better? My testing so far has been using get_range_slice
> with a ColumnParent of just the CF and SlicePredicate listing the columns I
> want (though really I want all columns, is there a shorthand for that?)

Cassandra deals fine with millions of columns per row, and allows
prefix queries on columns too. So an alternate model would be to have
userX as row key, and column keys "A:1, A:2, A:3, ..., B:1, B:2, B:3,
...". This will be marginally faster than splitting by row, and has
the added advantage of not requiring OPP.

You could use supercolumns here too (where the supercolumn name is the
thing type). If you always want to retrieve all things of type A at a
time per user, then that is a more natural fit. (Otherwise, the lack
of subcolumn indexing could be a performance gotcha for you:
http://issues.apache.org/jira/browse/CASSANDRA-598).

> 3) schema changes (i.e. adding a new CF)... seems like currently you take
> the whole cluster down to accomplish this... is that likely to change in the
> future?

You have to take each node down, but a rolling restart is fine. No
reason for the whole cluster to be down at once. We're planning to
make CF changes doable against live nodes for 0.7, in
https://issues.apache.org/jira/browse/CASSANDRA-44.

> 4) any tuning suggestions for this kind of setup? (primary data store using
> OrderPreservingPartitioner doing lots of range scans, etc.)

Nothing unusual -- just the typical "try to have enough RAM to cache
your 'hot' data set."

> 5) I noticed mention in some discussion that the OrderPreserving mode is not
> as well utilized and is probably in need of optimizations...
> how serious is that, and are there people working on that, or is help
> needed?

We have range queries in our stress testing tool now, and with Hadoop
integration coming in 0.6 I expect it will get a lot more testing.
Certainly anyone who wants to get their hands dirty is welcome. :)

> 6) hardware... we could certainly choose to go with pretty beefy hardware,
> especially in terms of RAM... is there a point where it just isn't useful?

Some recommendations are in
http://wiki.apache.org/cassandra/CassandraHardware. In general, don't
go beyond the "knee" of the price/performance curve, since you can
always add more nodes instead.

Past "enough for your memtables"
(http://wiki.apache.org/cassandra/MemtableSSTable), RAM is only useful
for caching reads; it won't help write performance. So that's the main
factor in "how much do I need."

-Jonathan
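
As a rough illustration of the single-row layout suggested in the answer to question 2 (userX as row key, column names like "A:1", "B:1"), here is a toy in-memory sketch in Python. It is not Cassandra client code: the `Row` class and `slice_prefix` method are hypothetical names invented for this example, and the zero-padded ids are an assumption made so that plain string ordering keeps things of one type in numeric order. It only shows why keeping column names sorted makes "all things of type A for one user" a single contiguous scan.

```python
import bisect


class Row:
    """Toy stand-in for one wide row: column names are kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted list of column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Only add the name to the sorted index the first time it is seen.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice_prefix(self, prefix):
        """Return (name, value) pairs whose name starts with prefix, in order."""
        # Binary search to the first name >= prefix, then walk forward
        # until the prefix no longer matches.
        start = bisect.bisect_left(self._names, prefix)
        out = []
        for name in self._names[start:]:
            if not name.startswith(prefix):
                break
            out.append((name, self._values[name]))
        return out


# One row per user; the row key "userX" follows the example in the reply.
rows = {"userX": Row()}
for name, value in [
    ("B:0001", "b1"),
    ("A:0002", "a2"),
    ("A:0001", "a1"),
    ("C:0001", "c1"),
]:
    rows["userX"].insert(name, value)

# All things of type A for this user, regardless of insertion order.
things_a = rows["userX"].slice_prefix("A:")
```

Because the lookup starts from a single row key, nothing in this layout depends on row keys being ordered across the cluster, which is the point made above about not requiring OPP.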