On Feb 22, 2010, at 12:19 AM, ext Jonathan Ellis wrote:

>> 2) is the row key model I suggested above the best approach in Cassandra,
>> or is there something better? My testing so far has been using
>> get_range_slice with a ColumnParent of just the CF and SlicePredicate
>> listing the columns I want (though really I want all columns, is there a
>> shorthand for that?)
>
> Cassandra deals fine with millions of columns per row, and allows
> prefix queries on columns too. So an alternate model would be to have
> userX as row key, and column keys "A:1, A:2, A:3, ..., B:1, B:2, B:3,
> ...". This will be marginally faster than splitting by row, and has
> the added advantage of not requiring OPP.
>
> You could use supercolumns here too (where the supercolumn name is the
> thing type). If you always want to retrieve all things of type A at a
> time per user, then that is a more natural fit. (Otherwise, the lack
> of subcolumn indexing could be a performance gotcha for you:
> http://issues.apache.org/jira/browse/CASSANDRA-598).
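
To check my understanding of the single-row model, here is a minimal Python sketch. It is a toy in-memory stand-in, not Cassandra client code: the Row class and its slice_prefix method are hypothetical names of my own, and the only assumption is that column names within a row are kept in sorted order, so that everything sharing a prefix such as "A:" forms one contiguous range.

```python
import bisect


class Row:
    """Toy model of one row (e.g. userX): column names kept sorted."""

    def __init__(self):
        self._names = []    # sorted column names, e.g. "A:1", "A:2", "B:1"
        self._values = {}   # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted; overwriting a column keeps its position.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice_prefix(self, prefix):
        # A prefix query is a range query on sorted names:
        # every name n with prefix <= n < prefix + (a very high character).
        lo = bisect.bisect_left(self._names, prefix)
        hi = bisect.bisect_left(self._names, prefix + "\uffff")
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# One row per user; the thing type is encoded in the column name.
user_x = Row()
for name in ["B:1", "A:2", "A:1", "B:2", "A:3"]:
    user_x.insert(name, "value of " + name)

# All things of type A for this user, in column-name order.
type_a = user_x.slice_prefix("A:")
```

One thing the sketch makes visible: names compare as strings here, so "A:10" would sort before "A:2" unless the numeric part is zero-padded.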

Would you say the supercolumn approach is faster than scanning rows? Any particular advantages or disadvantages to writing to a bunch of supercolumns at once (e.g. in one user row), vs. writing to a bunch of rows at once (with the same key prefix, i.e. close together in an order-preserved store)?

>> 3) schema changes (i.e. adding a new CF)... seems like currently you take
>> the whole cluster down to accomplish this... is that likely to change in the
>> future?
>
> You have to take each node down, but a rolling restart is fine. No
> reason for the whole cluster to be down at once.

OK, that's not a big deal.

Extremely helpful... thanks for the response!

Jeremey.