That requires you to know the timestamp, so you can't just ask for the most recent one.
Evan

On Fri, Jul 3, 2009 at 6:02 PM, Jonathan Ellis<[email protected]> wrote:
> get_columns_since
>
> On Fri, Jul 3, 2009 at 7:21 PM, Evan Weaver<[email protected]> wrote:
>> This helps a lot.
>>
>> However, I can't find any API method that actually lets me do a
>> slice query on a time-sorted column, as necessary for the second blog
>> example. I get the following error on r789419:
>>
>> InvalidRequestException: get_slice_from requires CF indexed by name
>>
>> Evan
>>
>> On Tue, May 19, 2009 at 8:00 PM, Jonathan Ellis<[email protected]> wrote:
>>> Mail storage, man, I think pretty much anything I could come up with
>>> would look pretty simplistic compared to what "real" systems do in
>>> that domain. :)
>>>
>>> But blogs, I think I can handle those. Let's make ours multiuser
>>> or there isn't enough scale to make it interesting. :)
>>>
>>> The interesting thing here is we want to be able to query two things
>>> efficiently:
>>> - the most recent posts belonging to a given blog, in reverse
>>> chronological order
>>> - a single post and its comments, in chronological order
>>>
>>> At first glance you might think we can again reasonably do this with a
>>> single CF, this time a super CF:
>>>
>>> <ColumnFamily ColumnType="Super" ColumnSort="Time" Name="Post"/>
>>>
>>> The key is the blog name, the supercolumns are posts and the
>>> subcolumns are comments. This would be reasonable BUT supercolumns
>>> are just containers, they have no data or timestamp associated with
>>> them directly (only through their subcolumns). So you cannot sort a
>>> super CF by time.
>>>
>>> So instead what I would do would be to use two CFs:
>>>
>>> <ColumnFamily ColumnSort="Time" Name="Post"/>
>>> <ColumnFamily ColumnSort="Time" Name="Comment"/>
>>>
>>> For the first, the keys used would be blog names, and the columns
>>> would be the post titles and body. So to get a list of most recent
>>> posts you just do a slice query. Even though Cassandra currently
>>> handles large groups of columns sub-optimally, even with a blog
>>> updated several times a day you'd be safe taking this approach (i.e.
>>> we'll have that problem fixed before you start seeing it :).
>>>
>>> For the second, the keys are blog name<delimiter><post title>. The
>>> columns are the comment data. You can serialize these a number of
>>> ways; I would probably use title as the column name and have the value
>>> be the author + body (e.g. as a json dict). Again we use the slice
>>> call to get the comments in order. (We will have to manually reverse
>>> what slice gives us since time sort is always reverse chronological
>>> atm, but the overhead of doing this in memory will be negligible.)
>>>
>>> Does this help?
>>>
>>> -Jonathan
>>>
>>> On Tue, May 19, 2009 at 11:49 AM, Evan Weaver <[email protected]> wrote:
>>>> Even if it's not actually in real-life use, some examples for common
>>>> domains would really help clarify things.
>>>>
>>>> * blog
>>>> * email storage
>>>> * search index
>>>>
>>>> etc.
>>>>
>>>> Evan
>>>>
>>>> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis <[email protected]> wrote:
>>>>> Does anyone have a simple app schema they can share?
>>>>>
>>>>> I can't share the one for our main app. But we do need an example
>>>>> here. A real one would be nice if we can find one.
>>>>>
>>>>> I checked App Engine. They don't have a whole lot of examples either.
>>>>> They do have a really simple one:
>>>>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
>>>>>
>>>>> The most important thing in Cassandra modeling is choosing a good key,
>>>>> since that is what most of your lookups will be by. Keys are also how
>>>>> Cassandra scales -- Cassandra can handle effectively infinite keys
>>>>> (given enough nodes obviously) but only thousands to millions of
>>>>> columns per key/CF (depending on what API calls you use -- Jun is
>>>>> adding one now that does not deserialize everything in the whole CF
>>>>> into memory. The rest will need to follow this model eventually too).
>>>>>
>>>>> For this guestbook I think the choice is obvious: use the name as the
>>>>> key, and have a single simple CF for the messages. Each column will
>>>>> be a message (you can even use the mandatory timestamp field as part
>>>>> of your user-visible data. win!). You get the list (or page) of
>>>>> users with get_key_range and then their messages with get_slice.
>>>>>
>>>>> <ColumnFamily ColumnSort="Name" Name="Message"/>
>>>>>
>>>>> Anyone got another one for pedagogical purposes?
>>>>>
>>>>> -Jonathan
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Evan Weaver
>>>>
>>>
>>
>>
>>
>> --
>> Evan Weaver
>>
>
--
Evan Weaver
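
For anyone reading this thread later, here is a minimal Python sketch of the two-CF blog layout described in the quoted message. It is a toy in-memory model, not Cassandra client code: the class TimeSortedCF, the methods slice_newest and columns_since, the "|" delimiter, and the sample data are all hypothetical. columns_since only approximates the idea behind get_columns_since, and its inclusive lower bound is an assumption.

```python
import json

DELIM = "|"  # hypothetical delimiter between blog name and post title


class TimeSortedCF:
    """Toy in-memory stand-in for a ColumnFamily with ColumnSort="Time"."""

    def __init__(self):
        # row key -> {column name: (value, timestamp)}
        self._rows = {}

    def insert(self, key, name, value, timestamp):
        self._rows.setdefault(key, {})[name] = (value, timestamp)

    def _newest_first(self, key):
        cols = self._rows.get(key, {})
        ordered = sorted(cols.items(), key=lambda item: item[1][1], reverse=True)
        return [(name, value, ts) for name, (value, ts) in ordered]

    def slice_newest(self, key, count):
        """Return up to `count` columns, newest first; no timestamp needed."""
        return self._newest_first(key)[:count]

    def columns_since(self, key, timestamp):
        """Return columns with timestamp >= `timestamp` (assumed inclusive)."""
        return [col for col in self._newest_first(key) if col[2] >= timestamp]


posts = TimeSortedCF()     # key: blog name, columns: posts
comments = TimeSortedCF()  # key: blog name + DELIM + post title, columns: comments

posts.insert("myblog", "Hello", "first body", timestamp=100)
posts.insert("myblog", "Schemas", "second body", timestamp=200)

comment_key = "myblog" + DELIM + "Hello"
comments.insert(comment_key, "Nice",
                json.dumps({"author": "a", "body": "+1"}), timestamp=110)
comments.insert(comment_key, "Question",
                json.dumps({"author": "b", "body": "why?"}), timestamp=120)

# Most recent post: a slice by count, with no timestamp known in advance.
latest_posts = posts.slice_newest("myblog", count=1)

# Comments in chronological order: reverse the newest-first slice in memory.
thread = list(reversed(comments.slice_newest(comment_key, count=10)))
```

The difference between the two read methods is the point raised at the top of this message: columns_since needs a timestamp supplied by the caller, while slice_newest can answer "give me the latest N" directly.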
