Re: Visual representation of Cassandra data model

Colin Mollenhour Thu, 13 Aug 2009 10:16:44 -0700

I'm really glad that you all are working on this. To me, Cassandra's data model was, and still is, a big learning curve to digest completely, due to the various unknown implications (to Cassandra newbies especially) that the data model has on performance and usability. This also seems to be changing somewhat with the Thrift API changes, so it would be really nice to have a "designing a Cassandra schema for your application" guide.

In your model I don't think it is best to have a general "map" SC with all of the relations in it, since there will be unnecessary deserialization and network transfer of map data that you won't always make use of. I think you should denormalize and use separate CFs for the various mappings. Cassandra handles lots of keys better than large SCs, from what I understand. Here is my first stab at the data model you are working on:


Schema Legend:
<CF or SC name> (SC|CF keyed on <key description>)
<example key>: {<column name>: <value>, ...}
or

<example key>: [<CF name>: {<column name>: <value>, ...}, <CF name>:{...}, ...]


Delicious Keyspace Schema:
user (CF keyed on nick)
"mccv": {name: "Mark McBride", email: "[email protected]"}

bookmark (SC keyed on url with CFs for related users and related tags)

"http://thesartorialist.blogspot.com": [details: {title: "The Sartorialist", other_meta_data: <value>}, users: {"mccv": null}, tags: {"blog": null, "news": null}]
(storing users here may be overkill, but it is reasonable that when retrieving a bookmark you will usually want the tags too)

bookmark_tag_users (CF keyed on bookmark|tag containing list of related users)

"http://thesartorialist.blogspot.com|blog": {"mccv": null, ...}
"http://thesartorialist.blogspot.com|news": {"mccv": null, ...}

user_bookmark_tags (CF keyed on user|bookmark to lookup a user's tags for a bookmark or all of a user's bookmarks and their tags (using key_range))

"mccv|http://thesartorialist.blogspot.com": {"blog": null, "news": null, ...}


tag_bookmarks (CF keyed on tag name to lookup all bookmarks for a given tag)
"blog": {"http://thesartorialist.blogspot.com": "The Sartorialist", ...}
"news": {"http://thesartorialist.blogspot.com": "The Sartorialist", ...}

user_tag_bookmarks (CF keyed on user|tag to lookup all bookmarks for a given tag and user, or for just a given user (using key_range))

"mccv|blog": {"http://thesartorialist.blogspot.com": "The Sartorialist", ...}
"mccv|news": {"http://thesartorialist.blogspot.com": "The Sartorialist", ...}

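To make the write path concrete, here is a rough sketch of the mapping CFs above, using plain Python dicts as a stand-in for Cassandra. It is not client code: `add_bookmark` and `key_range` are hypothetical helpers I made up for illustration, and the `user` and `bookmark` rows are left out for brevity. The point is only that one logical "user tags a bookmark" event fans out into a write per mapping CF.

```python
from collections import defaultdict

# Each column family is modelled as {row_key: {column_name: value}}.
# This is an in-memory stand-in, not a Cassandra client.
def new_cf():
    return defaultdict(dict)

keyspace = {
    "bookmark_tag_users": new_cf(),
    "user_bookmark_tags": new_cf(),
    "tag_bookmarks": new_cf(),
    "user_tag_bookmarks": new_cf(),
}

def add_bookmark(ks, user, url, title, tags):
    """Write one (user, url, tags) fact into every denormalized CF."""
    for tag in tags:
        ks["bookmark_tag_users"][f"{url}|{tag}"][user] = None
        ks["user_bookmark_tags"][f"{user}|{url}"][tag] = None
        ks["tag_bookmarks"][tag][url] = title
        ks["user_tag_bookmarks"][f"{user}|{tag}"][url] = title

def key_range(cf, prefix):
    """Hypothetical helper: rows whose key starts with prefix, in key order."""
    return {k: cf[k] for k in sorted(cf) if k.startswith(prefix)}

add_bookmark(
    keyspace,
    "mccv",
    "http://thesartorialist.blogspot.com",
    "The Sartorialist",
    ["blog", "news"],
)

# All of mccv's tag rows come back from a single prefix scan.
mccv_rows = key_range(keyspace["user_tag_bookmarks"], "mccv|")
```

The cost of this layout is visible in `add_bookmark`: every new tag touches four rows, which is the price paid for each read being a single key lookup or key range.
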
I think a good approach to designing a Cassandra schema from scratch is to make a list of the queries that you *know* you will need to be fast, and then look at your model attempts and see how well they fit while trying to minimize overhead. Example:

-All bookmarks for a user
-All of a user's bookmarks for a tag
-All bookmarks for a tag
-All tags for a bookmark
-etc..
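
As a sanity check, each query in the list can be written against the schema as a single key lookup or key range. The sketch below uses plain Python dicts holding the example rows; the function names and `rows_with_prefix` are hypothetical and only illustrate which CF answers which query, not how a real client call would look.

```python
URL = "http://thesartorialist.blogspot.com"

# Rows copied from the example schema; each CF is {row_key: {column: value}}.
user_bookmark_tags = {f"mccv|{URL}": {"blog": None, "news": None}}
user_tag_bookmarks = {
    "mccv|blog": {URL: "The Sartorialist"},
    "mccv|news": {URL: "The Sartorialist"},
}
tag_bookmarks = {
    "blog": {URL: "The Sartorialist"},
    "news": {URL: "The Sartorialist"},
}
bookmark = {
    URL: {
        "details": {"title": "The Sartorialist"},
        "users": {"mccv": None},
        "tags": {"blog": None, "news": None},
    }
}

def rows_with_prefix(cf, prefix):
    """Hypothetical stand-in for a key range scan over one CF."""
    return [(k, cf[k]) for k in sorted(cf) if k.startswith(prefix)]

def bookmarks_for_user(user):
    # Key range over user_bookmark_tags; the URL is the part after "user|".
    rows = rows_with_prefix(user_bookmark_tags, user + "|")
    return [key.split("|", 1)[1] for key, _ in rows]

def bookmarks_for_user_and_tag(user, tag):
    # Single row lookup in user_tag_bookmarks.
    return sorted(user_tag_bookmarks.get(f"{user}|{tag}", {}))

def bookmarks_for_tag(tag):
    # Single row lookup in tag_bookmarks.
    return sorted(tag_bookmarks.get(tag, {}))

def tags_for_bookmark(url):
    # Read the "tags" part of the bookmark row.
    return sorted(bookmark.get(url, {}).get("tags", {}))
```

If a query on the list cannot be written this way against any CF, that is a hint the schema is missing a mapping.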

I would start with a highly denormalized schema that consists of only simple CFs. My take on SCs is that if you know that every time you retrieve data from one CF for a key you will also retrieve data from another CF with the same key, then you should probably combine them in an SC; otherwise they probably need to be in separate simple CFs (due to the entire SC having to be deserialized in memory just to retrieve a slice). However, it seems like you can end up with lots of special-purpose CFs used as maps, and I'm not sure at what point you would want to simply go with a different database system with richer querying capabilities. I don't know much about Delicious, but it seems that using natural keys is perfectly acceptable in this case.
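
Here is a toy illustration of that trade-off. It is not how Cassandra stores data; JSON blobs just stand in for "a unit that has to be deserialized as a whole", and the CF names `bookmark_details`, `bookmark_users` and `bookmark_tags` are made up for the example. Reading only the tags from the combined row means decoding everything, while the separate CF row decodes only what is asked for.

```python
import json

# Combined row: one serialized blob holding every sub-map for the key.
sc_row = json.dumps({
    "details": {"title": "The Sartorialist"},
    "users": {"mccv": None},
    "tags": {"blog": None, "news": None},
})

# Separate simple CFs (hypothetical names): each map is serialized on its own.
cf_rows = {
    "bookmark_details": json.dumps({"title": "The Sartorialist"}),
    "bookmark_users": json.dumps({"mccv": None}),
    "bookmark_tags": json.dumps({"blog": None, "news": None}),
}

def tags_from_sc(blob):
    """Decode the whole row, then pick out the tags slice."""
    return json.loads(blob)["tags"], len(blob)

def tags_from_cf(rows):
    """Decode only the row that holds the tags."""
    blob = rows["bookmark_tags"]
    return json.loads(blob), len(blob)

sc_tags, sc_bytes = tags_from_sc(sc_row)
cf_tags, cf_bytes = tags_from_cf(cf_rows)
```

Both reads return the same tags, but `sc_bytes` is larger than `cf_bytes`, and the gap grows with the size of the `users` map. If you always want details, users and tags together, the combined row wins, since it is one read instead of three.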

I'm sure this isn't the best schema, but it is an alternative approach. I'd really love to see how the experts would model this in a production system.


Thanks,
Colin

Mark McBride wrote:

While working on an updated data model wiki page I'm trying to put
together a graphical representation of the data model.  I threw this
together based on Curt's goal of modeling delicious.  The basic gist
is descriptive data for tags, users, and bookmarks goes in the
Description column family.  The relationships between bookmarks, tags
and users goes in the map supercolumn.  I'm not sure this is how you
would do it in production (I'm guessing at the very least you'd want
separate supercolumns for bookmarks, tags and users), but it seems to
be simple enough for a new user to digest, and covers all the bases of
the data model (aside from ordering I guess).  So two questions

1) did I get it right (I'm new to this as well)?
2) is this a useful representation?

  ---Mark
