Hi guys,

Coming from a long bout of "relational database illness" (18+ years) from which I rapidly recovered after the doctor ordered CouchDB, here's how I think about it. Just some very loose informal rules of thumb.

A CouchDB data model is a denormalized data model, so don't start with an ER diagram and map it to tables, add indexes, primary key -> foreign key, etc.
Normalization is an unnatural act in CouchDB and with documents.

It may be better to start with an object diagram and UML if you want to go that route.
The big question is how far to go with the denormalization.

If your model is an acyclic graph you can theoretically have just one large document that is deeply nested.
But you will probably go two or three levels deep at most.

But if your model is a meshed network then you probably want to go two levels deep. For example, take a look at the Twitter JSON response format: it embeds user info inside a status message and, in contrast, embeds a status message (the last status) inside the user object. In each case the embedded object has just a few of the attributes of the original object, just enough to provide meaningful info in the context of the containing object.

Instead of foreign keys use URIs. You could use namespaced URIs: sometablename.id in the relational model becomes namespace:localid.
Of course you can just use CouchDB GUIDs if you want.
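
To make the two-level idea concrete, here is a rough sketch of what such documents might look like. All field names, the "user:" and "status:" namespaces, and the userSummary helper are made up for illustration; this is not Twitter's actual format.

```javascript
// Hypothetical documents; ids follow the namespace:localid idea.
const userDoc = {
  _id: "user:42",
  type: "user",
  name: "Ada",
  screen_name: "ada",
  location: "London",
  // Just enough of the status to be meaningful inside the user.
  last_status: { id: "status:1001", text: "Trying out CouchDB" },
};

// Hypothetical helper: pick the few user attributes worth embedding.
function userSummary(user) {
  return { id: user._id, screen_name: user.screen_name, name: user.name };
}

const statusDoc = {
  _id: "status:1001",
  type: "status",
  text: "Trying out CouchDB",
  created_at: "2009-05-25T12:18:00Z",
  // Embedded copy of the user, plus its id in place of a foreign key.
  user: userSummary(userDoc),
};

console.log(JSON.stringify(statusDoc, null, 2));
```

Each document still makes sense on its own, and the embedded id is there if you need to fetch the full object.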

And finally, in typical Rails-like webapps you have result sets for navigation and browsing. Here:

* "select col1, col2 where ..." corresponds to a map() function with some logic and then emit(doc.attr1, doc.attr2), very loosely speaking.
* "select count(col3)" and similar aggregates are achieved by having a reduce() in addition to the map().
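
A loose sketch of that correspondence follows. The map and reduce functions have the shape CouchDB view functions take; everything below the marker comment (emit, runView, the sample docs) is a throwaway local stand-in so the sketch runs outside CouchDB, and the "type" field is an assumption.

```javascript
// Roughly "select attr1, attr2 where type = 'post'".
function map(doc) {
  if (doc.type === "post") {
    emit(doc.attr1, doc.attr2);
  }
}

// Roughly "select count(...)": count the emitted rows.
function reduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls; add them up.
    return values.reduce((a, b) => a + b, 0);
  }
  return values.length;
}

// --- Local stand-in for the view server (not CouchDB's API) ---
let rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

function runView(docs) {
  rows = [];
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  // View rows come back sorted by key.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

const docs = [
  { _id: "a", type: "post", attr1: "2009-05-25", attr2: "Hello" },
  { _id: "b", type: "comment", attr1: "2009-05-26", attr2: "Nice" },
  { _id: "c", type: "post", attr1: "2009-05-24", attr2: "First" },
];

const viewRows = runView(docs);
const count = reduce(
  viewRows.map((r) => [r.key, r.id]),
  viewRows.map((r) => r.value),
  false
);
console.log(viewRows, count);
```

In a real design document only the map and reduce functions would be stored; CouchDB supplies emit() and does the sorting itself.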

Hope this helps,

Nitin Borwankar.

(Perhaps this should be a blog post ?)




Chris Anderson wrote:
On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet <[email protected]> wrote:
guys and girls,

i am a 'real' user of couchdb, and i am having a lot of fun with it in
addition to creating real value! but it is far from easy, especially in
combination with a framework that is built around relational databases like
rails. and still, after 4 months of intensively working with couchdb i am
still a big fan.

but couchdb is not finished yet. and i don't mean not finished in the sense
of the software program that you can run, or the community that is building
this. what i mean is that there is no documented approach to model real
world problems in a couchdb way. you can search but the most interesting
examples are to clarify the idea, or to show that it is possible. but
nothing that helps me think about when to use a document, when a database,
when a view, etc. etc.

we have taken a couple of wrong design decisions the last couple of months.
you can call it ignorance, or hindsight, or something else. i think it is
just the lack of a good framework for thinking couchdb.

when you make your relational database model, your tables, your rows, your
indexes, etc. there is a large body of documentation that helps you approach
the problem. and even with years of practice, and people having the word
database and administrator in their jobtitle, designing your database models
is just difficult. (there are really not many people i want to have thinking
about tables and rows and indexes.)

so now we have to make this paradigm shift. how are WE managing to struggle
through this?

one of my personal insights is that couchdb is so different from a
relational database that it is best approached as if it is the opposite. in
a rdb you 'minimize' the entity of information, you normalize until it is
small enough to still have meaning. once everything is deconstructed you add
rules (validations) your data must adhere to. having done that you start to
put it back together using joins.

 yes, there's a lot of "unlearning" that needs to be done, and that takes time.

in couchdb this pattern doesn't work very well, at least not for us. we
learned it is easier to put as much data together in one document as
possible. my rule of thumb of when to stop is in distribution. i often ask
myself 'do i want to keep this together when i move it to another database?'
once you have your documents, views are very convenient for taking them
apart.

My rule of thumb is that you want documents to contain their own
context. An individual document should make sense even if you don't
have any others that it may refer to.

The main pressure getting you to split data into multiple documents is
update contention. If a lot of people are editing a list
simultaneously, then you need to make each list item its own
document. If only one person ever edits the list, and the list is
relatively short, then putting it in one document may be easier.
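
A toy illustration of that contention, assuming an in-memory store that mimics CouchDB's rule that an update must carry the document's current _rev. ToyStore, the integer revisions, and the document ids are all invented for this sketch; real revisions are opaque strings and a real conflict comes back as an HTTP 409.

```javascript
// Toy store: rejects an update whose _rev is stale. Not CouchDB's API.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }
  get(id) {
    // Return a copy, as if fetched over HTTP.
    return JSON.parse(JSON.stringify(this.docs.get(id)));
  }
  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) {
      return { ok: false, error: "conflict" };
    }
    const rev = (current ? current._rev : 0) + 1;
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { ok: true, rev };
  }
}

const store = new ToyStore();

// One document holding the whole list: two editors collide.
store.put({ _id: "list:groceries", items: ["milk"] });
const alice = store.get("list:groceries");
const bob = store.get("list:groceries");
alice.items.push("eggs");
bob.items.push("bread");
const aliceSave = store.put(alice); // accepted, revision moves on
const bobSave = store.put(bob); // rejected, bob's _rev is now stale

// One document per item: the same two edits never touch the same doc.
const eggsSave = store.put({ _id: "item:1", list: "list:groceries", text: "eggs" });
const breadSave = store.put({ _id: "item:2", list: "list:groceries", text: "bread" });

console.log(aliceSave, bobSave, eggsSave, breadSave);
```

With per-item documents, a view keyed on the list id can put the list back together for display.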

a database in couchdb is the place where work comes together, in our case
this is the location where a group of people shares. combining information
from different databases will be necessary. and i really have no clue yet
how to approach this problem. so anyone?

The easiest thing is to merge the databases with replication.

today i found myself in a sort of discussion with jchris and jan (i am sorry
for the other jchris' and jans, but everyone knows who i mean.) guys, what i
mean to say is that i am happy with your work. but your work is very very
important to me. i think my work along with all the work of your users is
what is going to make this movement great. if you help us succeed, you will
have what you want.

If you're interested we'll be hosting a CouchDB tutorial in London
next month: http://erlang-factory.com/conference/London2009/university/CouchDB

'scuse the plug :)

(the reason i sent it to both lists is that i think this 'couchdb way' of
working is something that is not the problem of use OR development. it is
necessary to make everyone work together and find out where couchdb's future
lies.)

groet,
jurg.




