Re: struggling with couchdb in production

Jurg van Vliet Tue, 26 May 2009 00:11:52 -0700


On May 26, 2009, at 12:02 AM, Nitin Borwankar wrote:

Hi guys,
Coming from a long bout of "relational database illness" (18+ years) from which I rapidly recovered after the doctor ordered CouchDB, here's how I think about it. Just some very loose informal rules of thumb.
A CouchDB data model is a denormalized data model - so don't start with an ER diagram and map to tables, add indexes, pr.key->f.key etc.
Normalization is an unnatural act in couchdb and documents.

It may be better to start with an object diagram and UML if you want to go that route.
The big question is how far to go with the denormalization.
If your model is an acyclic graph you can theoretically have just one large document that is deeply nested.
But you will probably go two or three levels deep at most.

i agree with this wholeheartedly. but i would like to have some other technique that helps in modeling. from what you suggest, my experience and chris' remarks it appears as if we are all looking at some form of maximum normal form, instead of minimum normal form. or not? what is the maximum you can get away with? but what is the cost of maximization?

@chris, i don't think 'update congestion' is the MAIN problem, it certainly is one of the problems that may arise. but in the case of users involved i see conflicts as something they should handle, because it has meaning, and should be reacted to as such. i understand that in a livechat a user is not interested so much in being 'interrupted' all the time as she wants to say something :P

But if your model is a meshed network then you probably want to go two levels - e.g. take a look at the Twitter JSON response format and how it embeds user info inside a status message, and in contrast how it embeds status message (last status) inside user object - in each case the embedded object has just a few of the attributes of the original object - just enough to provide meaningful info in context of the containing object.
Instead of foreign keys use URIs - you could use namespaced URIs: sometablename.id in a relational model becomes namespace:localid
Of course you can just use couchdb GUIDs if you want.
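
A minimal sketch of the embedding idea above, using plain Python dictionaries as stand-ins for JSON documents. The field names, the `user:` / `status:` id prefixes and the `summarize` helper are assumptions made up for illustration; they are not a CouchDB or Twitter convention.

```python
# Hypothetical documents; every field name and id prefix here is illustrative.

def summarize(doc, fields):
    """Return a small embedded copy of doc: its _id plus the chosen fields."""
    return {"_id": doc["_id"], **{f: doc[f] for f in fields}}

user = {
    "_id": "user:42",  # namespace:localid instead of users.id
    "type": "user",
    "name": "nitin",
    "bio": "recovering relational database user",
}

status = {
    "_id": "status:1001",
    "type": "status",
    "text": "denormalize all the things",
    "created_at": "2009-05-26T00:02:00Z",
}

# The status embeds just enough of the user to make sense on its own.
status_doc = {**status, "user": summarize(user, ["name"])}

# The user embeds just enough of the last status, not the whole object.
user_doc = {**user, "last_status": summarize(status, ["text"])}
```

The embedded `_id` still works as a reference when the full object is needed. The price is that the embedded copies have to be refreshed when the original changes.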

yes, i also agree with this. but i don't have a clear and clean solution for dealing with the data replication at this level. i don't expect couchdb replication to give a hand, it would mean a sort of per-document dynamic replication strategy. (it would be nice though, only then i would like to have it hidden deep away in something like activerecord in the case of rails :))

in one solution we have implemented a two-way relationship a little bit like this. we use couchdb keys as a reference, and as long as we know which database the document is in they are unique. and we accept the cost of reading the database some more times, to get the necessary information. (i am not so afraid of reading, writing is different, though.)

And finally in typical Rails-like webapps you have result sets for navigation and browsing - here
* "select col1, col2 where ..." corresponds to a map() function with some logic and then emit(doc.attr1, doc.attr2) - very loosely speaking.
* "select count(col3)" and similar aggregates are achieved by having a reduce() in addition to the map()
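
The two correspondences above can be sketched as a small in-memory simulation. This is Python, not a CouchDB view (real views are normally JavaScript functions stored in a design document), and the documents, `map_posts`, `reduce_count` and `run_view` are all hypothetical names. The reduce here is simplified: it only sees the values for one key and ignores rereduce.

```python
from collections import defaultdict

# Hypothetical documents, for illustration only.
docs = [
    {"_id": "a", "type": "post", "author": "jurg", "title": "modeling"},
    {"_id": "b", "type": "post", "author": "nitin", "title": "rules of thumb"},
    {"_id": "c", "type": "comment", "author": "jurg", "title": "re: modeling"},
    {"_id": "d", "type": "post", "author": "jurg", "title": "replication"},
]

def map_posts(doc):
    """Loosely: select author, title where type = 'post'."""
    if doc["type"] == "post":
        yield doc["author"], doc["title"]  # stands in for emit(key, value)

def reduce_count(values):
    """Loosely: select count(...), applied per key."""
    return len(values)

def run_view(docs, map_fn, reduce_fn=None):
    """Run map over every document, sort rows by key, optionally reduce per key."""
    rows = sorted((key, value) for doc in docs for key, value in map_fn(doc))
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

rows = run_view(docs, map_posts)                  # the "select ... where" case
counts = run_view(docs, map_posts, reduce_count)  # the "select count" case
```

The map step decides which documents contribute rows and under which key; the reduce step only ever sees the rows that map emitted.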

yes, but these 2 patterns are too limited. you still want to combine different sorts of information in your database. the biggest problem in using reduce is that it can't 'undo' an emit, it can't disregard or disqualify previously emitted rows.

i have no idea if this is something that would be helpful in couchdb, but i have found myself wishing for something like this.



Hope this helps,

yes, nitin, this certainly helps me. it helps knowing my thinking is at least in the same direction as others.


and, thank you for sharing :)

Nitin Borwankar.

(Perhaps this should be a blog post ?)




Chris Anderson wrote:
On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet <[email protected]> wrote:
guys and girls,
i am a 'real' user of couchdb, and i am having a lot of fun with it in addition to creating real value! but it is far from easy, especially in combination with a framework that is built around relational databases like rails. and still, after 4 months of intensively working with couchdb i am still a big fan.
but couchdb is not finished yet. and i don't mean not finished in the sense of the software program that you can run, or the community that is building this. what i mean is that there is no documented approach to model real world problems in a couchdb way. you can search but the most interesting examples are to clarify the idea, or to show that it is possible. but nothing that helps me think about when to use a document, when a database, when a view, etc. etc.
we have taken a couple of wrong design decisions the last couple of months. you can call it ignorance, or hindsight, or something else. i think it is just the lack of a good framework for thinking couchdb.
when you make your relational database model, your tables, your rows, your indexes, etc. there is a large body of documentation that helps you approach the problem. and even with years of practice, and people having the words database and administrator in their job title, designing your database models is just difficult. (there are really not many people i want to have thinking about tables and rows and indexes.)
so now we have to make this paradigm shift. how are WE managing to struggle through this?

one of my personal insights is that couchdb is so different from a relational database that it is best approached as if it is the opposite. in a rdb you 'minimize' the entity of information, you normalize until it is small enough to still have meaning. once everything is deconstructed you add rules (validations) your data must adhere to. having done that you start to put it back together using joins.
yes, there's a lot of "unlearning" that needs to be done, and that takes time.
in couchdb this pattern doesn't work very well, at least not for us. we learned it is easier to put as much data together in one document as possible. my rule of thumb of when to stop is in distribution. i often ask myself 'do i want to keep this together when i move it to another database?' once you have your documents views are very convenient to take your documents apart.
My rule of thumb is that you want documents to contain their own
context. An individual document should make sense even if you don't
have any others that it may refer to.
The main pressure getting you to split data into multiple documents is
update contention. If a lot of people are editing a list
simultaneously, then you need to make each list item its own
document. If only one person ever edits the list, and the list is
relatively short, then putting it in one document may be easier.
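
A toy illustration of the update contention described above. `TinyStore` is a hypothetical in-memory stand-in, not a CouchDB client: it only mimics the rule that a write must carry the revision it was based on. Integer revisions and the `list:` / `item:` ids are simplifications assumed for the sketch.

```python
import copy

class Conflict(Exception):
    """Raised when a write is based on a stale revision."""

class TinyStore:
    """Toy in-memory store that checks the _rev field on every write."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return copy.deepcopy(self.docs[doc_id])

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        stored = copy.deepcopy(doc)
        stored["_rev"] = current["_rev"] + 1 if current else 1
        self.docs[doc["_id"]] = stored
        return stored["_rev"]

store = TinyStore()
store.put({"_id": "list:groceries", "items": ["milk"]})

# Two people read the same revision of the single list document.
alice = store.get("list:groceries")
bob = store.get("list:groceries")
alice["items"].append("eggs")
bob["items"].append("bread")

store.put(alice)  # succeeds and moves the revision on
try:
    store.put(bob)  # based on the old revision
    bob_conflicted = False
except Conflict:
    bob_conflicted = True

# With one document per list item, the same two edits never touch each other.
store.put({"_id": "item:eggs", "list": "list:groceries", "text": "eggs"})
store.put({"_id": "item:bread", "list": "list:groceries", "text": "bread"})
```

In the single-document layout the second writer has to re-read and retry; in the per-item layout each write targets a different document, and a view keyed on the `list` field could put the items back together.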
a database in couchdb is the place where work comes together, in our case this is the location where a group of people shares. combining information from different databases will be necessary. and i really have no clue yet how to approach this problem. so anyone?
The easiest thing is to merge the databases with replication.
today i found myself in a sort of discussion with jchris and jan (i am sorry for the other jchris' and jans, but everyone knows who i mean.) guys, what i mean to say is that i am happy with your work. but your work is very very important to me. i think my work along with all the work of your users is what is going to make this movement great. if you help us succeed, you will have what you want.
If you're interested we'll be hosting a CouchDB tutorial in London
next month: http://erlang-factory.com/conference/London2009/university/CouchDB

'scuse the plug :)
(the reason i sent it to both lists is that i think this 'couchdb way' of working is something that is not the problem of use OR development. it is necessary to make everyone work together and find out where couchdb's future lies.)

groet,
jurg.
