On 24 Feb 2009, at 13:39, Brian Candler wrote:
On Tue, Feb 24, 2009 at 09:06:09AM +0100, Patrick Antivackis wrote:
Oh and by the way, in a use case where there is only one database and you
don't use compaction because you want to keep everything, well _rev is a
revision that can be used to see the history of the document.
This is a good point. If you follow "accountants don't use erasers" then you
will never compact (and maybe you want a flag which prevents compaction).
You wouldn't use revisions to keep records around, but proper documents.
However, you must then be prepared for your database to be a single file
which grows without bounds. If CouchDB wants to support this model, it would
be helpful if the data were stored in chunks which can be backed up
separately.
rsync? :)
"Compaction" for saving space could be achieved by rewriting the database,
but keeping diffs for earlier revisions. At this point you would end up with
something roughly like git.
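
The "keep diffs for earlier revisions" idea can be sketched roughly as follows. This is only an illustration, not how CouchDB compaction works: revisions are modelled as arrays of lines, the latest revision is kept in full, and each older one is stored as a reverse delta. The names makeDelta, applyDelta, compact and checkout are made up for this sketch, and the delta is a naive common-prefix/common-suffix diff rather than a real diff algorithm.

```javascript
// Reverse delta: what to change in `newer` to get back to `older`.
// Both arguments are arrays of lines.
function makeDelta(newer, older) {
  const max = Math.min(newer.length, older.length);
  let p = 0; // length of the common prefix
  while (p < max && newer[p] === older[p]) p++;
  let s = 0; // length of the common suffix (not overlapping the prefix)
  while (
    s < max - p &&
    newer[newer.length - 1 - s] === older[older.length - 1 - s]
  ) s++;
  return {
    start: p,
    remove: newer.length - p - s,
    insert: older.slice(p, older.length - s),
  };
}

// Apply a reverse delta to `newer`, yielding the older revision.
function applyDelta(newer, delta) {
  return newer
    .slice(0, delta.start)
    .concat(delta.insert, newer.slice(delta.start + delta.remove));
}

// "Compact" a full history (oldest first): keep the latest revision whole
// and only a chain of reverse deltas for everything before it.
function compact(revisions) {
  const latest = revisions[revisions.length - 1];
  const deltas = [];
  for (let i = revisions.length - 1; i > 0; i--) {
    deltas.push(makeDelta(revisions[i], revisions[i - 1]));
  }
  return { latest, deltas };
}

// Rebuild the revision that is `stepsBack` revisions before the latest.
function checkout(compacted, stepsBack) {
  let rev = compacted.latest;
  for (let i = 0; i < stepsBack; i++) {
    rev = applyDelta(rev, compacted.deltas[i]);
  }
  return rev;
}

const history = [["a", "b"], ["a", "b", "c"], ["a", "x", "c"]];
const compacted = compact(history);
```

Reading the latest revision stays cheap, while older revisions cost one delta application per step back, which is the usual trade-off of this kind of scheme.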
On a random tangent: has anyone considered a CouchDB-like system where
documents are raw blobs, rather than JSON? ISTM that:
- it would save a lot of conversion between Erlang terms and JSON
- it would remove the second-class nature of attachments
- it would allow structured data to be stored in arbitrary formats (e.g. XML)
- it would allow map/reduce to work on binary data (e.g. use a map function
  to make thumbnails of all your jpegs)
- you could still use JSON quite happily, e.g.
    function map(type, data) {
      if (type == "application/json") {
        var doc = evalcx(data);
        // ... continue as normal
      }
    }
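
To make the list above a bit more concrete, here is a small self-contained sketch of a map function that dispatches on content type. Everything in it is an assumption for illustration: the blobs array, the emit callback and the buildView driver are hypothetical stand-ins for a view server, and JSON.parse is used in place of evalcx so that the sketch runs in any current JavaScript engine.

```javascript
// Hypothetical store: each document is a raw blob plus a content type.
const blobs = [
  { id: "a", type: "application/json", data: '{"kind":"post","title":"Hello"}' },
  { id: "b", type: "image/jpeg", data: "\xff\xd8\xff" },
  { id: "c", type: "application/json", data: "not json" },
];

// The map function receives the type and the raw data, and decides for
// itself whether and how to decode the blob.
function map(type, data, emit) {
  if (type !== "application/json") return; // ignore non-JSON blobs here
  let doc;
  try {
    doc = JSON.parse(data);
  } catch (e) {
    return; // skip blobs that claim to be JSON but do not parse
  }
  if (doc.kind === "post") emit(doc.title, null);
}

// Tiny driver standing in for the view server: run the map function over
// every blob and collect the emitted rows, sorted by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const d of docs) {
    mapFn(d.type, d.data, (key, value) => rows.push({ id: d.id, key, value }));
  }
  rows.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  return rows;
}

const rows = buildView(blobs, map);
```

A second map function could look only at "image/jpeg" blobs, which is where the thumbnail idea would fit in.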
I guess some of the APIs would become a bit more awkward though. For
example, bulk document insert would probably become MIME multipart.
In principle, I think you could get today's CouchDB as a thin layer on top
of this. However, "attachments" do have interesting special semantics (e.g.
deleting a document deletes all its attachments) which might need some
parent/child relationship between documents to maintain. Having that
relationship between documents in a more general form could also be useful.
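
One way to picture the parent/child relationship is the in-memory sketch below, where deleting a document also deletes everything that hangs off it, so that attachments become ordinary child documents. The BlobStore class and its put, children and remove methods are invented for this example and do not correspond to any CouchDB API.

```javascript
// Hypothetical in-memory store where any document may name a parent.
class BlobStore {
  constructor() {
    this.docs = new Map(); // id -> { type, data, parent }
  }

  // Store a blob; `parent` is the id of an existing document, or null.
  put(id, type, data, parent = null) {
    if (parent !== null && !this.docs.has(parent)) {
      throw new Error("unknown parent: " + parent);
    }
    this.docs.set(id, { type, data, parent });
  }

  // Ids of the direct children of a document.
  children(id) {
    return [...this.docs]
      .filter(([, doc]) => doc.parent === id)
      .map(([childId]) => childId);
  }

  // Delete a document and, recursively, all of its descendants.
  remove(id) {
    for (const child of this.children(id)) this.remove(child);
    this.docs.delete(id);
  }
}

const store = new BlobStore();
store.put("post", "application/json", '{"title":"Holiday"}');
store.put("photo", "image/jpeg", "...", "post");      // "attachment" of post
store.put("thumb", "image/jpeg", "...", "photo");     // child of a child
store.put("other", "text/plain", "unrelated");
store.remove("post");
```

After the last call only "other" is left, which mirrors the "deleting a document deletes all its attachments" behaviour in a more general form.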
Just thinking out loud.
This is quite interesting! :) I'd like to see such a system, but I'd also
like CouchDB not to become an Apache-httpd-style kitchen sink for all things
HTTP. Maybe Yaws is what you're looking for?
Cheers
Jan