On Wed, Feb 25, 2009 at 08:07:39PM +1030, Antony Blakey wrote:
> There's _id when it's not supplied in a PUT, but that would be supplied
> by the Location header in the result. The more I think about it, the more
> I like this idea.
>
> A lot more work on the client side though to deal with e.g. view
> results, and I wonder about the subsequent loss of convenience with e.g.
> curl in that context, although I must admit I'm no curl guru with
> multipart mime.

Yes, it needs thinking through. But suppose the lower layers of document
storage were separated off, so you just had:

- append document to database
- Btree indexing
- map and reduce
- replication and conflict resolution
- compaction

It would store a small amount of internal metadata (e.g. content-type,
_rev, perhaps a _sha1) plus the raw document as received, as a blob.

I can see immediate uses for this document store. For example, when
warehousing RADIUS accounting packets, I could just store them in their
raw binary form. This is not only smaller than JSON, and involves less
processing, but it is also a more accurate representation of what was
actually received on the wire.

If I were to use this approach in today's CouchDB (as a stub JSON object
plus an attachment), I would lose the ability to do map/reduce on the
packet. (*)

Of course, such a map function would have to be quite smart, as it would
be parsing binary RADIUS packets to pull out the fields of interest. But
there are libraries which do that, and even if none exists for JavaScript,
there is already the capability to do map/reduce processing in any other
language.

Then there's the issue of what map and reduce functions should output. I
think it would be consistent if map functions could generate arbitrary
binary data, tagged with its own content-type (e.g. so you can have a map
function which converts an image/png to an image/jpeg).

This complicates reduce functions, which would have to receive both the
map outputs and their corresponding MIME types (bundled together somehow,
perhaps as a JSON array).

Then I suppose reduce outputs (and re-reduce inputs) could also be
arbitrary MIME objects. It might be convenient to use JSON for these, but
there's no particular reason to enforce this. You might want to output
plain-text strings from your reduce function, or Ruby Marshal objects, or
whatever was convenient.

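
To make the RADIUS case a bit more concrete, here is a rough sketch in
Python of what a map function over raw blobs might look like. Nothing like
this exists in CouchDB today: the map_doc signature, the
"application/x-radius-acct" content type and the (key, content-type, bytes)
output shape are all made up for illustration. Only the packet layout
(4-byte header, 16-byte authenticator, then type/length/value attributes)
and the attribute numbers come from the RADIUS RFCs.

```python
import struct

ACCOUNTING_REQUEST = 4   # RADIUS packet code (RFC 2866)
USER_NAME = 1            # attribute type (RFC 2865)
ACCT_SESSION_TIME = 46   # attribute type (RFC 2866), 32-bit seconds


def parse_attributes(packet: bytes) -> dict:
    """Return {attribute_type: raw_value_bytes} for one RADIUS packet."""
    if len(packet) < 20:
        raise ValueError("packet shorter than RADIUS header")
    _code, _ident, length = struct.unpack("!BBH", packet[:4])
    if length < 20 or length > len(packet):
        raise ValueError("bad length field")
    attrs = {}
    pos = 20  # 4-byte header + 16-byte authenticator
    while pos + 2 <= length:
        attr_type, attr_len = packet[pos], packet[pos + 1]
        # attr_len counts the type and length bytes themselves
        if attr_len < 2 or pos + attr_len > length:
            raise ValueError("malformed attribute")
        attrs[attr_type] = packet[pos + 2:pos + attr_len]
        pos += attr_len
    return attrs


def map_doc(content_type: str, body: bytes):
    """Hypothetical map function over a raw blob.

    Yields (key, value_content_type, value_bytes) tuples, so the emitted
    value carries its own content-type as suggested above.
    """
    if content_type != "application/x-radius-acct":  # assumed label
        return
    if len(body) < 20 or body[0] != ACCOUNTING_REQUEST:
        return
    attrs = parse_attributes(body)
    session_time = attrs.get(ACCT_SESSION_TIME)
    if USER_NAME in attrs and session_time is not None and len(session_time) == 4:
        user = attrs[USER_NAME].decode("utf-8", "replace")
        (seconds,) = struct.unpack("!I", session_time)
        yield user, "text/plain", str(seconds).encode("ascii")
```

A reduce function would then receive these tagged values and could, for
example, sum the text/plain session times per user.
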
With great power comes great danger of shooting yourself in the foot, and
so a layer on top of this which *enforces* JSON would be a good-to-have
too, and could even be the default.

Regards,

Brian.

(*) This does suggest another approach which could give the same benefit:
allow map functions to have access to the document's attachments somehow.
But if this were to be bundled with the doc directly it would have to be
base64 encoded, since JSON doesn't permit binary strings. And it would need
to be made available on-demand somehow.
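
As a small illustration of the base64 point, the sketch below bundles a
binary attachment into a JSON document in Python. The helper name
embed_attachment is made up, and the field layout merely mirrors the shape
of CouchDB's inline attachments; it is not a proposal for how a view server
would actually be handed the data.

```python
import base64
import json


def embed_attachment(doc: dict, name: str, content_type: str, data: bytes) -> str:
    """Return JSON text for doc with the binary data inlined as base64."""
    bundled = dict(doc)  # shallow copy, leave the caller's dict alone
    bundled["_attachments"] = {
        name: {
            "content_type": content_type,
            # JSON strings cannot hold arbitrary bytes, hence base64
            "data": base64.b64encode(data).decode("ascii"),
        }
    }
    return json.dumps(bundled)


def extract_attachment(json_text: str, name: str) -> bytes:
    """What a map function would have to do to get the raw bytes back."""
    doc = json.loads(json_text)
    return base64.b64decode(doc["_attachments"][name]["data"])
```

Base64 turns every 3 bytes into 4 characters, so the bundled form is about
a third larger than the raw attachment, and the map function pays for a
decode on every document it looks at.
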
