Re: Fail on a simple case on replication

Brian Candler Thu, 26 Feb 2009 00:42:38 -0800

On Wed, Feb 25, 2009 at 08:07:39PM +1030, Antony Blakey wrote:
> There's _id when it's not supplied in a PUT, but that would be supplied 
> by the Location header in the result. The more I think about it, the more 
> I like this idea.
>
> A lot more work on the client side though to deal with e.g. view  
> results, and I wonder about the subsequent loss of convenience with e.g. 
> curl in that context, although I must admit I'm no curl guru with  
> multipart mime.


Yes, it needs thinking through. But suppose the lower layers of document
storage were separated off, so you just had:
- append document to database
- Btree indexing
- map and reduce
- replication and conflict resolution
- compaction

It would store a small amount of internal metadata (e.g. content-type, _rev,
perhaps a _sha1) plus the raw document as received, as a blob.
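A minimal sketch of what such a lower layer might record, assuming a simple in-memory append-only log. The names `StoredDoc` and `AppendOnlyStore` are hypothetical, and the "<n>-<hash>" revision format is only loosely modeled on CouchDB's (using a truncated SHA-1 here is an assumption). The body is never parsed; it is kept exactly as received.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class StoredDoc:
    """One appended revision: small internal metadata plus the raw body as a blob."""
    doc_id: str
    rev: str           # "<n>-<hash>" (hypothetical format for this sketch)
    content_type: str  # e.g. "application/json" or "application/octet-stream"
    sha1: str          # hex SHA-1 of the body
    body: bytes        # the document exactly as received on the wire


@dataclass
class AppendOnlyStore:
    """Hypothetical storage layer: an append-only log that never parses bodies."""
    log: List[StoredDoc] = field(default_factory=list)

    def append(self, doc_id: str, content_type: str, body: bytes) -> StoredDoc:
        digest = hashlib.sha1(body).hexdigest()
        # Next revision number = number of earlier revisions of this doc + 1.
        n = 1 + sum(1 for d in self.log if d.doc_id == doc_id)
        doc = StoredDoc(doc_id, f"{n}-{digest[:8]}", content_type, digest, body)
        self.log.append(doc)
        return doc

    def latest(self, doc_id: str) -> Optional[StoredDoc]:
        # Scan backwards so the most recently appended revision wins.
        for doc in reversed(self.log):
            if doc.doc_id == doc_id:
                return doc
        return None
```

Indexing, map/reduce, replication and compaction would all sit on top of this log, treating `content_type` and `body` as opaque unless a view function chooses to interpret them.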

I can see immediate uses for this document store. For example, when
warehousing RADIUS accounting packets, I could just store them in their raw
binary form. This is not only smaller than JSON and cheaper to process, but
also a more accurate representation of what was actually received on the
wire.

If I were to use this approach in today's CouchDB (as a stub JSON object plus
an attachment), I would lose the ability to do map/reduce on the packet. (*)

Of course, such a map function would have to be quite smart, as it would be
parsing binary RADIUS packets to pull out the fields of interest, but there
are libraries which do that; and even if there were none for JavaScript,
there is already the capability to write map/reduce functions in other
languages via an external view server.
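To make "quite smart" concrete, here is a minimal sketch of a map function over a raw RADIUS Accounting-Request, assuming the packet layout from RFC 2865/2866: a 1-byte Code, 1-byte Identifier, 2-byte big-endian Length and 16-byte Authenticator, followed by Type/Length/Value attributes. The function names, the emitted key/value shape, and the `build_packet` test helper are hypothetical; only a few attribute types are handled.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Codes and attribute types from RFC 2865 / RFC 2866.
ACCOUNTING_REQUEST = 4
USER_NAME = 1
ACCT_STATUS_TYPE = 40
ACCT_SESSION_ID = 44


def parse_attributes(packet: bytes) -> Dict[int, List[bytes]]:
    """Parse the header and TLV attributes of an Accounting-Request."""
    code, _ident, length = struct.unpack("!BBH", packet[:4])
    if code != ACCOUNTING_REQUEST:
        raise ValueError(f"not an Accounting-Request (code {code})")
    end = min(length, len(packet))
    attrs: Dict[int, List[bytes]] = {}
    pos = 20  # 4-byte header + 16-byte authenticator
    while pos + 2 <= end:
        a_type, a_len = packet[pos], packet[pos + 1]  # a_len includes these 2 bytes
        if a_len < 2 or pos + a_len > end:
            raise ValueError("malformed attribute")
        attrs.setdefault(a_type, []).append(packet[pos + 2 : pos + a_len])
        pos += a_len
    return attrs


def map_radius(doc: bytes) -> Iterator[Tuple[list, dict]]:
    """Hypothetical map function: emit [user, session] -> {"status": n}."""
    attrs = parse_attributes(doc)
    try:
        user = attrs[USER_NAME][0].decode("utf-8", "replace")
        session = attrs[ACCT_SESSION_ID][0].decode("ascii", "replace")
        (status,) = struct.unpack("!I", attrs[ACCT_STATUS_TYPE][0])
    except (KeyError, struct.error):
        return  # skip packets lacking the fields of interest
    yield [user, session], {"status": status}


def build_packet(attrs: List[Tuple[int, bytes]]) -> bytes:
    """Test helper: assemble an Accounting-Request with a zeroed authenticator."""
    body = b"".join(bytes([t, len(v) + 2]) + v for t, v in attrs)
    header = struct.pack("!BBH", ACCOUNTING_REQUEST, 1, 20 + len(body))
    return header + bytes(16) + body
```

For example, a packet carrying User-Name "alice", Acct-Session-Id "S1" and Acct-Status-Type 1 (Start) maps to the single row `(["alice", "S1"], {"status": 1})`.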

Then there's the issue of what map and reduce functions should output.

I think it would be consistent if map functions could generate arbitrary
binary data, tagged with its own content-type (e.g. so you can have a map
function which converts an image/png to an image/jpeg).
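A minimal sketch of a map output that carries its own content-type, assuming a hypothetical `MapRow(key, content_type, value)` row shape. Converting PNG to JPEG would need an imaging library, so as a stand-in this map function emits a gzip-compressed copy of the raw body, tagged with the content-type of the *output* rather than the input.

```python
import gzip
from typing import Iterator, NamedTuple


class MapRow(NamedTuple):
    """Hypothetical binary-capable view row: the value is tagged with its own type."""
    key: str
    content_type: str
    value: bytes


def map_compress(doc_id: str, content_type: str, body: bytes) -> Iterator[MapRow]:
    # Stand-in for an image/png -> image/jpeg conversion: emit a compressed
    # copy of the raw body, skipping documents that are already gzip.
    if content_type != "application/gzip":
        yield MapRow(doc_id, "application/gzip", gzip.compress(body))
```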

This complicates reduce functions, which would have to receive both docs and
their corresponding MIME types (bundled together somehow, perhaps as a JSON
array).
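One way the "bundled as a JSON array" idea could work, sketched under the assumption that each reduce input is a `[content_type, base64_body]` pair. Base64 is needed because JSON strings cannot carry raw binary. The helper names and the example reduce (total bytes per content-type) are hypothetical.

```python
import base64
import json
from typing import Dict, List, Tuple


def bundle(values: List[Tuple[str, bytes]]) -> str:
    """Encode (content_type, bytes) pairs as a JSON array; binary goes via base64."""
    return json.dumps([[ct, base64.b64encode(v).decode("ascii")] for ct, v in values])


def unbundle(payload: str) -> List[Tuple[str, bytes]]:
    """Inverse of bundle(): recover the (content_type, bytes) pairs."""
    return [(ct, base64.b64decode(b64)) for ct, b64 in json.loads(payload)]


def reduce_bytes_by_type(payload: str) -> Dict[str, int]:
    """Hypothetical reduce: total value size in bytes, grouped by content-type."""
    totals: Dict[str, int] = {}
    for ct, value in unbundle(payload):
        totals[ct] = totals.get(ct, 0) + len(value)
    return totals
```

The cost of this design is the base64 overhead (roughly a third larger) on every value crossing the reduce boundary.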

Then I suppose reduce outputs (and re-reduce inputs) could also be arbitrary
MIME objects. It might be convenient to use JSON for these, but there's no
particular reason to enforce this. You might want to output plain-text
strings from your reduce function, or Ruby Marshal objects, or whatever was
convenient.
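A minimal sketch of a reduce whose output is plain text rather than JSON, assuming values are `(content_type, body)` pairs and a signature that mimics CouchDB's `(keys, values, rereduce)`. The function is hypothetical: on the first pass the map is assumed to emit user names as values, and the output is a sorted, newline-separated list of distinct users. On re-reduce it must parse its own text/plain outputs back in.

```python
from typing import Optional, Tuple

MimeValue = Tuple[str, bytes]  # (content_type, body)


def reduce_users(keys: Optional[list], values: list, rereduce: bool) -> MimeValue:
    """Hypothetical reduce with text/plain output: distinct user names, one per line."""
    users = set()
    if rereduce:
        # values are earlier outputs of this same function
        for content_type, body in values:
            if content_type != "text/plain":
                raise ValueError(f"unexpected re-reduce input type {content_type!r}")
            users.update(line for line in body.decode("utf-8").splitlines() if line)
    else:
        # first pass: values are user-name strings emitted by the map function
        users.update(values)
    return ("text/plain", "\n".join(sorted(users)).encode("utf-8"))
```

Nothing checks that the re-reduce inputs really are what the first pass produced; that is the foot-gun mentioned next.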

With great power comes great danger of shooting yourself in the foot, and so
a layer on top of this which *enforces* JSON would be good to have too, and
could even be the default.
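Such an enforcing layer could be as thin as a wrapper around each user function. This is a sketch assuming view functions return a hypothetical `(content_type, body)` pair; the `enforce_json` decorator rejects anything not labelled `application/json`, and parses the body so invalid JSON fails immediately rather than later in an index.

```python
import json
from functools import wraps
from typing import Callable, Tuple

MimeValue = Tuple[str, bytes]  # (content_type, body)


def enforce_json(fn: Callable[..., MimeValue]) -> Callable[..., object]:
    """Hypothetical guard layer: accept only application/json output, return it parsed."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        content_type, body = fn(*args, **kwargs)
        # Ignore parameters such as "; charset=utf-8" when checking the type.
        if content_type.split(";")[0].strip().lower() != "application/json":
            raise TypeError(f"{fn.__name__} returned {content_type!r}, expected application/json")
        return json.loads(body)  # raises json.JSONDecodeError on malformed JSON
    return wrapper


@enforce_json
def count_reduce(keys, values, rereduce) -> MimeValue:
    # First pass counts rows; re-reduce sums the partial counts.
    total = sum(values) if rereduce else len(values)
    return ("application/json", json.dumps(total).encode("utf-8"))
```

With this wrapper as the default, `count_reduce(None, [1, 2, 3], False)` yields `3`, while a function returning `text/plain` raises `TypeError` instead of silently producing an index that nothing else can read.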

Regards,

Brian.

(*) This does suggest another approach which could give the same benefit:
allow map functions to have access to the document's attachments somehow.
But if this were to be bundled with the doc directly it would have to be
base64-encoded, since JSON doesn't permit binary strings. And it would need
to be made available on demand somehow.
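Both options in the footnote can be sketched, assuming CouchDB-style attachment stubs (`{"content_type": ..., "length": ..., "stub": true}`) and a hypothetical `fetch(doc_id, name) -> bytes` callable standing in for however the view server would read the blob. The eager version inlines every attachment as a base64 `data` field; the lazy version fetches an attachment only when the map function asks for it, and caches it.

```python
import base64
from typing import Callable, Dict, List

Fetch = Callable[[str, str], bytes]  # hypothetical: (doc_id, name) -> raw bytes


def inline_attachments(doc: dict, fetch: Fetch) -> dict:
    """Eager option: embed every attachment in the JSON doc as base64 'data'."""
    out = dict(doc)
    inlined = {}
    for name, stub in doc.get("_attachments", {}).items():
        entry = {k: v for k, v in stub.items() if k != "stub"}
        entry["data"] = base64.b64encode(fetch(doc["_id"], name)).decode("ascii")
        inlined[name] = entry
    out["_attachments"] = inlined
    return out


class LazyAttachments:
    """On-demand option: attachments are read only when a map function asks."""

    def __init__(self, doc: dict, fetch: Fetch):
        self._doc = doc
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def names(self) -> List[str]:
        return sorted(self._doc.get("_attachments", {}))

    def get(self, name: str) -> bytes:
        if name not in self._doc.get("_attachments", {}):
            raise KeyError(name)
        if name not in self._cache:
            self._cache[name] = self._fetch(self._doc["_id"], name)
        return self._cache[name]
```

The lazy form avoids both the base64 overhead and the cost of reading attachments that a given map function never looks at.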
