Hi Michael,

> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.

Yes, that’s definitely one way to address storage of edit conflicts. I think 
there are other, more compact representations that we can explore with this 
“exploded” data model where each scalar value maps to an individual KV pair. 
E.g., if you have two large revisions of a document that differ in only one 
field, it is possible to write down a model where both revisions share all 
the rest of the KV pairs, and a special flag in the value at the conflicted 
path indicates that an edit branch occurred there.
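To make that concrete, here is a minimal sketch of the idea in Python. The key tuples, the `__BRANCH__` marker, and the `read` helper are all hypothetical illustrations of the shared-pairs layout, not the actual proposal's encoding:

```python
# Hypothetical sketch: two revisions of a document share every unchanged
# KV pair; only the conflicted path carries per-revision entries, with a
# branch marker stored where the shared value would otherwise live.
kv = {
    # Shared pairs: one entry serves both revisions.
    ("doc1", "name"): "alice",
    ("doc1", "address", "city"): "Boston",
    # The conflicted path is flagged so a reader knows to look up a
    # revision-specific entry instead of a single shared value.
    ("doc1", "age"): "__BRANCH__",
    ("doc1", "age", "1-abc"): 33,
    ("doc1", "age", "2-def"): 34,
}

def read(docid, path, revid):
    """Resolve a scalar for a given revision, following branch markers."""
    value = kv[(docid, *path)]
    if value == "__BRANCH__":
        return kv[(docid, *path, revid)]
    return value

print(read("doc1", ("name",), "1-abc"))  # shared pair, same for every rev
print(read("doc1", ("age",), "2-def"))   # revision-specific pair
```

The point is only that the storage cost of a conflict scales with the size of the divergence, not with the size of the document.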

> I'm assuming the process will flatten the key paths of the document into an 
> array and then request the value of each key as multiple parallel queries 
> against FDB at once

Ah, I think this is not one of Ilya’s assumptions. He’s trying to design a 
model which allows the retrieval of a document with a single range read, which 
is a good goal in my opinion.
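A toy illustration of why a single range read suffices, assuming (hypothetically) a flat layout where every KV pair for a document begins with its DOCID. The sorted-list store and `range_read` helper stand in for an FDB range scan:

```python
from bisect import bisect_left

# Hypothetical flat layout: every key for a document starts with its
# DOCID, so one range scan over the DOCID prefix returns the whole
# document. The tuples here are illustrative, not the exact encoding.
store = sorted({
    ("doc1", "a"): 1,
    ("doc1", "b", "c"): 2,
    ("doc2", "a"): 3,
}.items())

def range_read(prefix):
    """Return all (key, value) pairs whose key starts with `prefix`."""
    keys = [k for k, _ in store]
    start = bisect_left(keys, prefix)
    return [(k, v) for k, v in store[start:] if k[:len(prefix)] == prefix]

doc = range_read(("doc1",))
# Both of doc1's pairs come back in one scan; doc2's pair is excluded.
```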

I do think a small number of parallel reads can be OK, e.g. retrieving some 
database-level mapping information in parallel to the encoded document. We 
should try to avoid serializing reads, and I think issuing a separate read for 
every field of a document would be an unnecessarily heavy load.

> Assuming it only does "prefix" and not "segment", then I don't think this 
> will help because the DOCID for each key in JSON_PATH will be different, 
> making the "prefix" to each path across different documents distinct.

I’m not sure I follow you here; perhaps we have different understandings of 
the proposal. When I’m reading a document in this model I’m retrieving a set of 
keys that all share the same {DOCID}. Moreover, if I’ve got e.g. an array 
sitting in some deeply nested part of the document, the entire path 
doc.foo.bar.baz.myarray is common to every element of the array, so it’s 
actually quite a nice case for elision.
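A rough sketch of that elision, assuming a simple delta encoding where each key records how many path components it shares with the previous key (the `elide` helper is hypothetical, just to show the effect on a nested array):

```python
# Every element of a deeply nested array shares the path
# doc.foo.bar.baz.myarray, so consecutive keys compress well: each key
# after the first needs to store only the array index.
keys = [
    ("doc", "foo", "bar", "baz", "myarray", 0),
    ("doc", "foo", "bar", "baz", "myarray", 1),
    ("doc", "foo", "bar", "baz", "myarray", 2),
]

def elide(keys):
    """Encode each key as (#components shared with previous key, tail)."""
    out, prev = [], ()
    for k in keys:
        shared = 0
        while shared < min(len(prev), len(k)) and prev[shared] == k[shared]:
            shared += 1
        out.append((shared, k[shared:]))
        prev = k
    return out

encoded = elide(keys)
# After the first key, each entry carries only (5, (index,)).
```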

> I think the answer is assuming every document modification can upload in 
> multiple txns.

I would like to avoid this if possible. It adds a lot of extra complexity (the 
subspace with atomic rename dance, for example), and I think CouchDB should be 
focused on use cases that do fit within the 10MB / 5 second limit.

Cheers, Adam
