Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

Adam Kocoloski Thu, 28 Feb 2019 13:28:26 -0800

I’ve gone ahead and submitted an RFC for the design discussed here with a small 
modification:


https://github.com/apache/couchdb/issues/1957

Cheers, Adam

> On Feb 11, 2019, at 2:37 PM, Adam Kocoloski <[email protected]> wrote:
> 
> Agreed, I don’t have answer for this. I propose to drop the optimization for 
> now given the implementation complexity for any solution that does not cause 
> a performance degradation.
> 
> Adam
> 
>> On Feb 11, 2019, at 2:11 PM, Ilya Khlopotov <[email protected]> wrote:
>> 
>>> We could represent these using the following set of KVs:
>>> 
>>> (“foo”, “active”) = true
>>> (“foo”, “owner”) = kCONFLICT
>>> (“foo”, “owner”, “1-abc”) = “alice”
>>> (“foo”, “owner”, “1-def”) = “bob”
>> I still cannot see how we can figure out if conflict for JSON path is 
>> present without reading previous revisions. The complex way of solving the 
>> issue is to use some sort of succinct atomically updated structure which we 
>> can quickly read. The structure would have to be capable of answering the 
>> following question:
>> - what are the hashes of different revisions of a subtree for a given json 
>> path
>> 
>> 
>> 
>> On 2019/02/04 23:22:09, Adam Kocoloski <[email protected]> wrote: 
>>> I think it’s fine to start a focused discussion here as it might help 
>>> inform some of the broader debate over in that thread.
>>> 
>>> As a reminder, today CouchDB writes the entire body of each document 
>>> revision on disk as a separate blob. Edit conflicts that have common fields 
>>> between them do not share any storage on disk. The revision tree is encoded 
>>> into a compact format and a copy of it is stored directly in both the by_id 
>>> tree and the by_seq tree. Each leaf entry in the revision tree contain a 
>>> pointer to the position of the associated doc revision on disk.
>>> 
>>> As a further reminder, CouchDB 2.x clusters can generate edit conflict 
>>> revisions just from multiple clients concurrently updating the same 
>>> document in a single cluster. This won’t happen when FoundationDB is 
>>> running under the hood, but users who deploy multiple CouchDB or PouchDB 
>>> servers and replicate between them can of course still produce conflicts 
>>> just like they could in CouchDB 1.x, so we need a solution.
>>> 
>>> Let’s consider the two sub-topics separately: 1) storage of edit conflict 
>>> bodies and 2) revision trees
>>> 
>>> ## Edit Conflict Storage
>>> 
>>> The simplest possible solution would be to store each document revision 
>>> separately, like we do today. We could store document bodies with (“docid”, 
>>> “revid”) as the key prefix, and each transaction could clear the key range 
>>> associated with the base revision against which the edit is being 
>>> attempted. This would work, but I think we can try to be a bit more clever 
>>> and save on storage space given that we’re splitting JSON documents into 
>>> multiple KV pairs.
>>> 
>>> One thought I’d had is to introduce a special enum Value which indicates 
>>> that the subtree “beneath” the given Key is in conflict. For example, 
>>> consider the documents
>>> 
>>> {
>>>   “_id”: “foo”,
>>>   “_rev”: “1-abc”,
>>>   “owner”: “alice”,
>>>   “active”: true
>>> }
>>> 
>>> and 
>>> 
>>> {
>>>   “_id”: “foo”,
>>>   “_rev”: “1-def”,
>>>   “owner”: “bob”,
>>>   “active”: true
>>> }
>>> 
>>> We could represent these using the following set of KVs:
>>> 
>>> (“foo”, “active”) = true
>>> (“foo”, “owner”) = kCONFLICT
>>> (“foo”, “owner”, “1-abc”) = “alice”
>>> (“foo”, “owner”, “1-def”) = “bob”
>>> 
>>> This approach also extends to conflicts where the two versions have 
>>> different data types. Consider a more complicated example where bob dropped 
>>> the “active” field and changed the “owner” field to an object:
>>> 
>>> {
>>> “_id”: “foo”,
>>> “_rev”: “1-def”,
>>> “owner”: {
>>>   “name”: “bob”,
>>>   “email”: “[email protected]"
>>> }
>>> }
>>> 
>>> Now the set of KVs for “foo” looks like this (note that a missing field 
>>> needs to be handled explicitly):
>>> 
>>> (“foo”, “active”) = kCONFLICT
>>> (“foo”, “active”, “1-abc”) = true
>>> (“foo”, “active”, “1-def”) = kMISSING
>>> (“foo”, “owner”) = kCONFLICT
>>> (“foo”, “owner”, “1-abc”) = “alice”
>>> (“foo”, “owner”, “1-def”, “name”) = “bob”
>>> (“foo”, “owner”, “1-def”, “email”) = “[email protected]”
>>> 
>>> I like this approach for the common case where documents share most of 
>>> their data in common but have a conflict in a very specific field or set of 
>>> fields. 
>>> 
>>> I’ve encountered one important downside, though: an edit that replicates in 
>>> and conflicts with the entire document can cause a bit of a data explosion. 
>>> Consider a case where I have 10 conflicting versions of a 100KB document, 
>>> but the conflicts are all related to a single scalar value. Now I replicate 
>>> in an empty document, and suddenly I have a kCONFLICT at the root. In this 
>>> model I now need to list out every path of every one of the 10 existing 
>>> revisions and I end up with a 1MB update. Yuck. That’s technically no worse 
>>> in the end state than the “zero sharing” case above, but one could easily 
>>> imagine overrunning the transaction size limit this way.
>>> 
>>> I suspect there’s a smart path out of this. Maybe the system detects a 
>>> “default” value for each field and uses that instead of writing out the 
>>> value for every revision in a conflicted subtree. Worth some discussion.
>>> 
>>> ## Revision Trees
>>> 
>>> In CouchDB we currently represent revisions as a hash history tree; each 
>>> revision identifier is derived from the content of the revision including 
>>> the revision identifier of its parent. Individual edit branches are bounded 
>>> in *length* (I believe the default is 1000 entries), but the number of edit 
>>> branches is technically unbounded.
>>> 
>>> The size limits in FoundationDB preclude us from storing the entire key 
>>> tree as a single value; in pathological situations the tree could exceed 
>>> 100KB. Rather, I think it would make sense to store each edit *branch* as a 
>>> separate KV. We stem the branch long before it hits the value size limit, 
>>> and in the happy case of no edit conflicts this means we store the edit 
>>> history metadata in a single KV. It also means that we can apply an 
>>> interactive edit without retrieving the entire conflicted revision tree; we 
>>> need only retrieve and modify the single branch against which the edit is 
>>> being applied. The downside is that we duplicate historical revision 
>>> identifiers shared by multiple edit branches, but I think this is a 
>>> worthwhile tradeoff.
>>> 
>>> I would furthermore try to structure the keys so that it is possible to 
>>> retrieve the “winning” revision in a single limit=1 range query. Ideally 
>>> I’d like to proide the following properties:
>>> 
>>> 1) a document read does not need to retrieve the revision tree at all, just 
>>> the winning revision identifier (which would be stored with the rest of the 
>>> doc)
>>> 2) a document update only needs to read the edit branch of the revision 
>>> tree against which the update is being applied, and it can read that branch 
>>> immediately knowing only the content of the edit that is being attempted 
>>> (i.e., it does not need to read the current version of the document itself).
>>> 
>>> So, I’d propose a separate subspace (maybe “_meta”?) for the revision 
>>> trees, with keys and values that look like
>>> 
>>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, 
>>> GrandparentRev, …]
>>> 
>>> The inclusion of IsDeleted, RevPosition and RevHash in the key should be 
>>> sufficient (with the right encoding) to create a range query that 
>>> automatically selects the “winner” according to CouchDB’s arcane rules, 
>>> which are something like
>>> 
>>> 1) deleted=false beats deleted=true
>>> 2) longer paths (i.e. higher RevPosition) beat shorter ones
>>> 3) RevHashes with larger binary values beat ones with smaller values
>>> 
>>> ===========
>>> 
>>> OK, that’s all on this topic from me for now. I think this is a 
>>> particularly exciting area where we start to see the dividends of splitting 
>>> up data into multiple KV pairs in FoundationDB :) Cheers,
>>> 
>>> Adam
>>> 
>>> 
>>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <[email protected]> wrote:
>>>> 
>>>> This one is quite tightly coupled to the other thread on data model, 
>>>> should we start much conversation here before that one gets closer to a 
>>>> solution?
>>>> 
>>>> -- 
>>>> Robert Samuel Newson
>>>> [email protected]
>>>> 
>>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote:
>>>>> This is a beginning of a discussion thread about storage of edit 
>>>>> conflicts and everything which relates to revisions.
>>>>> 
>>>>> 
>>> 
>>> 
>

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

Reply via email to