I’ve gone ahead and submitted an RFC for the design discussed here with a small modification:
https://github.com/apache/couchdb/issues/1957 Cheers, Adam > On Feb 11, 2019, at 2:37 PM, Adam Kocoloski <kocol...@apache.org> wrote: > > Agreed, I don’t have answer for this. I propose to drop the optimization for > now given the implementation complexity for any solution that does not cause > a performance degradation. > > Adam > >> On Feb 11, 2019, at 2:11 PM, Ilya Khlopotov <iil...@apache.org> wrote: >> >>> We could represent these using the following set of KVs: >>> >>> (“foo”, “active”) = true >>> (“foo”, “owner”) = kCONFLICT >>> (“foo”, “owner”, “1-abc”) = “alice” >>> (“foo”, “owner”, “1-def”) = “bob” >> I still cannot see how we can figure out if conflict for JSON path is >> present without reading previous revisions. The complex way of solving the >> issue is to use some sort of succinct atomically updated structure which we >> can quickly read. The structure would have to be capable of answering the >> following question: >> - what are the hashes of different revisions of a subtree for a given json >> path >> >> >> >> On 2019/02/04 23:22:09, Adam Kocoloski <kocol...@apache.org> wrote: >>> I think it’s fine to start a focused discussion here as it might help >>> inform some of the broader debate over in that thread. >>> >>> As a reminder, today CouchDB writes the entire body of each document >>> revision on disk as a separate blob. Edit conflicts that have common fields >>> between them do not share any storage on disk. The revision tree is encoded >>> into a compact format and a copy of it is stored directly in both the by_id >>> tree and the by_seq tree. Each leaf entry in the revision tree contain a >>> pointer to the position of the associated doc revision on disk. >>> >>> As a further reminder, CouchDB 2.x clusters can generate edit conflict >>> revisions just from multiple clients concurrently updating the same >>> document in a single cluster. This won’t happen when FoundationDB is >>> running under the hood, but users who deploy multiple CouchDB or PouchDB >>> servers and replicate between them can of course still produce conflicts >>> just like they could in CouchDB 1.x, so we need a solution. >>> >>> Let’s consider the two sub-topics separately: 1) storage of edit conflict >>> bodies and 2) revision trees >>> >>> ## Edit Conflict Storage >>> >>> The simplest possible solution would be to store each document revision >>> separately, like we do today. We could store document bodies with (“docid”, >>> “revid”) as the key prefix, and each transaction could clear the key range >>> associated with the base revision against which the edit is being >>> attempted. This would work, but I think we can try to be a bit more clever >>> and save on storage space given that we’re splitting JSON documents into >>> multiple KV pairs. >>> >>> One thought I’d had is to introduce a special enum Value which indicates >>> that the subtree “beneath” the given Key is in conflict. For example, >>> consider the documents >>> >>> { >>> “_id”: “foo”, >>> “_rev”: “1-abc”, >>> “owner”: “alice”, >>> “active”: true >>> } >>> >>> and >>> >>> { >>> “_id”: “foo”, >>> “_rev”: “1-def”, >>> “owner”: “bob”, >>> “active”: true >>> } >>> >>> We could represent these using the following set of KVs: >>> >>> (“foo”, “active”) = true >>> (“foo”, “owner”) = kCONFLICT >>> (“foo”, “owner”, “1-abc”) = “alice” >>> (“foo”, “owner”, “1-def”) = “bob” >>> >>> This approach also extends to conflicts where the two versions have >>> different data types. Consider a more complicated example where bob dropped >>> the “active” field and changed the “owner” field to an object: >>> >>> { >>> “_id”: “foo”, >>> “_rev”: “1-def”, >>> “owner”: { >>> “name”: “bob”, >>> “email”: “b...@example.com" >>> } >>> } >>> >>> Now the set of KVs for “foo” looks like this (note that a missing field >>> needs to be handled explicitly): >>> >>> (“foo”, “active”) = kCONFLICT >>> (“foo”, “active”, “1-abc”) = true >>> (“foo”, “active”, “1-def”) = kMISSING >>> (“foo”, “owner”) = kCONFLICT >>> (“foo”, “owner”, “1-abc”) = “alice” >>> (“foo”, “owner”, “1-def”, “name”) = “bob” >>> (“foo”, “owner”, “1-def”, “email”) = “b...@example.com” >>> >>> I like this approach for the common case where documents share most of >>> their data in common but have a conflict in a very specific field or set of >>> fields. >>> >>> I’ve encountered one important downside, though: an edit that replicates in >>> and conflicts with the entire document can cause a bit of a data explosion. >>> Consider a case where I have 10 conflicting versions of a 100KB document, >>> but the conflicts are all related to a single scalar value. Now I replicate >>> in an empty document, and suddenly I have a kCONFLICT at the root. In this >>> model I now need to list out every path of every one of the 10 existing >>> revisions and I end up with a 1MB update. Yuck. That’s technically no worse >>> in the end state than the “zero sharing” case above, but one could easily >>> imagine overrunning the transaction size limit this way. >>> >>> I suspect there’s a smart path out of this. Maybe the system detects a >>> “default” value for each field and uses that instead of writing out the >>> value for every revision in a conflicted subtree. Worth some discussion. >>> >>> ## Revision Trees >>> >>> In CouchDB we currently represent revisions as a hash history tree; each >>> revision identifier is derived from the content of the revision including >>> the revision identifier of its parent. Individual edit branches are bounded >>> in *length* (I believe the default is 1000 entries), but the number of edit >>> branches is technically unbounded. >>> >>> The size limits in FoundationDB preclude us from storing the entire key >>> tree as a single value; in pathological situations the tree could exceed >>> 100KB. Rather, I think it would make sense to store each edit *branch* as a >>> separate KV. We stem the branch long before it hits the value size limit, >>> and in the happy case of no edit conflicts this means we store the edit >>> history metadata in a single KV. It also means that we can apply an >>> interactive edit without retrieving the entire conflicted revision tree; we >>> need only retrieve and modify the single branch against which the edit is >>> being applied. The downside is that we duplicate historical revision >>> identifiers shared by multiple edit branches, but I think this is a >>> worthwhile tradeoff. >>> >>> I would furthermore try to structure the keys so that it is possible to >>> retrieve the “winning” revision in a single limit=1 range query. Ideally >>> I’d like to proide the following properties: >>> >>> 1) a document read does not need to retrieve the revision tree at all, just >>> the winning revision identifier (which would be stored with the rest of the >>> doc) >>> 2) a document update only needs to read the edit branch of the revision >>> tree against which the update is being applied, and it can read that branch >>> immediately knowing only the content of the edit that is being attempted >>> (i.e., it does not need to read the current version of the document itself). >>> >>> So, I’d propose a separate subspace (maybe “_meta”?) for the revision >>> trees, with keys and values that look like >>> >>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, >>> GrandparentRev, …] >>> >>> The inclusion of IsDeleted, RevPosition and RevHash in the key should be >>> sufficient (with the right encoding) to create a range query that >>> automatically selects the “winner” according to CouchDB’s arcane rules, >>> which are something like >>> >>> 1) deleted=false beats deleted=true >>> 2) longer paths (i.e. higher RevPosition) beat shorter ones >>> 3) RevHashes with larger binary values beat ones with smaller values >>> >>> =========== >>> >>> OK, that’s all on this topic from me for now. I think this is a >>> particularly exciting area where we start to see the dividends of splitting >>> up data into multiple KV pairs in FoundationDB :) Cheers, >>> >>> Adam >>> >>> >>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnew...@apache.org> wrote: >>>> >>>> This one is quite tightly coupled to the other thread on data model, >>>> should we start much conversation here before that one gets closer to a >>>> solution? >>>> >>>> -- >>>> Robert Samuel Newson >>>> rnew...@apache.org >>>> >>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote: >>>>> This is a beginning of a discussion thread about storage of edit >>>>> conflicts and everything which relates to revisions. >>>>> >>>>> >>> >>> >