Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

Paul Davis Mon, 11 Feb 2019 09:23:16 -0800

Oh, for sure we can do that. I was just trying to think of a clever
way that replicated edits could also find their edit branch with a
single read as well instead of having to pull out the entire tree.


On Mon, Feb 11, 2019 at 9:04 AM Adam Kocoloski <[email protected]> wrote:
>
> Not sure I follow. If a transaction needs the full revision tree for a single 
> document it can retrieve that with a single range read for the (“_meta”, 
> DocID) prefix.
>
> Adam
>
> > On Feb 8, 2019, at 6:35 PM, Paul Davis <[email protected]> wrote:
> >
> > Ah, that all sounds good. The only thing I'm not initially seeing as
> > obvious is how we lookup a revision path to extend during replication
> > when the previous revision may be anywhere in the list of $revs_limit
> > revisions. Feels like there might be some sort of trickery to do that
> > efficiently. Although it may also be good enough to also just issue
> > $revs_limit lookups in parallel given that we're maxed out on either
> > $revs_limit or 2*$revs_limit if we have to check for both deleted and
> > not.
> >
> > On Fri, Feb 8, 2019 at 10:22 AM Adam Kocoloski <[email protected]> wrote:
> >>
> >> Totally unacceptable! ;)  In fact some key bits of that model got 
> >> dispersed into at least two separate emails so you’re likely not the only 
> >> one. I’ll restate here:
> >>
> >> The size limits in FoundationDB preclude us from storing the entire key 
> >> tree as a single value; in pathological situations the tree could exceed 
> >> 100KB. Rather, I think it would make sense to store each edit *branch* as 
> >> a separate KV. We stem the branch long before it hits the value size 
> >> limit, and in the happy case of no edit conflicts this means we store the 
> >> edit history metadata in a single KV. It also means that we can apply an 
> >> interactive edit without retrieving the entire conflicted revision tree; 
> >> we need only retrieve and modify the single branch against which the edit 
> >> is being applied. The downside is that we may duplicate historical 
> >> revision identifiers shared by multiple edit branches, but I think this is 
> >> a worthwhile tradeoff.
> >>
> >> I’d also ensure that a document update only needs to read the edit branch 
> >> KV against which the update is being applied, and it can read that branch 
> >> immediately knowing only the content of the edit that is being attempted 
> >> (i.e., it does not need to read the current version of the document 
> >> itself).
> >>
> >> I think we achieve these goals with a separate subspace (maybe “_meta”?) 
> >> for the revision trees, with keys and values that look like
> >>
> >> (“_meta”, DocID, NotDeleted, RevPosition, RevHash) = 
> >> (VersionstampForCurrentRev, [ParentRev, GrandparentRev, …])
> >>
> >> Some notes:
> >>
> >> - including IsDeleted ensures that we can efficiently accept the case 
> >> where we upload a new document with the same ID where all previous edit 
> >> branches have been deleted; i.e. we can construct a key selector which 
> >> automatically tells us there are no deleted=false edit branches
> >> - access to VersionstampForCurrentRev ensures we can clear the old entry 
> >> in the by_seq space during the update
> >> - i need to remind myself how we handle an edit attempt which supplies a 
> >> _rev representing a deleted leaf. Do we fail that as a conflict? That 
> >> would be the natural thing to do here, otherwise we’re forced to check 
> >> both deleted=false and deleted=true keys
> >> - the keys can be made to naturally sort so that the winning revision 
> >> sorts last, but I don’t believe that’s a requirement here like it is for 
> >> the actual document data space
> >>
> >> Cheers, Adam
> >>
> >>> On Feb 8, 2019, at 8:59 AM, Paul Davis <[email protected]> 
> >>> wrote:
> >>>
> >>>> I’m relatively happy with the revision history data model at this point.
> >>>
> >>> I forgot to make a note, but which of the various models are you
> >>> referring to by "revision history data model". There's been so many
> >>> without firm names that my brain is having a hard time parsing that
> >>> one.
> >>>
> >>> On Thu, Feb 7, 2019 at 9:35 PM Adam Kocoloski <[email protected]> wrote:
> >>>>
> >>>> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a 
> >>>> bit “simplistic" to exclusively choose simplicity over performance and 
> >>>> storage density. We’re (re)building a database here, one that has some 
> >>>> users with pretty demanding performance and scalability requirements. 
> >>>> And yes, we should certainly be testing and measuring. Kyle and team are 
> >>>> setting up infrastructure in IBM land to help with that now, but I also 
> >>>> believe we can design the data model and architecture with a basic 
> >>>> performance model of FoundationDB in mind:
> >>>>
> >>>> - reads cost 1ms
> >>>> - short range reads are the same cost as a single lookup
> >>>> - reads of independent parts of the keyspace can be parallelized for 
> >>>> cheap
> >>>> - writes are zero-cost until commit time
> >>>>
> >>>> We ought to be able to use these assumptions to drive some decisions 
> >>>> about data models ahead of any end-to-end performance test.
> >>>>
> >>>> If there are specific elements of the edit conflicts management where 
> >>>> you think greater simplicity is warranted, let’s get those called out. 
> >>>> Ilya noted (correctly, in my opinion) that the term sharing stuff is one 
> >>>> of those items. It’s relatively complex, potentially a performance hit, 
> >>>> and only saves on storage density in the corner case of lots of edit 
> >>>> conflicts. That’s a good one to drop.
> >>>>
> >>>> I’m relatively happy with the revision history data model at this point. 
> >>>> Hopefully folks find it easy to grok, and it’s efficient for both reads 
> >>>> and writes. It costs some extra storage for conflict revisions compared 
> >>>> to the current tree representation (up to 16K per edit branch, with 
> >>>> default _revs_limit) but knowing what we know about the performance 
> >>>> death spiral for wide revision trees today I’ll happily make a storage 
> >>>> vs. performance tradeoff here :)
> >>>>
> >>>> Setting the shared term approach aside, I’ve still been mulling over the 
> >>>> key structure for the actual document data:
> >>>>
> >>>> -  I thought about trying to construct a special _conflicts subspace, 
> >>>> but I don’t like that approach because the choice of a “winning" 
> >>>> revision can flip back and forth very quickly with concurrent writers to 
> >>>> different edit branches. I think we really want to have a way for 
> >>>> revisions to naturally sort themselves so the winner is the first or 
> >>>> last revision in a list.
> >>>>
> >>>> - Assuming we’re using key paths of the form (docid, revision-ish, path, 
> >>>> to, field), the goal here is to find an efficient way to get the last 
> >>>> key with prefix “docid” (assuming winner sorts last), and then all the 
> >>>> keys that share the same (docid, revision-ish) prefix as that one. I see 
> >>>> two possible approaches so far, neither perfect:
> >>>>
> >>>> Option 1: Execute a get_key() operation with a key selector that asks 
> >>>> for the last key less than “docid\xFF” (again assuming winner sorts 
> >>>> last), and then do a get_range_startswith() request setting the 
> >>>> streaming mode to “want_all” and the prefix to the docid plus whatever 
> >>>> revision-ish we found from the get_key() request. This is two roundtrips 
> >>>> instead of one, but it always retrieves exactly the right set of keys, 
> >>>> and the second step is executed as fast as possible.
> >>>>
> >>>> Option 2: Jump straight to get_range_startswith() request using only 
> >>>> “docid” as the prefix, then cancel the iteration once we reach a 
> >>>> revision not equal to the first one we see. We might transfer too much 
> >>>> data, or we might end up doing multiple roundtrips if the default 
> >>>> “iterator” streaming mode sends too little data to start (I haven’t 
> >>>> checked what the default iteration block is there), but in the typical 
> >>>> case of zero edit conflicts we have a good chance of retrieving the full 
> >>>> document in one roundtrip.
> >>>>
> >>>> I don’t have a good sense of which option wins out here from a 
> >>>> performance perspective, but they’re both operating on the same data 
> >>>> model so easy enough to test the alternatives. The important bit is 
> >>>> getting the revision-ish things to sort correctly. I think we can do 
> >>>> that by generating something like
> >>>>
> >>>> revision-ish = NotDeleted/1bit : RevPos : RevHash
> >>>>
> >>>> with some suitable order-preserving encoding on the RevPos integer.
> >>>>
> >>>> Apologies for the long email. Happy for any comments, either here or 
> >>>> over on IRC. Cheers,
> >>>>
> >>>> Adam
> >>>>
> >>>>> On Feb 7, 2019, at 4:52 PM, Robert Newson <[email protected]> wrote:
> >>>>>
> >>>>> I think we should choose simple. We can then see if performance is too 
> >>>>> low or storage overhead too high and then see what we can do about it.
> >>>>>
> >>>>> B.
> >>>>>
> >>>>> --
> >>>>> Robert Samuel Newson
> >>>>> [email protected]
> >>>>>
> >>>>> On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote:
> >>>>>> We cannot do simple thing if we want to support sharing of JSON terms. 
> >>>>>> I
> >>>>>> think if we want the simplest path we should move sharing out of the
> >>>>>> scope. The problem with sharing is we need to know the location of
> >>>>>> shared terms when we do write. This means that we have to read full
> >>>>>> document on every write. There might be tricks to replace full document
> >>>>>> read with some sort of hierarchical signature or sketch of a document.
> >>>>>> However these tricks do not fall into simplest solution category. We
> >>>>>> need to choose the design goals:
> >>>>>> - simple
> >>>>>> - performance
> >>>>>> - reduced storage overhead
> >>>>>>
> >>>>>> best regards,
> >>>>>> iilyak
> >>>>>>
> >>>>>> On 2019/02/07 12:45:34, Garren Smith <[email protected]> wrote:
> >>>>>>> I’m also in favor of keeping it really simple and then testing and
> >>>>>>> measuring it.
> >>>>>>>
> >>>>>>> What is the best way to measure that we have something that works? 
> >>>>>>> I’m not
> >>>>>>> sure just relying on our current tests will prove that? Should we 
> >>>>>>> define
> >>>>>>> and build some more complex situations e.g docs with lots of 
> >>>>>>> conflicts or
> >>>>>>> docs with wide revisions and make sure we can solve for those?
> >>>>>>>
> >>>>>>> On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I’m also very much in favour with starting with the simplest thing 
> >>>>>>>> that
> >>>>>>>> can possibly work and doesn’t go against the advertised best 
> >>>>>>>> practices of
> >>>>>>>> FoundationDB. Let’s get that going and get a feel for how it all 
> >>>>>>>> works
> >>>>>>>> together, before trying to optimise things we can’t measure yet.
> >>>>>>>>
> >>>>>>>> Best
> >>>>>>>> Jan
> >>>>>>>> —
> >>>>>>>>
> >>>>>>>>> On 6. Feb 2019, at 16:58, Robert Samuel Newson <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> With the Redwood storage engine under development and with prefix
> >>>>>>>> elision part of its design, I don’t think we should get too hung up 
> >>>>>>>> on
> >>>>>>>> adding complications and indirections in the key space just yet. We 
> >>>>>>>> haven’t
> >>>>>>>> written a line of code or run any tests, this is premature 
> >>>>>>>> optimisation.
> >>>>>>>>>
> >>>>>>>>> I’d like to focus on the simplest solution that yields all required
> >>>>>>>> properties. We can embellish later (if warranted).
> >>>>>>>>>
> >>>>>>>>> I am intrigued by all the ideas that might allow us cheaper inserts 
> >>>>>>>>> and
> >>>>>>>> updates than the current code where there are multiple edit branches 
> >>>>>>>> in the
> >>>>>>>> stored document.
> >>>>>>>>>
> >>>>>>>>> B.
> >>>>>>>>>
> >>>>>>>>>> On 6 Feb 2019, at 02:18, Ilya Khlopotov <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> While reading Adam's proposal I came to realize that: we don't 
> >>>>>>>>>> have to
> >>>>>>>> calculate winning revision at read time.
> >>>>>>>>>> Since FDB's transactions are atomic we can calculate it when we 
> >>>>>>>>>> write.
> >>>>>>>> This means we can just write latest values into separate range. This 
> >>>>>>>> makes
> >>>>>>>> lookup of latest version fast.
> >>>>>>>>>> Another realization is if we want to share values for some json 
> >>>>>>>>>> paths
> >>>>>>>> we would have to introduce a level of indirection.
> >>>>>>>>>> Bellow is the data model inspired by Adam's idea to share 
> >>>>>>>>>> json_paths.
> >>>>>>>> In this model the json_path is stored in the revision where it was 
> >>>>>>>> first
> >>>>>>>> added (we call that revision an owner of a json_path). The values for
> >>>>>>>> json_path key can be scalar values, parts of scalar values or 
> >>>>>>>> pointers to
> >>>>>>>> owner location.
> >>>>>>>>>> The below snippets are sketches of transactions.
> >>>>>>>>>> The transactions will include updates to other keys as needed
> >>>>>>>> (`external_size`, `by_seq` and so on).  The revision tree management 
> >>>>>>>> is not
> >>>>>>>> covered yet.
> >>>>>>>>>> The `rev -> vsn` indirection is not strictly required. It is added
> >>>>>>>> because it saves some space since `rev` is a long string and `vsn` 
> >>>>>>>> is FDB
> >>>>>>>> versionstamp of fixed size.
> >>>>>>>>>>
> >>>>>>>>>> - `{NS} / {docid} / _by_rev / {rev} = vsn`
> >>>>>>>>>> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> >>>>>>>>>> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> >>>>>>>>>> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> >>>>>>>> {another_vsn}`
> >>>>>>>>>>
> >>>>>>>>>> ```
> >>>>>>>>>> write(txn, doc_id, prev_rev, json):
> >>>>>>>>>> txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
> >>>>>>>>>> rev = generate_new_rev()
> >>>>>>>>>> txn["{NS} / {docid} / _by_rev / {rev}"] = vsn
> >>>>>>>>>> for every json_path in flattened json
> >>>>>>>>>> - {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
> >>>>>>>>>> if rev is HEAD:
> >>>>>>>>>>   # this range contains values for all json paths for the latest
> >>>>>>>> revision (read optimization)
> >>>>>>>>>>   - {NS} / {docid} / _data / {json_path} = latest_value | part
> >>>>>>>>>> - {NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> >>>>>>>> {another_vsn}
> >>>>>>>>>> txn["{NS} / {doc_id} / _rev"] = rev
> >>>>>>>>>>
> >>>>>>>>>> get_current(txn, doc_id):
> >>>>>>>>>> # there is no sharing of json_paths in this range (read 
> >>>>>>>>>> optimization)
> >>>>>>>>>> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / 
> >>>>>>>>>> _data
> >>>>>>>> / 0xFF" )
> >>>>>>>>>>
> >>>>>>>>>> get_revision(txn, doc_id, rev):
> >>>>>>>>>> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> >>>>>>>>>> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> >>>>>>>> "{NS} / {vsn} / {docid} / _data / 0xFF" )
> >>>>>>>>>> for every json_path in json_paths:
> >>>>>>>>>> if value has type vsn:
> >>>>>>>>>>   another_vsn = value
> >>>>>>>>>>      value = txn["{NS} / {docid} / {another_vsn} / _data /
> >>>>>>>> {json_path}"]
> >>>>>>>>>> result[json_path] = value
> >>>>>>>>>>
> >>>>>>>>>> delete_revision(txn, doc_id, rev):
> >>>>>>>>>> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> >>>>>>>>>> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> >>>>>>>> "{NS} / {vsn} / {docid} / _data / 0xFF" )
> >>>>>>>>>> for every json_path in json_paths:
> >>>>>>>>>> if value has type vsn:
> >>>>>>>>>>  # remove reference to deleted revision from the owner
> >>>>>>>>>>   del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}]
> >>>>>>>>>> # check if deleted revision of json_path is not used by anything 
> >>>>>>>>>> else
> >>>>>>>>>> if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}",
> >>>>>>>> limit=1) == []:
> >>>>>>>>>>   del txn["{NS} / {docid} / {vsn} / _data / {json_path}"]
> >>>>>>>>>> if vsn is HEAD:
> >>>>>>>>>>   copy range for winning revision into "{NS} / {docid} / _data /
> >>>>>>>> {json_path}"
> >>>>>>>>>> ```
> >>>>>>>>>>
> >>>>>>>>>> best regards,
> >>>>>>>>>> iilyak
> >>>>>>>>>>
> >>>>>>>>>> On 2019/02/04 23:22:09, Adam Kocoloski <[email protected]> wrote:
> >>>>>>>>>>> I think it’s fine to start a focused discussion here as it might 
> >>>>>>>>>>> help
> >>>>>>>> inform some of the broader debate over in that thread.
> >>>>>>>>>>>
> >>>>>>>>>>> As a reminder, today CouchDB writes the entire body of each 
> >>>>>>>>>>> document
> >>>>>>>> revision on disk as a separate blob. Edit conflicts that have common 
> >>>>>>>> fields
> >>>>>>>> between them do not share any storage on disk. The revision tree is 
> >>>>>>>> encoded
> >>>>>>>> into a compact format and a copy of it is stored directly in both 
> >>>>>>>> the by_id
> >>>>>>>> tree and the by_seq tree. Each leaf entry in the revision tree 
> >>>>>>>> contain a
> >>>>>>>> pointer to the position of the associated doc revision on disk.
> >>>>>>>>>>>
> >>>>>>>>>>> As a further reminder, CouchDB 2.x clusters can generate edit 
> >>>>>>>>>>> conflict
> >>>>>>>> revisions just from multiple clients concurrently updating the same
> >>>>>>>> document in a single cluster. This won’t happen when FoundationDB is
> >>>>>>>> running under the hood, but users who deploy multiple CouchDB or 
> >>>>>>>> PouchDB
> >>>>>>>> servers and replicate between them can of course still produce 
> >>>>>>>> conflicts
> >>>>>>>> just like they could in CouchDB 1.x, so we need a solution.
> >>>>>>>>>>>
> >>>>>>>>>>> Let’s consider the two sub-topics separately: 1) storage of edit
> >>>>>>>> conflict bodies and 2) revision trees
> >>>>>>>>>>>
> >>>>>>>>>>> ## Edit Conflict Storage
> >>>>>>>>>>>
> >>>>>>>>>>> The simplest possible solution would be to store each document
> >>>>>>>> revision separately, like we do today. We could store document 
> >>>>>>>> bodies with
> >>>>>>>> (“docid”, “revid”) as the key prefix, and each transaction could 
> >>>>>>>> clear the
> >>>>>>>> key range associated with the base revision against which the edit 
> >>>>>>>> is being
> >>>>>>>> attempted. This would work, but I think we can try to be a bit more 
> >>>>>>>> clever
> >>>>>>>> and save on storage space given that we’re splitting JSON documents 
> >>>>>>>> into
> >>>>>>>> multiple KV pairs.
> >>>>>>>>>>>
> >>>>>>>>>>> One thought I’d had is to introduce a special enum Value which
> >>>>>>>> indicates that the subtree “beneath” the given Key is in conflict. 
> >>>>>>>> For
> >>>>>>>> example, consider the documents
> >>>>>>>>>>>
> >>>>>>>>>>> {
> >>>>>>>>>>> “_id”: “foo”,
> >>>>>>>>>>> “_rev”: “1-abc”,
> >>>>>>>>>>> “owner”: “alice”,
> >>>>>>>>>>> “active”: true
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> and
> >>>>>>>>>>>
> >>>>>>>>>>> {
> >>>>>>>>>>> “_id”: “foo”,
> >>>>>>>>>>> “_rev”: “1-def”,
> >>>>>>>>>>> “owner”: “bob”,
> >>>>>>>>>>> “active”: true
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> We could represent these using the following set of KVs:
> >>>>>>>>>>>
> >>>>>>>>>>> (“foo”, “active”) = true
> >>>>>>>>>>> (“foo”, “owner”) = kCONFLICT
> >>>>>>>>>>> (“foo”, “owner”, “1-abc”) = “alice”
> >>>>>>>>>>> (“foo”, “owner”, “1-def”) = “bob”
> >>>>>>>>>>>
> >>>>>>>>>>> This approach also extends to conflicts where the two versions 
> >>>>>>>>>>> have
> >>>>>>>> different data types. Consider a more complicated example where bob 
> >>>>>>>> dropped
> >>>>>>>> the “active” field and changed the “owner” field to an object:
> >>>>>>>>>>>
> >>>>>>>>>>> {
> >>>>>>>>>>> “_id”: “foo”,
> >>>>>>>>>>> “_rev”: “1-def”,
> >>>>>>>>>>> “owner”: {
> >>>>>>>>>>> “name”: “bob”,
> >>>>>>>>>>> “email”: “[email protected]"
> >>>>>>>>>>> }
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> Now the set of KVs for “foo” looks like this (note that a missing
> >>>>>>>> field needs to be handled explicitly):
> >>>>>>>>>>>
> >>>>>>>>>>> (“foo”, “active”) = kCONFLICT
> >>>>>>>>>>> (“foo”, “active”, “1-abc”) = true
> >>>>>>>>>>> (“foo”, “active”, “1-def”) = kMISSING
> >>>>>>>>>>> (“foo”, “owner”) = kCONFLICT
> >>>>>>>>>>> (“foo”, “owner”, “1-abc”) = “alice”
> >>>>>>>>>>> (“foo”, “owner”, “1-def”, “name”) = “bob”
> >>>>>>>>>>> (“foo”, “owner”, “1-def”, “email”) = “[email protected]”
> >>>>>>>>>>>
> >>>>>>>>>>> I like this approach for the common case where documents share 
> >>>>>>>>>>> most of
> >>>>>>>> their data in common but have a conflict in a very specific field or 
> >>>>>>>> set of
> >>>>>>>> fields.
> >>>>>>>>>>>
> >>>>>>>>>>> I’ve encountered one important downside, though: an edit that
> >>>>>>>> replicates in and conflicts with the entire document can cause a bit 
> >>>>>>>> of a
> >>>>>>>> data explosion. Consider a case where I have 10 conflicting versions 
> >>>>>>>> of a
> >>>>>>>> 100KB document, but the conflicts are all related to a single scalar 
> >>>>>>>> value.
> >>>>>>>> Now I replicate in an empty document, and suddenly I have a 
> >>>>>>>> kCONFLICT at
> >>>>>>>> the root. In this model I now need to list out every path of every 
> >>>>>>>> one of
> >>>>>>>> the 10 existing revisions and I end up with a 1MB update. Yuck. 
> >>>>>>>> That’s
> >>>>>>>> technically no worse in the end state than the “zero sharing” case 
> >>>>>>>> above,
> >>>>>>>> but one could easily imagine overrunning the transaction size limit 
> >>>>>>>> this
> >>>>>>>> way.
> >>>>>>>>>>>
> >>>>>>>>>>> I suspect there’s a smart path out of this. Maybe the system 
> >>>>>>>>>>> detects a
> >>>>>>>> “default” value for each field and uses that instead of writing out 
> >>>>>>>> the
> >>>>>>>> value for every revision in a conflicted subtree. Worth some 
> >>>>>>>> discussion.
> >>>>>>>>>>>
> >>>>>>>>>>> ## Revision Trees
> >>>>>>>>>>>
> >>>>>>>>>>> In CouchDB we currently represent revisions as a hash history 
> >>>>>>>>>>> tree;
> >>>>>>>> each revision identifier is derived from the content of the revision
> >>>>>>>> including the revision identifier of its parent. Individual edit 
> >>>>>>>> branches
> >>>>>>>> are bounded in *length* (I believe the default is 1000 entries), but 
> >>>>>>>> the
> >>>>>>>> number of edit branches is technically unbounded.
> >>>>>>>>>>>
> >>>>>>>>>>> The size limits in FoundationDB preclude us from storing the 
> >>>>>>>>>>> entire
> >>>>>>>> key tree as a single value; in pathological situations the tree could
> >>>>>>>> exceed 100KB. Rather, I think it would make sense to store each edit
> >>>>>>>> *branch* as a separate KV. We stem the branch long before it hits 
> >>>>>>>> the value
> >>>>>>>> size limit, and in the happy case of no edit conflicts this means we 
> >>>>>>>> store
> >>>>>>>> the edit history metadata in a single KV. It also means that we can 
> >>>>>>>> apply
> >>>>>>>> an interactive edit without retrieving the entire conflicted 
> >>>>>>>> revision tree;
> >>>>>>>> we need only retrieve and modify the single branch against which the 
> >>>>>>>> edit
> >>>>>>>> is being applied. The downside is that we duplicate historical 
> >>>>>>>> revision
> >>>>>>>> identifiers shared by multiple edit branches, but I think this is a
> >>>>>>>> worthwhile tradeoff.
> >>>>>>>>>>>
> >>>>>>>>>>> I would furthermore try to structure the keys so that it is 
> >>>>>>>>>>> possible
> >>>>>>>> to retrieve the “winning” revision in a single limit=1 range query. 
> >>>>>>>> Ideally
> >>>>>>>> I’d like to proide the following properties:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) a document read does not need to retrieve the revision tree at 
> >>>>>>>>>>> all,
> >>>>>>>> just the winning revision identifier (which would be stored with the 
> >>>>>>>> rest
> >>>>>>>> of the doc)
> >>>>>>>>>>> 2) a document update only needs to read the edit branch of the
> >>>>>>>> revision tree against which the update is being applied, and it can 
> >>>>>>>> read
> >>>>>>>> that branch immediately knowing only the content of the edit that is 
> >>>>>>>> being
> >>>>>>>> attempted (i.e., it does not need to read the current version of the
> >>>>>>>> document itself).
> >>>>>>>>>>>
> >>>>>>>>>>> So, I’d propose a separate subspace (maybe “_meta”?) for the 
> >>>>>>>>>>> revision
> >>>>>>>> trees, with keys and values that look like
> >>>>>>>>>>>
> >>>>>>>>>>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev,
> >>>>>>>> GrandparentRev, …]
> >>>>>>>>>>>
> >>>>>>>>>>> The inclusion of IsDeleted, RevPosition and RevHash in the key 
> >>>>>>>>>>> should
> >>>>>>>> be sufficient (with the right encoding) to create a range query that
> >>>>>>>> automatically selects the “winner” according to CouchDB’s arcane 
> >>>>>>>> rules,
> >>>>>>>> which are something like
> >>>>>>>>>>>
> >>>>>>>>>>> 1) deleted=false beats deleted=true
> >>>>>>>>>>> 2) longer paths (i.e. higher RevPosition) beat shorter ones
> >>>>>>>>>>> 3) RevHashes with larger binary values beat ones with smaller 
> >>>>>>>>>>> values
> >>>>>>>>>>>
> >>>>>>>>>>> ===========
> >>>>>>>>>>>
> >>>>>>>>>>> OK, that’s all on this topic from me for now. I think this is a
> >>>>>>>> particularly exciting area where we start to see the dividends of 
> >>>>>>>> splitting
> >>>>>>>> up data into multiple KV pairs in FoundationDB :) Cheers,
> >>>>>>>>>>>
> >>>>>>>>>>> Adam
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <[email protected]> 
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> This one is quite tightly coupled to the other thread on data 
> >>>>>>>>>>>> model,
> >>>>>>>> should we start much conversation here before that one gets closer 
> >>>>>>>> to a
> >>>>>>>> solution?
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Robert Samuel Newson
> >>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote:
> >>>>>>>>>>>>> This is a beginning of a discussion thread about storage of edit
> >>>>>>>>>>>>> conflicts and everything which relates to revisions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Professional Support for Apache CouchDB:
> >>>>>>>> https://neighbourhood.ie/couchdb-support/
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>
>

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

Reply via email to