Oh, for sure we can do that. I was just trying to think of a clever way that replicated edits could also find their edit branch with a single read as well instead of having to pull out the entire tree.
On Mon, Feb 11, 2019 at 9:04 AM Adam Kocoloski <kocol...@apache.org> wrote: > > Not sure I follow. If a transaction needs the full revision tree for a single > document it can retrieve that with a single range read for the (“_meta”, > DocID) prefix. > > Adam > > > On Feb 8, 2019, at 6:35 PM, Paul Davis <paul.joseph.da...@gmail.com> wrote: > > > > Ah, that all sounds good. The only thing I'm not initially seeing as > > obvious is how we lookup a revision path to extend during replication > > when the previous revision may be anywhere in the list of $revs_limit > > revisions. Feels like there might be some sort of trickery to do that > > efficiently. Although it may also be good enough to also just issue > > $revs_limit lookups in parallel given that we're maxed out on either > > $revs_limit or 2*$revs_limit if we have to check for both deleted and > > not. > > > > On Fri, Feb 8, 2019 at 10:22 AM Adam Kocoloski <kocol...@apache.org> wrote: > >> > >> Totally unacceptable! ;) In fact some key bits of that model got > >> dispersed into at least two separate emails so you’re likely not the only > >> one. I’ll restate here: > >> > >> The size limits in FoundationDB preclude us from storing the entire key > >> tree as a single value; in pathological situations the tree could exceed > >> 100KB. Rather, I think it would make sense to store each edit *branch* as > >> a separate KV. We stem the branch long before it hits the value size > >> limit, and in the happy case of no edit conflicts this means we store the > >> edit history metadata in a single KV. It also means that we can apply an > >> interactive edit without retrieving the entire conflicted revision tree; > >> we need only retrieve and modify the single branch against which the edit > >> is being applied. The downside is that we may duplicate historical > >> revision identifiers shared by multiple edit branches, but I think this is > >> a worthwhile tradeoff. 
> >> > >> I’d also ensure that a document update only needs to read the edit branch > >> KV against which the update is being applied, and it can read that branch > >> immediately knowing only the content of the edit that is being attempted > >> (i.e., it does not need to read the current version of the document > >> itself). > >> > >> I think we achieve these goals with a separate subspace (maybe “_meta”?) > >> for the revision trees, with keys and values that look like > >> > >> (“_meta”, DocID, NotDeleted, RevPosition, RevHash) = > >> (VersionstampForCurrentRev, [ParentRev, GrandparentRev, …]) > >> > >> Some notes: > >> > >> - including IsDeleted ensures that we can efficiently accept the case > >> where we upload a new document with the same ID where all previous edit > >> branches have been deleted; i.e. we can construct a key selector which > >> automatically tells us there are no deleted=false edit branches > >> - access to VersionstampForCurrentRev ensures we can clear the old entry > >> in the by_seq space during the update > >> - i need to remind myself how we handle an edit attempt which supplies a > >> _rev representing a deleted leaf. Do we fail that as a conflict? That > >> would be the natural thing to do here, otherwise we’re forced to check > >> both deleted=false and deleted=true keys > >> - the keys can be made to naturally sort so that the winning revision > >> sorts last, but I don’t believe that’s a requirement here like it is for > >> the actual document data space > >> > >> Cheers, Adam > >> > >>> On Feb 8, 2019, at 8:59 AM, Paul Davis <paul.joseph.da...@gmail.com> > >>> wrote: > >>> > >>>> I’m relatively happy with the revision history data model at this point. > >>> > >>> I forgot to make a note, but which of the various models are you > >>> referring to by "revision history data model". There's been so many > >>> without firm names that my brain is having a hard time parsing that > >>> one. 
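The branch-per-KV scheme sketched above can be modeled in a few lines of pure Python to check its two claimed properties: the winner sorts last within the ("_meta", DocID) prefix, and a single read of one branch key returns that branch's full ancestry. This is a sketch only, not the FoundationDB API; the `kv` dict, the `1`/`0` encoding of NotDeleted, and the helper names are assumptions for illustration.

```python
# Pure-Python sketch (not the actual FoundationDB API) of the proposed
# ("_meta", DocID, NotDeleted, RevPosition, RevHash) layout.
# Assumption: NotDeleted encodes as 1 for live branches and 0 for deleted
# ones, so live branches sort after deleted ones and the winner is the
# *last* key in the ("_meta", DocID) prefix range.

kv = {}  # stand-in for the database; FDB keeps keys in sorted order

def put_branch(docid, not_deleted, rev_pos, rev_hash, ancestors):
    kv[("_meta", docid, not_deleted, rev_pos, rev_hash)] = ancestors

def winning_branch(docid):
    # Equivalent to a descending, limit=1 range read over the prefix.
    keys = sorted(k for k in kv if k[:2] == ("_meta", docid))
    return keys[-1] if keys else None

# One live branch and one (longer) deleted conflict branch for the same doc.
put_branch("foo", 1, 4, "ccc", ["3-bbb", "2-aaa", "1-zzz"])
put_branch("foo", 0, 5, "ddd", ["4-ccc", "3-bbb"])

winner = winning_branch("foo")
# deleted=false (NotDeleted=1) beats the longer deleted branch
assert winner == ("_meta", "foo", 1, 4, "ccc")
# the full ancestry of that branch comes back in the same single read
assert kv[winner] == ["3-bbb", "2-aaa", "1-zzz"]
```

Stemming keeps each ancestor list well under the value size limit, so the happy path of no conflicts really is one KV per document.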
> >>> > >>> On Thu, Feb 7, 2019 at 9:35 PM Adam Kocoloski <kocol...@apache.org> wrote: > >>>> > >>>> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a > >>>> bit “simplistic" to exclusively choose simplicity over performance and > >>>> storage density. We’re (re)building a database here, one that has some > >>>> users with pretty demanding performance and scalability requirements. > >>>> And yes, we should certainly be testing and measuring. Kyle and team are > >>>> setting up infrastructure in IBM land to help with that now, but I also > >>>> believe we can design the data model and architecture with a basic > >>>> performance model of FoundationDB in mind: > >>>> > >>>> - reads cost 1ms > >>>> - short range reads are the same cost as a single lookup > >>>> - reads of independent parts of the keyspace can be parallelized for > >>>> cheap > >>>> - writes are zero-cost until commit time > >>>> > >>>> We ought to be able to use these assumptions to drive some decisions > >>>> about data models ahead of any end-to-end performance test. > >>>> > >>>> If there are specific elements of the edit conflicts management where > >>>> you think greater simplicity is warranted, let’s get those called out. > >>>> Ilya noted (correctly, in my opinion) that the term sharing stuff is one > >>>> of those items. It’s relatively complex, potentially a performance hit, > >>>> and only saves on storage density in the corner case of lots of edit > >>>> conflicts. That’s a good one to drop. > >>>> > >>>> I’m relatively happy with the revision history data model at this point. > >>>> Hopefully folks find it easy to grok, and it’s efficient for both reads > >>>> and writes. It costs some extra storage for conflict revisions compared > >>>> to the current tree representation (up to 16K per edit branch, with > >>>> default _revs_limit) but knowing what we know about the performance > >>>> death spiral for wide revision trees today I’ll happily make a storage > >>>> vs. 
performance tradeoff here :) > >>>> > >>>> Setting the shared term approach aside, I’ve still been mulling over the > >>>> key structure for the actual document data: > >>>> > >>>> - I thought about trying to construct a special _conflicts subspace, > >>>> but I don’t like that approach because the choice of a “winning" > >>>> revision can flip back and forth very quickly with concurrent writers to > >>>> different edit branches. I think we really want to have a way for > >>>> revisions to naturally sort themselves so the winner is the first or > >>>> last revision in a list. > >>>> > >>>> - Assuming we’re using key paths of the form (docid, revision-ish, path, > >>>> to, field), the goal here is to find an efficient way to get the last > >>>> key with prefix “docid” (assuming winner sorts last), and then all the > >>>> keys that share the same (docid, revision-ish) prefix as that one. I see > >>>> two possible approaches so far, neither perfect: > >>>> > >>>> Option 1: Execute a get_key() operation with a key selector that asks > >>>> for the last key less than “docid\xFF” (again assuming winner sorts > >>>> last), and then do a get_range_startswith() request setting the > >>>> streaming mode to “want_all” and the prefix to the docid plus whatever > >>>> revision-ish we found from the get_key() request. This is two roundtrips > >>>> instead of one, but it always retrieves exactly the right set of keys, > >>>> and the second step is executed as fast as possible. > >>>> > >>>> Option 2: Jump straight to get_range_startswith() request using only > >>>> “docid” as the prefix, then cancel the iteration once we reach a > >>>> revision not equal to the first one we see. 
We might transfer too much > >>>> data, or we might end up doing multiple roundtrips if the default > >>>> “iterator” streaming mode sends too little data to start (I haven’t > >>>> checked what the default iteration block is there), but in the typical > >>>> case of zero edit conflicts we have a good chance of retrieving the full > >>>> document in one roundtrip. > >>>> > >>>> I don’t have a good sense of which option wins out here from a > >>>> performance perspective, but they’re both operating on the same data > >>>> model so it's easy enough to test the alternatives. The important bit is > >>>> getting the revision-ish things to sort correctly. I think we can do > >>>> that by generating something like > >>>> > >>>> revision-ish = NotDeleted/1bit : RevPos : RevHash > >>>> > >>>> with some suitable order-preserving encoding on the RevPos integer. > >>>> > >>>> Apologies for the long email. Happy for any comments, either here or > >>>> over on IRC. Cheers, > >>>> > >>>> Adam > >>>> > >>>>> On Feb 7, 2019, at 4:52 PM, Robert Newson <rnew...@apache.org> wrote: > >>>>> > >>>>> I think we should choose simple. We can then see if performance is too > >>>>> low or storage overhead too high and then see what we can do about it. > >>>>> > >>>>> B. > >>>>> > >>>>> -- > >>>>> Robert Samuel Newson > >>>>> rnew...@apache.org > >>>>> > >>>>> On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote: > >>>>>> We cannot do the simple thing if we want to support sharing of JSON terms. > >>>>>> I > >>>>>> think if we want the simplest path we should move sharing out of > >>>>>> scope. The problem with sharing is that we need to know the location of > >>>>>> shared terms when we write. This means that we have to read the full > >>>>>> document on every write. There might be tricks to replace the full-document > >>>>>> read with some sort of hierarchical signature or sketch of a document. > >>>>>> However, these tricks do not fall into the simplest-solution category. 
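The revision-ish idea above can be checked with a small pure-Python sketch. Assumptions not in the thread: a 1-byte deleted flag plus a big-endian 4-byte RevPos is just one possible order-preserving encoding (the real tuple layer would differ), and the `kv` dict stands in for the sorted keyspace. The last key with the docid prefix identifies the winner (Option 1, step one), and a prefix read of (docid, winning revision-ish) then assembles the document (step two).

```python
# Sketch of the "revision-ish" ordering from the thread:
#   revision-ish = NotDeleted : RevPos : RevHash
# Assumption: 1-byte deleted flag + big-endian uint32 RevPos is one
# possible order-preserving encoding; the actual tuple layer differs.

import struct

def revision_ish(not_deleted, rev_pos, rev_hash):
    return bytes([1 if not_deleted else 0]) + struct.pack(">I", rev_pos) + rev_hash

# Simulated sorted keyspace: (docid, revision-ish, field) -> value
kv = {}

def put(docid, revish, field, value):
    kv[(docid, revish, field)] = value

r_live = revision_ish(True, 1, b"abc")
r_del = revision_ish(False, 2, b"def")  # longer branch, but deleted

put("foo", r_live, "owner", "alice")
put("foo", r_live, "active", True)
put("foo", r_del, "owner", "bob")

# Option 1, step 1: find the last key with prefix "foo" (winner sorts
# last); step 2: range read of everything under (docid, winning revish).
last_key = max(k for k in kv if k[0] == "foo")
winner = last_key[1]
doc = {f: v for (d, r, f), v in kv.items() if d == "foo" and r == winner}

assert winner == r_live  # live branch beats the longer deleted one
assert doc == {"owner": "alice", "active": True}
```

Because the flag byte sorts deleted revisions first, the `max` (or a reverse limit=1 key selector in FDB) lands on the live winner without reading the other revisions at all.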
We > >>>>>> need to choose the design goals: > >>>>>> - simple > >>>>>> - performance > >>>>>> - reduced storage overhead > >>>>>> > >>>>>> best regards, > >>>>>> iilyak > >>>>>> > >>>>>> On 2019/02/07 12:45:34, Garren Smith <gar...@apache.org> wrote: > >>>>>>> I’m also in favor of keeping it really simple and then testing and > >>>>>>> measuring it. > >>>>>>> > >>>>>>> What is the best way to measure that we have something that works? > >>>>>>> I’m not > >>>>>>> sure just relying on our current tests will prove that? Should we > >>>>>>> define > >>>>>>> and build some more complex situations e.g docs with lots of > >>>>>>> conflicts or > >>>>>>> docs with wide revisions and make sure we can solve for those? > >>>>>>> > >>>>>>> On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt <j...@apache.org> wrote: > >>>>>>> > >>>>>>>> I’m also very much in favour with starting with the simplest thing > >>>>>>>> that > >>>>>>>> can possibly work and doesn’t go against the advertised best > >>>>>>>> practices of > >>>>>>>> FoundationDB. Let’s get that going and get a feel for how it all > >>>>>>>> works > >>>>>>>> together, before trying to optimise things we can’t measure yet. > >>>>>>>> > >>>>>>>> Best > >>>>>>>> Jan > >>>>>>>> — > >>>>>>>> > >>>>>>>>> On 6. Feb 2019, at 16:58, Robert Samuel Newson <rnew...@apache.org> > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> With the Redwood storage engine under development and with prefix > >>>>>>>> elision part of its design, I don’t think we should get too hung up > >>>>>>>> on > >>>>>>>> adding complications and indirections in the key space just yet. We > >>>>>>>> haven’t > >>>>>>>> written a line of code or run any tests, this is premature > >>>>>>>> optimisation. > >>>>>>>>> > >>>>>>>>> I’d like to focus on the simplest solution that yields all required > >>>>>>>> properties. We can embellish later (if warranted). 
> >>>>>>>>> > >>>>>>>>> I am intrigued by all the ideas that might allow us cheaper inserts > >>>>>>>>> and > >>>>>>>> updates than the current code where there are multiple edit branches > >>>>>>>> in the > >>>>>>>> stored document. > >>>>>>>>> > >>>>>>>>> B. > >>>>>>>>> > >>>>>>>>>> On 6 Feb 2019, at 02:18, Ilya Khlopotov <iil...@apache.org> wrote: > >>>>>>>>>> > >>>>>>>>>> While reading Adam's proposal I came to realize that we don't > >>>>>>>>>> have to > >>>>>>>> calculate the winning revision at read time. > >>>>>>>>>> Since FDB's transactions are atomic we can calculate it when we > >>>>>>>>>> write. > >>>>>>>> This means we can just write the latest values into a separate range. This > >>>>>>>> makes > >>>>>>>> lookup of the latest version fast. > >>>>>>>>>> Another realization is that if we want to share values for some json > >>>>>>>>>> paths > >>>>>>>> we would have to introduce a level of indirection. > >>>>>>>>>> Below is the data model inspired by Adam's idea to share > >>>>>>>>>> json_paths. > >>>>>>>> In this model the json_path is stored in the revision where it was > >>>>>>>> first > >>>>>>>> added (we call that revision the owner of a json_path). The values for a > >>>>>>>> json_path key can be scalar values, parts of scalar values or > >>>>>>>> pointers to > >>>>>>>> the owner location. > >>>>>>>>>> The below snippets are sketches of transactions. > >>>>>>>>>> The transactions will include updates to other keys as needed > >>>>>>>> (`external_size`, `by_seq` and so on). The revision tree management > >>>>>>>> is not > >>>>>>>> covered yet. > >>>>>>>>>> The `rev -> vsn` indirection is not strictly required. It is added > >>>>>>>> because it saves some space, since `rev` is a long string and `vsn` > >>>>>>>> is an FDB > >>>>>>>> versionstamp of fixed size. 
> >>>>>>>>>> > >>>>>>>>>> - `{NS} / {docid} / _by_rev / {rev} = vsn` > >>>>>>>>>> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL` > >>>>>>>>>> - `{NS} / {docid} / _data / {json_path} = latest_value | part` > >>>>>>>>>> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part | > >>>>>>>> {another_vsn}` > >>>>>>>>>> > >>>>>>>>>> ``` > >>>>>>>>>> write(txn, doc_id, prev_rev, json): > >>>>>>>>>> txn.add_write_conflict_key("{NS} / {doc_id} / _rev") > >>>>>>>>>> rev = generate_new_rev() > >>>>>>>>>> txn["{NS} / {docid} / _by_rev / {rev}"] = vsn > >>>>>>>>>> for every json_path in flattened json > >>>>>>>>>> - {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL > >>>>>>>>>> if rev is HEAD: > >>>>>>>>>> # this range contains values for all json paths for the latest > >>>>>>>> revision (read optimization) > >>>>>>>>>> - {NS} / {docid} / _data / {json_path} = latest_value | part > >>>>>>>>>> - {NS} / {docid} / {vsn} / _data / {json_path} = value | part | > >>>>>>>> {another_vsn} > >>>>>>>>>> txn["{NS} / {doc_id} / _rev"] = rev > >>>>>>>>>> > >>>>>>>>>> get_current(txn, doc_id): > >>>>>>>>>> # there is no sharing of json_paths in this range (read > >>>>>>>>>> optimization) > >>>>>>>>>> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / > >>>>>>>>>> _data > >>>>>>>> / 0xFF" ) > >>>>>>>>>> > >>>>>>>>>> get_revision(txn, doc_id, rev): > >>>>>>>>>> vsn = txn["{NS} / {docid} / _by_rev / {rev}"] > >>>>>>>>>> json_paths = txn.get_range("{NS} / {docid} / {vsn} / _data / 0x00", > >>>>>>>> "{NS} / {docid} / {vsn} / _data / 0xFF" ) > >>>>>>>>>> for every json_path in json_paths: > >>>>>>>>>> if value has type vsn: > >>>>>>>>>> another_vsn = value > >>>>>>>>>> value = txn["{NS} / {docid} / {another_vsn} / _data / > >>>>>>>> {json_path}"] > >>>>>>>>>> result[json_path] = value > >>>>>>>>>> > >>>>>>>>>> delete_revision(txn, doc_id, rev): > >>>>>>>>>> vsn = txn["{NS} / {docid} / _by_rev / {rev}"] > >>>>>>>>>> json_paths = txn.get_range("{NS} 
/ {docid} / {vsn} / _data / 0x00", > >>>>>>>> "{NS} / {docid} / {vsn} / _data / 0xFF" ) > >>>>>>>>>> for every json_path in json_paths: > >>>>>>>>>> if value has type vsn: > >>>>>>>>>> # remove reference to deleted revision from the owner > >>>>>>>>>> del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}] > >>>>>>>>>> # check if deleted revision of json_path is not used by anything > >>>>>>>>>> else > >>>>>>>>>> if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}", > >>>>>>>> limit=1) == []: > >>>>>>>>>> del txn["{NS} / {docid} / {vsn} / _data / {json_path}"] > >>>>>>>>>> if vsn is HEAD: > >>>>>>>>>> copy range for winning revision into "{NS} / {docid} / _data / > >>>>>>>> {json_path}" > >>>>>>>>>> ``` > >>>>>>>>>> > >>>>>>>>>> best regards, > >>>>>>>>>> iilyak > >>>>>>>>>> > >>>>>>>>>> On 2019/02/04 23:22:09, Adam Kocoloski <kocol...@apache.org> wrote: > >>>>>>>>>>> I think it’s fine to start a focused discussion here as it might > >>>>>>>>>>> help > >>>>>>>> inform some of the broader debate over in that thread. > >>>>>>>>>>> > >>>>>>>>>>> As a reminder, today CouchDB writes the entire body of each > >>>>>>>>>>> document > >>>>>>>> revision on disk as a separate blob. Edit conflicts that have common > >>>>>>>> fields > >>>>>>>> between them do not share any storage on disk. The revision tree is > >>>>>>>> encoded > >>>>>>>> into a compact format and a copy of it is stored directly in both > >>>>>>>> the by_id > >>>>>>>> tree and the by_seq tree. Each leaf entry in the revision tree > >>>>>>>> contains a > >>>>>>>> pointer to the position of the associated doc revision on disk. > >>>>>>>>>>> > >>>>>>>>>>> As a further reminder, CouchDB 2.x clusters can generate edit > >>>>>>>>>>> conflict > >>>>>>>> revisions just from multiple clients concurrently updating the same > >>>>>>>> document in a single cluster. 
This won’t happen when FoundationDB is > >>>>>>>> running under the hood, but users who deploy multiple CouchDB or > >>>>>>>> PouchDB > >>>>>>>> servers and replicate between them can of course still produce > >>>>>>>> conflicts > >>>>>>>> just like they could in CouchDB 1.x, so we need a solution. > >>>>>>>>>>> > >>>>>>>>>>> Let’s consider the two sub-topics separately: 1) storage of edit > >>>>>>>> conflict bodies and 2) revision trees > >>>>>>>>>>> > >>>>>>>>>>> ## Edit Conflict Storage > >>>>>>>>>>> > >>>>>>>>>>> The simplest possible solution would be to store each document > >>>>>>>> revision separately, like we do today. We could store document > >>>>>>>> bodies with > >>>>>>>> (“docid”, “revid”) as the key prefix, and each transaction could > >>>>>>>> clear the > >>>>>>>> key range associated with the base revision against which the edit > >>>>>>>> is being > >>>>>>>> attempted. This would work, but I think we can try to be a bit more > >>>>>>>> clever > >>>>>>>> and save on storage space given that we’re splitting JSON documents > >>>>>>>> into > >>>>>>>> multiple KV pairs. > >>>>>>>>>>> > >>>>>>>>>>> One thought I’d had is to introduce a special enum Value which > >>>>>>>> indicates that the subtree “beneath” the given Key is in conflict. 
> >>>>>>>> For > >>>>>>>> example, consider the documents > >>>>>>>>>>> > >>>>>>>>>>> { > >>>>>>>>>>> “_id”: “foo”, > >>>>>>>>>>> “_rev”: “1-abc”, > >>>>>>>>>>> “owner”: “alice”, > >>>>>>>>>>> “active”: true > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> and > >>>>>>>>>>> > >>>>>>>>>>> { > >>>>>>>>>>> “_id”: “foo”, > >>>>>>>>>>> “_rev”: “1-def”, > >>>>>>>>>>> “owner”: “bob”, > >>>>>>>>>>> “active”: true > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> We could represent these using the following set of KVs: > >>>>>>>>>>> > >>>>>>>>>>> (“foo”, “active”) = true > >>>>>>>>>>> (“foo”, “owner”) = kCONFLICT > >>>>>>>>>>> (“foo”, “owner”, “1-abc”) = “alice” > >>>>>>>>>>> (“foo”, “owner”, “1-def”) = “bob” > >>>>>>>>>>> > >>>>>>>>>>> This approach also extends to conflicts where the two versions > >>>>>>>>>>> have > >>>>>>>> different data types. Consider a more complicated example where bob > >>>>>>>> dropped > >>>>>>>> the “active” field and changed the “owner” field to an object: > >>>>>>>>>>> > >>>>>>>>>>> { > >>>>>>>>>>> “_id”: “foo”, > >>>>>>>>>>> “_rev”: “1-def”, > >>>>>>>>>>> “owner”: { > >>>>>>>>>>> “name”: “bob”, > >>>>>>>>>>> “email”: “b...@example.com" > >>>>>>>>>>> } > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> Now the set of KVs for “foo” looks like this (note that a missing > >>>>>>>> field needs to be handled explicitly): > >>>>>>>>>>> > >>>>>>>>>>> (“foo”, “active”) = kCONFLICT > >>>>>>>>>>> (“foo”, “active”, “1-abc”) = true > >>>>>>>>>>> (“foo”, “active”, “1-def”) = kMISSING > >>>>>>>>>>> (“foo”, “owner”) = kCONFLICT > >>>>>>>>>>> (“foo”, “owner”, “1-abc”) = “alice” > >>>>>>>>>>> (“foo”, “owner”, “1-def”, “name”) = “bob” > >>>>>>>>>>> (“foo”, “owner”, “1-def”, “email”) = “b...@example.com” > >>>>>>>>>>> > >>>>>>>>>>> I like this approach for the common case where documents share > >>>>>>>>>>> most of > >>>>>>>> their data in common but have a conflict in a very specific field or > >>>>>>>> set of > >>>>>>>> fields. 
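The first kCONFLICT example above can be reproduced with a short sketch that flattens conflicting revisions into KV pairs. This handles only flat documents with scalar fields (nested subtrees and the kMISSING case for dropped fields are left out); `kCONFLICT`, `kMISSING`, and `to_kvs` are illustrative stand-ins, not real enum values or API.

```python
# Sketch of the kCONFLICT representation from the thread, for flat
# documents with scalar fields. kCONFLICT / kMISSING are assumed enum
# stand-ins; nested values and subtree conflicts are not handled here.

kCONFLICT = "kCONFLICT"
kMISSING = "kMISSING"

def to_kvs(docid, revs):
    """revs: {rev_id: {field: scalar}} -> {key tuple: value}"""
    kvs = {}
    fields = {f for body in revs.values() for f in body}
    for f in sorted(fields):
        values = {rev: body.get(f, kMISSING) for rev, body in revs.items()}
        if len(set(values.values())) == 1:
            # all live revisions agree: store the field once
            kvs[(docid, f)] = next(iter(values.values()))
        else:
            # mark the field as conflicted, one KV per revision
            kvs[(docid, f)] = kCONFLICT
            for rev, v in values.items():
                kvs[(docid, f, rev)] = v
    return kvs

kvs = to_kvs("foo", {
    "1-abc": {"owner": "alice", "active": True},
    "1-def": {"owner": "bob", "active": True},
})

# Matches the KV listing in the thread: "active" is shared, "owner"
# fans out into one KV per conflicting revision.
assert kvs == {
    ("foo", "active"): True,
    ("foo", "owner"): kCONFLICT,
    ("foo", "owner", "1-abc"): "alice",
    ("foo", "owner", "1-def"): "bob",
}
```

A field dropped in one revision comes out as kMISSING under the conflict marker, which mirrors the second example in the thread.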
> >>>>>>>>>>> > >>>>>>>>>>> I’ve encountered one important downside, though: an edit that > >>>>>>>> replicates in and conflicts with the entire document can cause a bit > >>>>>>>> of a > >>>>>>>> data explosion. Consider a case where I have 10 conflicting versions > >>>>>>>> of a > >>>>>>>> 100KB document, but the conflicts are all related to a single scalar > >>>>>>>> value. > >>>>>>>> Now I replicate in an empty document, and suddenly I have a > >>>>>>>> kCONFLICT at > >>>>>>>> the root. In this model I now need to list out every path of every > >>>>>>>> one of > >>>>>>>> the 10 existing revisions and I end up with a 1MB update. Yuck. > >>>>>>>> That’s > >>>>>>>> technically no worse in the end state than the “zero sharing” case > >>>>>>>> above, > >>>>>>>> but one could easily imagine overrunning the transaction size limit > >>>>>>>> this > >>>>>>>> way. > >>>>>>>>>>> > >>>>>>>>>>> I suspect there’s a smart path out of this. Maybe the system > >>>>>>>>>>> detects a > >>>>>>>> “default” value for each field and uses that instead of writing out > >>>>>>>> the > >>>>>>>> value for every revision in a conflicted subtree. Worth some > >>>>>>>> discussion. > >>>>>>>>>>> > >>>>>>>>>>> ## Revision Trees > >>>>>>>>>>> > >>>>>>>>>>> In CouchDB we currently represent revisions as a hash history > >>>>>>>>>>> tree; > >>>>>>>> each revision identifier is derived from the content of the revision > >>>>>>>> including the revision identifier of its parent. Individual edit > >>>>>>>> branches > >>>>>>>> are bounded in *length* (I believe the default is 1000 entries), but > >>>>>>>> the > >>>>>>>> number of edit branches is technically unbounded. > >>>>>>>>>>> > >>>>>>>>>>> The size limits in FoundationDB preclude us from storing the > >>>>>>>>>>> entire > >>>>>>>> key tree as a single value; in pathological situations the tree could > >>>>>>>> exceed 100KB. Rather, I think it would make sense to store each edit > >>>>>>>> *branch* as a separate KV. 
We stem the branch long before it hits > >>>>>>>> the value > >>>>>>>> size limit, and in the happy case of no edit conflicts this means we > >>>>>>>> store > >>>>>>>> the edit history metadata in a single KV. It also means that we can > >>>>>>>> apply > >>>>>>>> an interactive edit without retrieving the entire conflicted > >>>>>>>> revision tree; > >>>>>>>> we need only retrieve and modify the single branch against which the > >>>>>>>> edit > >>>>>>>> is being applied. The downside is that we duplicate historical > >>>>>>>> revision > >>>>>>>> identifiers shared by multiple edit branches, but I think this is a > >>>>>>>> worthwhile tradeoff. > >>>>>>>>>>> > >>>>>>>>>>> I would furthermore try to structure the keys so that it is > >>>>>>>>>>> possible > >>>>>>>> to retrieve the “winning” revision in a single limit=1 range query. > >>>>>>>> Ideally > >>>>>>>> I’d like to provide the following properties: > >>>>>>>>>>> > >>>>>>>>>>> 1) a document read does not need to retrieve the revision tree at > >>>>>>>>>>> all, > >>>>>>>> just the winning revision identifier (which would be stored with the > >>>>>>>> rest > >>>>>>>> of the doc) > >>>>>>>>>>> 2) a document update only needs to read the edit branch of the > >>>>>>>> revision tree against which the update is being applied, and it can > >>>>>>>> read > >>>>>>>> that branch immediately knowing only the content of the edit that is > >>>>>>>> being > >>>>>>>> attempted (i.e., it does not need to read the current version of the > >>>>>>>> document itself). > >>>>>>>>>>> > >>>>>>>>>>> So, I’d propose a separate subspace (maybe “_meta”?) 
for the > >>>>>>>>>>> revision > >>>>>>>> trees, with keys and values that look like > >>>>>>>>>>> > >>>>>>>>>>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, > >>>>>>>> GrandparentRev, …] > >>>>>>>>>>> > >>>>>>>>>>> The inclusion of IsDeleted, RevPosition and RevHash in the key > >>>>>>>>>>> should > >>>>>>>> be sufficient (with the right encoding) to create a range query that > >>>>>>>> automatically selects the “winner” according to CouchDB’s arcane > >>>>>>>> rules, > >>>>>>>> which are something like > >>>>>>>>>>> > >>>>>>>>>>> 1) deleted=false beats deleted=true > >>>>>>>>>>> 2) longer paths (i.e. higher RevPosition) beat shorter ones > >>>>>>>>>>> 3) RevHashes with larger binary values beat ones with smaller > >>>>>>>>>>> values > >>>>>>>>>>> > >>>>>>>>>>> =========== > >>>>>>>>>>> > >>>>>>>>>>> OK, that’s all on this topic from me for now. I think this is a > >>>>>>>> particularly exciting area where we start to see the dividends of > >>>>>>>> splitting > >>>>>>>> up data into multiple KV pairs in FoundationDB :) Cheers, > >>>>>>>>>>> > >>>>>>>>>>> Adam > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnew...@apache.org> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> This one is quite tightly coupled to the other thread on data > >>>>>>>>>>>> model, > >>>>>>>> should we start much conversation here before that one gets closer > >>>>>>>> to a > >>>>>>>> solution? > >>>>>>>>>>>> > >>>>>>>>>>>> -- > >>>>>>>>>>>> Robert Samuel Newson > >>>>>>>>>>>> rnew...@apache.org > >>>>>>>>>>>> > >>>>>>>>>>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote: > >>>>>>>>>>>>> This is a beginning of a discussion thread about storage of edit > >>>>>>>>>>>>> conflicts and everything which relates to revisions. 
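The three "arcane rules" above translate directly into a comparator. A minimal sketch, assuming `winner_sort_key` and the `(deleted, rev_pos, rev_hash)` tuple shape are illustrative inventions: sorting ascending by this key makes the winner the last element, which is exactly the property the key encoding in the thread is designed to have.

```python
# Sketch: CouchDB's winner-selection rules as a Python sort key.
#   1) deleted=false beats deleted=true
#   2) longer paths (higher RevPosition) beat shorter ones
#   3) lexicographically larger RevHash values beat smaller ones

def winner_sort_key(branch):
    deleted, rev_pos, rev_hash = branch
    # `not deleted` is False for deleted leaves, True for live ones,
    # and False < True in Python, so live branches sort last.
    return (not deleted, rev_pos, rev_hash)

branches = [
    (True, 5, b"ff"),   # deleted leaf on a long branch
    (False, 3, b"aa"),  # live, shorter branch
    (False, 3, b"cc"),  # live, same length, larger hash
]

winner = max(branches, key=winner_sort_key)
assert winner == (False, 3, b"cc")  # live beats deleted; hash breaks the tie
```

Encoding the same three components into the key in this order means a single reverse limit=1 range read picks the winner without applying any logic client-side.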
> >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Professional Support for Apache CouchDB: > >>>>>>>> https://neighbourhood.ie/couchdb-support/ > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>> > >> >