Thanks for the additional context. Given that our JS-keys are already when it comes to floats/precision vs. Erlang numbers, I’m not concerned :D
> On 27. Jul 2020, at 16:35, Paul Davis <paul.joseph.da...@gmail.com> wrote: > > In terms of collation there will be differences in some edge cases. > Numbers in particular are all converted to floats which means that > extreme values that don't fit into a float (though Erlang floats are > doubles) will have a truncation that reduces resolution. Other than > that I'm pretty sure it'd be the same. There might be some issues > around the ICU library used for sort keys although both > implementations would suffer from issues there, but I think it'd be > slightly different issues. > > Paul > > On Mon, Jul 27, 2020 at 2:28 AM Garren Smith <gar...@apache.org> wrote: >> >> Hi Bob, >> >> That makes sense. >> >> Cheers >> Garren >> >> On Sun, Jul 26, 2020 at 2:57 PM Robert Newson <rnew...@apache.org> wrote: >> >>> >>> Both approaches will collate keys the same way, it’s a showstopping bug if >>> they don’t match existing couch collation order. >>> >>> I’d probably not even document that right now. We can document what aegis >>> does and doesn’t cover when we make the first 4.0 RC. >>> -- >>> Robert Samuel Newson >>> rnew...@apache.org >>> >>> On Sun, 26 Jul 2020, at 13:27, Jan Lehnardt wrote: >>>> Heya Bob, >>>> >>>> this sounds like a good path forward that allows us to validate the >>>> remaining pieces of this. >>>> >>>> Just two side notes: >>>> >>>> - do we expect that the map-keys-in-fdb approach would sort differently >>>> from map-keys-in-ebtree? (I assume no) >>>> - I further assume that we’ll find a nicer UX for “make sure my views >>>> are encrypted” than “make sure your view has a reduce fun”, no details >>>> or drafts necessary at this point. >>>> >>>> Best >>>> Jan >>>> — >>>> >>>>> On 26. Jul 2020, at 14:13, Robert Samuel Newson <rnew...@apache.org> >>> wrote: >>>>> >>>>> I'd like to find consensus on a path forward and, with that in mind, I >>> have a suggestion. >>>>> >>>>> Before that, I think it's useful to lay out the points of contention, >>> which I've ranked in I hope an uncontroversial order; >>>>> >>>>> The principal concern with using ebtree for map as well as reduce is >>> that it precludes the use of foundationdb's ordering, on which much >>> performance and scalability likely hinges. I agree with this on principle >>> (that is, its demonstrably true) though none of us yet know the extent to >>> which it matters in practice, or whether any gap could be closed with >>> optimization. >>>>> >>>>> The secondary concern is the proposed use of ebtree for reduce-only >>> where it receives only the reduce values from a document. I understand >>> garren's stated desire to avoid duplication with the map side but I don't >>> think it works out. Users of the reduce function are using it to get the >>> reductions _across_ documents. It is rare (that is, I don't recall seeing >>> an example by a cloudant customer) to emit multiple rows in a map function >>> for the same key. In the cases where it doesn't, the value inserted into >>> ebtree is the same as the map value anyway. Where it does happen, we're >>> still duplicating the key and _a_ document level value. It seems a very >>> modest saving of space for a more complicated design (two separate levels >>> of reduce calculated in two separate modules). >>>>> >>>>> The tertiary concern is that the map-only indexes place user data >>> outside of aegis (native database encryption) as, necessarily, the order of >>> keys is important and a straightforward encryption there would not preserve >>> it. >>>>> >>>>> Looking at all the options available I don't think there is a single >>> path that addresses everything. There are also significant unknowns around >>> practical performance for all of this. >>>>> >>>>> So, I suggest that the map-only code (specifically, the code we will >>> use for an index definition that does not specify a reduce function) uses >>> the existing couch_views approach, where the fdb keys are suffixed with the >>> emitted key from the map function, and the fdb values are the emitted >>> values. This vertical index should support the fastest ?key= and >>> startkey/endkey= queries that fdb is capable of. There might be >>> simplifications we can make to this code if we accept that it only needs to >>> handle map-only indexes. >>>>> >>>>> For index definitions that _do_ specify a reduce function, I suggest >>> this is done entirely with ebtree.We would call ebtree:insert(Db, Tree, >>> Key, Value) where Key and Value are the emitted key and value from the map >>> function. ebtree:open is handed a reduce_fun which handles the reduce and >>> rereduce for the users reduction and, like couchdb 1/2/3, we also mix in a >>> system reduction that calculates the view size (paul's diff above appears >>> to do just this portion and looks broadly right). >>>>> >>>>> This avoids the duplication that garren's idea of storing the reduces >>> in ebtree was trying to avoid, but does so in all cases. This approach >>> allows us to compare the performance of map queries against map-only and >>> map-reduce indexes, it allows users to opt in or out of encrypting their >>> emitted view keys, and it doesn't prevent anyone from choosing to do both >>> (You can define a view with map function A and another view with map >>> function A and reduce function B, the former will be the couch_views >>> vertical index, the latter an ebtree "horizontal" index). >>>>> >>>>> As we learn more over time we could enhance one type of index or the >>> other and reach a point where we eliminate one, or the two could coexist >>> indefinitely. >>>>> >>>>>> On 24 Jul 2020, at 20:00, Paul Davis <paul.joseph.da...@gmail.com> >>> wrote: >>>>>> >>>>>> FWIW, a first pass at views entirely on ebtree turned out to be fairly >>>>>> straightforward. Almost surprisingly simple in some cases. >>>>>> >>>>>> >>> https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-ebtree-views >>>>>> >>>>>> Its currently passing all tests in `couch_views_map_test.erl` but is >>>>>> failing on other suites. I only did a quick skim on the failures but >>>>>> they all look superficial around some APIs I changed. >>>>>> >>>>>> I haven't added the APIs to query reduce functions via HTTP but the >>>>>> reduce functions are being executed to calculate row counts and KV >>>>>> sizes. Adding the builtin reduce functions and extending those to user >>>>>> defined reduce functions should be straightforward. >>>>>> >>>>>> On Fri, Jul 24, 2020 at 9:39 AM Robert Newson <rnew...@apache.org> >>> wrote: >>>>>>> >>>>>>> >>>>>>> I’m happy to restrict my PR comments to the actual diff, yes. So I’m >>> not +1 yet. >>>>>>> >>>>>>> I fixed the spurious conflicts at >>> https://github.com/apache/couchdb/pull/3033. >>>>>>> >>>>>>> -- >>>>>>> Robert Samuel Newson >>>>>>> rnew...@apache.org >>>>>>> >>>>>>> On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote: >>>>>>>> Ok so just to confirm, we keep my PR as-is with ebtree only for >>> reduce. We >>>>>>>> can get that ready to merge into fdb master. We can then use that >>> to battle >>>>>>>> test ebtree and then look at using it for the map side as well. At >>> that >>>>>>>> point we would combine the reduce and map index into a ebtree >>> index. Are >>>>>>>> you happy with that? >>>>>>>> >>>>>>>> Cheers >>>>>>>> Garren >>>>>>>> >>>>>>>> On Fri, Jul 24, 2020 at 3:48 PM Robert Newson <rnew...@apache.org> >>> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> It’s not as unknown as you think but certainly we need empirical >>> data to >>>>>>>>> guide us on the reduce side. I’m also fine with continuing with the >>>>>>>>> map-only code as it stands today until such time as we demonstrate >>> ebtree >>>>>>>>> meets or exceeds our needs (and I freely accept the possibility >>> that it >>>>>>>>> might not). >>>>>>>>> >>>>>>>>> I think the principal enhancement to ebtree that would address most >>>>>>>>> concerns is if it could store the leaf entries vertically (as they >>> are >>>>>>>>> currently). I have some thoughts which I’ll try to realise as >>> working code. >>>>>>>>> >>>>>>>>> I’ve confirmed that I do create spurious conflicts and will have a >>> PR up >>>>>>>>> today to fix that. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Robert Samuel Newson >>>>>>>>> rnew...@apache.org >>>>>>>>> >>>>>>>>> On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote: >>>>>>>>>> Hi Bob, >>>>>>>>>> >>>>>>>>>> Thanks for that explanation, that is really helpful and it is >>> good we >>>>>>>>> have >>>>>>>>>> some options but it also does highlight a lot of unknowns. >>> Whereas our >>>>>>>>>> current map indexes are really simple and we know its behaviour. >>> There >>>>>>>>> are >>>>>>>>>> no real unknowns. Adding ebtree here could make map indexes a >>> fair bit >>>>>>>>> more >>>>>>>>>> complicated since we don't know the effect of managing different >>> node >>>>>>>>>> sizes, concurrent doc updates, and querying performance. >>>>>>>>>> >>>>>>>>>> Could we do a compromise here, could we look at using ebtree only >>> for >>>>>>>>>> reduces now? That means that we can have reduce indexes working >>> quite >>>>>>>>> soon >>>>>>>>>> on CouchDB on FDB. At the same time, we can work on ebtree and run >>>>>>>>>> performance tests on ebtree for map indexes. Some interesting >>> tests we >>>>>>>>> can >>>>>>>>>> do is see if a user emits KV's near the limits (8KB for keys and >>> 50KB for >>>>>>>>>> values) how does ebtree handle that? How it handles being updated >>> in the >>>>>>>>>> doc update transaction. And general query performance. What do >>> you think? >>>>>>>>>> >>>>>>>>>> Cheers >>>>>>>>>> Garren >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson < >>>>>>>>> rnew...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> A short preface at this stage is needed I think: >>>>>>>>>>> >>>>>>>>>>> My goal with ebtree was to implement a complete and correct >>> b+tree that >>>>>>>>>>> also calculated and maintained inner reductions. I consciously >>> chose >>>>>>>>> not to >>>>>>>>>>> go further than that before presenting it to the group for wider >>> debate >>>>>>>>>>> (indeed, the very debate we're having in this thread). >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> Ebtree is not single writer, at least not inherently. Two >>> updates to >>>>>>>>> the >>>>>>>>>>> same ebtree should both succeed as long as they don't modify the >>> same >>>>>>>>> nodes. >>>>>>>>>>> >>>>>>>>>>> Modifying the same node is likely where there's a reduce >>> function, >>>>>>>>> though >>>>>>>>>>> only if the reduction value actually changes, as that percolates >>> up the >>>>>>>>>>> tree. Where the reduce value does not change and when neither >>>>>>>>> transaction >>>>>>>>>>> causes a split, rebalance, or merge that affects the other, they >>> should >>>>>>>>>>> both commit without conflict. There is a rich history of >>> optimizations >>>>>>>>> in >>>>>>>>>>> this space specifically around btrees. ebtree might cause >>> spurious >>>>>>>>>>> conflicts today, I will investigate and propose fixes if so >>> (e.g, I >>>>>>>>> think I >>>>>>>>>>> probably call erlfdb:set on nodes that have not changed). >>>>>>>>>>> >>>>>>>>>>> I certainly envisaged updating ebtree within the same txn as a >>> doc >>>>>>>>> update, >>>>>>>>>>> which is why the first argument to all the public functions can >>> be >>>>>>>>> either >>>>>>>>>>> an erlfdb Db or open Tx. >>>>>>>>>>> >>>>>>>>>>> Parallelising the initial build is more difficult with ebtree >>> than >>>>>>>>>>> parallelising the existing couch_views map-only code, though >>> it's worth >>>>>>>>>>> noting that ebtree:insert benefits a great deal from batching >>> (multiple >>>>>>>>>>> calls to :insert in the same transaction). For an offline build >>> (i.e, >>>>>>>>> an >>>>>>>>>>> index that the client cannot see until the entire build is >>> complete) >>>>>>>>> the >>>>>>>>>>> batch size can be maximised. That is still a serial process in >>> that >>>>>>>>> there >>>>>>>>>>> is only one transaction at a time updating the ebtree. I can't >>> say >>>>>>>>> offhand >>>>>>>>>>> how fast that is in practice, but it is clearly less powerful >>> than a >>>>>>>>> fully >>>>>>>>>>> parallelisable approach could be. >>>>>>>>>>> >>>>>>>>>>> Any parallel build would require a way to divide the database >>> into >>>>>>>>>>> non-overlapping subsets of emitted keys. This is easy and >>> natural if >>>>>>>>> the >>>>>>>>>>> fdb key is the emitted key, which is the case for the couch_views >>>>>>>>> map-only >>>>>>>>>>> code. For ebtree it might be enough to simply grab a large chunk >>> of >>>>>>>>>>> documents, perform the map transform, and then issues multiple >>>>>>>>> transactions >>>>>>>>>>> on subsets of those. >>>>>>>>>>> >>>>>>>>>>> Another common technique for btrees is bulk loading (more or less >>>>>>>>>>> literally constructing the btree nodes directly from the source, >>> as >>>>>>>>> long as >>>>>>>>>>> you can sort it), which might be an option as well. >>>>>>>>>>> >>>>>>>>>>> Parallelising a build _with_ a reduce function seems hard >>> however we do >>>>>>>>>>> it. The non-ebtree approach is parallelisable by virtue of >>> paring down >>>>>>>>> the >>>>>>>>>>> reduce functionality itself (only whole key groups, only those >>>>>>>>> functions >>>>>>>>>>> that fdb has atomic operations for). >>>>>>>>>>> >>>>>>>>>>> I will first of all verify the multi-writer nature of ebtree as >>> it >>>>>>>>> stands >>>>>>>>>>> today and make a PR which fixes any spurious conflicts, and then >>> ponder >>>>>>>>>>> further how true parallel builds might be possible. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 24 Jul 2020, at 07:30, Garren Smith <gar...@apache.org> >>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> We haven't spoken much about updates with ebtree. >>>>>>>>>>>> From my understanding of ebtree it can only do single writer, >>> is that >>>>>>>>>>>> correct?. If that is true it means we would not be able to >>>>>>>>> parallelize >>>>>>>>>>> the >>>>>>>>>>>> background building of views to speed up view builds. >>>>>>>>>>>> The other thing that would mean is that we cannot use it for >>> mango >>>>>>>>> where >>>>>>>>>>> we >>>>>>>>>>>> update the view in the doc transaction. >>>>>>>>>>>> >>>>>>>>>>>> Cheers >>>>>>>>>>>> Garren >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely < >>> kjsnav...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> When in doubt, throw us a build at Cloudant with ebtree maps >>> and >>>>>>>>> we'll >>>>>>>>>>> see >>>>>>>>>>>>> if it comes close to the crazy fast KV map queries. >>>>>>>>>>>>> >>>>>>>>>>>>> Kyle >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson < >>>>>>>>> rnew...@apache.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I (perhaps obviously) don't agree that I'm tying myself to old >>>>>>>>> CouchDB >>>>>>>>>>> or >>>>>>>>>>>>>> failing to embrace FDB. FDB is a toolkit and does not, to my >>> mind, >>>>>>>>>>> force >>>>>>>>>>>>> us >>>>>>>>>>>>>> down any particular path. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I haven't sat down to modify couch_views in the manner I've >>>>>>>>> suggested >>>>>>>>>>>>>> (where ebtree is used as designed; being fed the output of the >>>>>>>>> emit() >>>>>>>>>>>>> calls >>>>>>>>>>>>>> and calculating reductions as it does so) but I think it's a >>>>>>>>> worthwhile >>>>>>>>>>>>>> exercise. I'd be surprised if performance of map-only >>> traversals >>>>>>>>> would >>>>>>>>>>> be >>>>>>>>>>>>>> disappointing but who knows? I also expect it would allow for >>>>>>>>>>> significant >>>>>>>>>>>>>> simplification of the code, which is one of the higher >>> virtues. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Adam, can you describe in a little more detail how you picture >>>>>>>>> "b+tree >>>>>>>>>>> is >>>>>>>>>>>>>> only used for incremental aggregations,"? It's implied in your >>>>>>>>> reply >>>>>>>>>>> that >>>>>>>>>>>>>> it would preserve the "interesting property" of keeping user >>> data >>>>>>>>> out >>>>>>>>>>> of >>>>>>>>>>>>>> FDB Keys (for casual readers: the new native database >>> encryption, >>>>>>>>>>> called >>>>>>>>>>>>>> "aegis", only encrypts the FDB value. It can't encrypt the >>> key as >>>>>>>>> this >>>>>>>>>>>>>> would change the order of keys, which the current code >>> depends on). >>>>>>>>>>> Did I >>>>>>>>>>>>>> misread you? >>>>>>>>>>>>>> >>>>>>>>>>>>>> B. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 23 Jul 2020, at 20:11, Adam Kocoloski < >>> kocol...@apache.org> >>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OK thanks for the clarification. As I said I wasn’t all that >>>>>>>>> confident >>>>>>>>>>>>> I >>>>>>>>>>>>>> understood the design :) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I like the idea that the b+tree is only used for incremental >>>>>>>>>>>>>> aggregations, rather than storing the entire materialized >>> view, >>>>>>>>> for the >>>>>>>>>>>>>> same reasons that Garren stated. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> An index maintained entirely in ebtree has the interesting >>>>>>>>> property >>>>>>>>>>>>> that >>>>>>>>>>>>>> it does not leak any user data into FDB Keys, which could be >>>>>>>>> attractive >>>>>>>>>>>>> for >>>>>>>>>>>>>> security reasons. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Adam >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith < >>> gar...@apache.org> >>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis < >>>>>>>>>>>>> paul.joseph.da...@gmail.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I would like to keep ebtree to use just for the reduce >>> index. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Could you expand on your reasoning here, Garren? I haven't >>> done >>>>>>>>> any >>>>>>>>>>>>>>>>> experiments on my own to understand if I'm missing >>> something >>>>>>>>>>>>>>>>> important. My initial reaction is to not diverge too far >>> from >>>>>>>>> the >>>>>>>>>>>>>>>>> previous shape of the implementation since we have a decent >>>>>>>>> idea of >>>>>>>>>>>>>>>>> how that behaves already but perhaps you've seen or >>> measured >>>>>>>>>>>>> something >>>>>>>>>>>>>>>>> I'm not thinking of? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think this must have been a misunderstanding on my part. I >>>>>>>>> always >>>>>>>>>>>>>> thought >>>>>>>>>>>>>>>> of using ebtree to solve reduce and wasn't planning to use >>> it >>>>>>>>> for the >>>>>>>>>>>>>> map >>>>>>>>>>>>>>>> index. >>>>>>>>>>>>>>>> I don't like the idea that we have ordered distributed >>> key/value >>>>>>>>>>> store >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> then implementing a b-tree on top of that for indexing. >>>>>>>>> Especially >>>>>>>>>>>>>> since we >>>>>>>>>>>>>>>> know that the current map index is fast, >>>>>>>>>>>>>>>> easy to use, and follows recommended practices in the FDB >>>>>>>>> community >>>>>>>>>>> on >>>>>>>>>>>>>> how >>>>>>>>>>>>>>>> to do indexing. ebtree makes sense for reduce and is a >>> means to >>>>>>>>> an >>>>>>>>>>> end >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> give us CouchDB's reduce api, which is heavily reliant on a >>>>>>>>> b-tree, >>>>>>>>>>>>> with >>>>>>>>>>>>>>>> CouchDB on FDB. This feels like a step backward and I worry >>> we >>>>>>>>> are >>>>>>>>>>>>> tying >>>>>>>>>>>>>>>> ourselves heavily to old CouchDB instead of using the fact >>> that >>>>>>>>> we >>>>>>>>>>>>>> moving >>>>>>>>>>>>>>>> to FDB which then allows us to design new api's and >>>>>>>>> functionality. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> >>>> >>>> >>>