I’m happy to restrict my PR comments to the actual diff, yes. So I’m not +1 yet.
I fixed the spurious conflicts at https://github.com/apache/couchdb/pull/3033. -- Robert Samuel Newson rnew...@apache.org On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote: > Ok so just to confirm, we keep my PR as-is with ebtree only for reduce. We > can get that ready to merge into fdb master. We can then use that to battle > test ebtree and then look at using it for the map side as well. At that > point we would combine the reduce and map index into a ebtree index. Are > you happy with that? > > Cheers > Garren > > On Fri, Jul 24, 2020 at 3:48 PM Robert Newson <rnew...@apache.org> wrote: > > > Hi, > > > > It’s not as unknown as you think but certainly we need empirical data to > > guide us on the reduce side. I’m also fine with continuing with the > > map-only code as it stands today until such time as we demonstrate ebtree > > meets or exceeds our needs (and I freely accept the possibility that it > > might not). > > > > I think the principal enhancement to ebtree that would address most > > concerns is if it could store the leaf entries vertically (as they are > > currently). I have some thoughts which I’ll try to realise as working code. > > > > I’ve confirmed that I do create spurious conflicts and will have a PR up > > today to fix that. > > > > -- > > Robert Samuel Newson > > rnew...@apache.org > > > > On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote: > > > Hi Bob, > > > > > > Thanks for that explanation, that is really helpful and it is good we > > have > > > some options but it also does highlight a lot of unknowns. Whereas our > > > current map indexes are really simple and we know its behaviour. There > > are > > > no real unknowns. Adding ebtree here could make map indexes a fair bit > > more > > > complicated since we don't know the effect of managing different node > > > sizes, concurrent doc updates, and querying performance. > > > > > > Could we do a compromise here, could we look at using ebtree only for > > > reduces now? That means that we can have reduce indexes working quite > > soon > > > on CouchDB on FDB. At the same time, we can work on ebtree and run > > > performance tests on ebtree for map indexes. Some interesting tests we > > can > > > do is see if a user emits KV's near the limits (8KB for keys and 50KB for > > > values) how does ebtree handle that? How it handles being updated in the > > > doc update transaction. And general query performance. What do you think? > > > > > > Cheers > > > Garren > > > > > > > > > On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson < > > rnew...@apache.org> > > > wrote: > > > > > > > Hi, > > > > > > > > A short preface at this stage is needed I think: > > > > > > > > My goal with ebtree was to implement a complete and correct b+tree that > > > > also calculated and maintained inner reductions. I consciously chose > > not to > > > > go further than that before presenting it to the group for wider debate > > > > (indeed, the very debate we're having in this thread). > > > > > > > > -- > > > > > > > > Ebtree is not single writer, at least not inherently. Two updates to > > the > > > > same ebtree should both succeed as long as they don't modify the same > > nodes. > > > > > > > > Modifying the same node is likely where there's a reduce function, > > though > > > > only if the reduction value actually changes, as that percolates up the > > > > tree. Where the reduce value does not change and when neither > > transaction > > > > causes a split, rebalance, or merge that affects the other, they should > > > > both commit without conflict. There is a rich history of optimizations > > in > > > > this space specifically around btrees. ebtree might cause spurious > > > > conflicts today, I will investigate and propose fixes if so (e.g, I > > think I > > > > probably call erlfdb:set on nodes that have not changed). > > > > > > > > I certainly envisaged updating ebtree within the same txn as a doc > > update, > > > > which is why the first argument to all the public functions can be > > either > > > > an erlfdb Db or open Tx. > > > > > > > > Parallelising the initial build is more difficult with ebtree than > > > > parallelising the existing couch_views map-only code, though it's worth > > > > noting that ebtree:insert benefits a great deal from batching (multiple > > > > calls to :insert in the same transaction). For an offline build (i.e, > > an > > > > index that the client cannot see until the entire build is complete) > > the > > > > batch size can be maximised. That is still a serial process in that > > there > > > > is only one transaction at a time updating the ebtree. I can't say > > offhand > > > > how fast that is in practice, but it is clearly less powerful than a > > fully > > > > parallelisable approach could be. > > > > > > > > Any parallel build would require a way to divide the database into > > > > non-overlapping subsets of emitted keys. This is easy and natural if > > the > > > > fdb key is the emitted key, which is the case for the couch_views > > map-only > > > > code. For ebtree it might be enough to simply grab a large chunk of > > > > documents, perform the map transform, and then issues multiple > > transactions > > > > on subsets of those. > > > > > > > > Another common technique for btrees is bulk loading (more or less > > > > literally constructing the btree nodes directly from the source, as > > long as > > > > you can sort it), which might be an option as well. > > > > > > > > Parallelising a build _with_ a reduce function seems hard however we do > > > > it. The non-ebtree approach is parallelisable by virtue of paring down > > the > > > > reduce functionality itself (only whole key groups, only those > > functions > > > > that fdb has atomic operations for). > > > > > > > > I will first of all verify the multi-writer nature of ebtree as it > > stands > > > > today and make a PR which fixes any spurious conflicts, and then ponder > > > > further how true parallel builds might be possible. > > > > > > > > > > > > > On 24 Jul 2020, at 07:30, Garren Smith <gar...@apache.org> wrote: > > > > > > > > > > We haven't spoken much about updates with ebtree. > > > > > From my understanding of ebtree it can only do single writer, is that > > > > > correct?. If that is true it means we would not be able to > > parallelize > > > > the > > > > > background building of views to speed up view builds. > > > > > The other thing that would mean is that we cannot use it for mango > > where > > > > we > > > > > update the view in the doc transaction. > > > > > > > > > > Cheers > > > > > Garren > > > > > > > > > > On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <kjsnav...@gmail.com> > > > > wrote: > > > > > > > > > >> When in doubt, throw us a build at Cloudant with ebtree maps and > > we'll > > > > see > > > > >> if it comes close to the crazy fast KV map queries. > > > > >> > > > > >> Kyle > > > > >> > > > > >> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson < > > rnew...@apache.org> > > > > >> wrote: > > > > >> > > > > >>> I (perhaps obviously) don't agree that I'm tying myself to old > > CouchDB > > > > or > > > > >>> failing to embrace FDB. FDB is a toolkit and does not, to my mind, > > > > force > > > > >> us > > > > >>> down any particular path. > > > > >>> > > > > >>> I haven't sat down to modify couch_views in the manner I've > > suggested > > > > >>> (where ebtree is used as designed; being fed the output of the > > emit() > > > > >> calls > > > > >>> and calculating reductions as it does so) but I think it's a > > worthwhile > > > > >>> exercise. I'd be surprised if performance of map-only traversals > > would > > > > be > > > > >>> disappointing but who knows? I also expect it would allow for > > > > significant > > > > >>> simplification of the code, which is one of the higher virtues. > > > > >>> > > > > >>> Adam, can you describe in a little more detail how you picture > > "b+tree > > > > is > > > > >>> only used for incremental aggregations,"? It's implied in your > > reply > > > > that > > > > >>> it would preserve the "interesting property" of keeping user data > > out > > > > of > > > > >>> FDB Keys (for casual readers: the new native database encryption, > > > > called > > > > >>> "aegis", only encrypts the FDB value. It can't encrypt the key as > > this > > > > >>> would change the order of keys, which the current code depends on). > > > > Did I > > > > >>> misread you? > > > > >>> > > > > >>> B. > > > > >>> > > > > >>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <kocol...@apache.org> > > wrote: > > > > >>>> > > > > >>>> OK thanks for the clarification. As I said I wasn’t all that > > confident > > > > >> I > > > > >>> understood the design :) > > > > >>>> > > > > >>>> I like the idea that the b+tree is only used for incremental > > > > >>> aggregations, rather than storing the entire materialized view, > > for the > > > > >>> same reasons that Garren stated. > > > > >>>> > > > > >>>> An index maintained entirely in ebtree has the interesting > > property > > > > >> that > > > > >>> it does not leak any user data into FDB Keys, which could be > > attractive > > > > >> for > > > > >>> security reasons. > > > > >>>> > > > > >>>> Adam > > > > >>>> > > > > >>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <gar...@apache.org> > > wrote: > > > > >>>>> > > > > >>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis < > > > > >> paul.joseph.da...@gmail.com > > > > >>>> > > > > >>>>> wrote: > > > > >>>>> > > > > >>>>>>> I would like to keep ebtree to use just for the reduce index. > > > > >>>>>> > > > > >>>>>> Could you expand on your reasoning here, Garren? I haven't done > > any > > > > >>>>>> experiments on my own to understand if I'm missing something > > > > >>>>>> important. My initial reaction is to not diverge too far from > > the > > > > >>>>>> previous shape of the implementation since we have a decent > > idea of > > > > >>>>>> how that behaves already but perhaps you've seen or measured > > > > >> something > > > > >>>>>> I'm not thinking of? > > > > >>>>>> > > > > >>>>> > > > > >>>>> I think this must have been a misunderstanding on my part. I > > always > > > > >>> thought > > > > >>>>> of using ebtree to solve reduce and wasn't planning to use it > > for the > > > > >>> map > > > > >>>>> index. > > > > >>>>> I don't like the idea that we have ordered distributed key/value > > > > store > > > > >>> and > > > > >>>>> then implementing a b-tree on top of that for indexing. > > Especially > > > > >>> since we > > > > >>>>> know that the current map index is fast, > > > > >>>>> easy to use, and follows recommended practices in the FDB > > community > > > > on > > > > >>> how > > > > >>>>> to do indexing. ebtree makes sense for reduce and is a means to > > an > > > > end > > > > >>> to > > > > >>>>> give us CouchDB's reduce api, which is heavily reliant on a > > b-tree, > > > > >> with > > > > >>>>> CouchDB on FDB. This feels like a step backward and I worry we > > are > > > > >> tying > > > > >>>>> ourselves heavily to old CouchDB instead of using the fact that > > we > > > > >>> moving > > > > >>>>> to FDB which then allows us to design new api's and > > functionality. > > > > >>>> > > > > >>> > > > > >>> > > > > >> > > > > > > > > > > > > > >