Hi, It’s not as unknown as you think but certainly we need empirical data to guide us on the reduce side. I’m also fine with continuing with the map-only code as it stands today until such time as we demonstrate ebtree meets or exceeds our needs (and I freely accept the possibility that it might not).
I think the principal enhancement to ebtree that would address most concerns is if it could store the leaf entries vertically (as they are currently). I have some thoughts which I’ll try to realise as working code. I’ve confirmed that I do create spurious conflicts and will have a PR up today to fix that. -- Robert Samuel Newson rnew...@apache.org On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote: > Hi Bob, > > Thanks for that explanation, that is really helpful and it is good we have > some options but it also does highlight a lot of unknowns. Whereas our > current map indexes are really simple and we know its behaviour. There are > no real unknowns. Adding ebtree here could make map indexes a fair bit more > complicated since we don't know the effect of managing different node > sizes, concurrent doc updates, and querying performance. > > Could we do a compromise here, could we look at using ebtree only for > reduces now? That means that we can have reduce indexes working quite soon > on CouchDB on FDB. At the same time, we can work on ebtree and run > performance tests on ebtree for map indexes. Some interesting tests we can > do is see if a user emits KV's near the limits (8KB for keys and 50KB for > values) how does ebtree handle that? How it handles being updated in the > doc update transaction. And general query performance. What do you think? > > Cheers > Garren > > > On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson <rnew...@apache.org> > wrote: > > > Hi, > > > > A short preface at this stage is needed I think: > > > > My goal with ebtree was to implement a complete and correct b+tree that > > also calculated and maintained inner reductions. I consciously chose not to > > go further than that before presenting it to the group for wider debate > > (indeed, the very debate we're having in this thread). > > > > -- > > > > Ebtree is not single writer, at least not inherently. Two updates to the > > same ebtree should both succeed as long as they don't modify the same nodes. > > > > Modifying the same node is likely where there's a reduce function, though > > only if the reduction value actually changes, as that percolates up the > > tree. Where the reduce value does not change and when neither transaction > > causes a split, rebalance, or merge that affects the other, they should > > both commit without conflict. There is a rich history of optimizations in > > this space specifically around btrees. ebtree might cause spurious > > conflicts today, I will investigate and propose fixes if so (e.g, I think I > > probably call erlfdb:set on nodes that have not changed). > > > > I certainly envisaged updating ebtree within the same txn as a doc update, > > which is why the first argument to all the public functions can be either > > an erlfdb Db or open Tx. > > > > Parallelising the initial build is more difficult with ebtree than > > parallelising the existing couch_views map-only code, though it's worth > > noting that ebtree:insert benefits a great deal from batching (multiple > > calls to :insert in the same transaction). For an offline build (i.e, an > > index that the client cannot see until the entire build is complete) the > > batch size can be maximised. That is still a serial process in that there > > is only one transaction at a time updating the ebtree. I can't say offhand > > how fast that is in practice, but it is clearly less powerful than a fully > > parallelisable approach could be. > > > > Any parallel build would require a way to divide the database into > > non-overlapping subsets of emitted keys. This is easy and natural if the > > fdb key is the emitted key, which is the case for the couch_views map-only > > code. For ebtree it might be enough to simply grab a large chunk of > > documents, perform the map transform, and then issues multiple transactions > > on subsets of those. > > > > Another common technique for btrees is bulk loading (more or less > > literally constructing the btree nodes directly from the source, as long as > > you can sort it), which might be an option as well. > > > > Parallelising a build _with_ a reduce function seems hard however we do > > it. The non-ebtree approach is parallelisable by virtue of paring down the > > reduce functionality itself (only whole key groups, only those functions > > that fdb has atomic operations for). > > > > I will first of all verify the multi-writer nature of ebtree as it stands > > today and make a PR which fixes any spurious conflicts, and then ponder > > further how true parallel builds might be possible. > > > > > > > On 24 Jul 2020, at 07:30, Garren Smith <gar...@apache.org> wrote: > > > > > > We haven't spoken much about updates with ebtree. > > > From my understanding of ebtree it can only do single writer, is that > > > correct?. If that is true it means we would not be able to parallelize > > the > > > background building of views to speed up view builds. > > > The other thing that would mean is that we cannot use it for mango where > > we > > > update the view in the doc transaction. > > > > > > Cheers > > > Garren > > > > > > On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <kjsnav...@gmail.com> > > wrote: > > > > > >> When in doubt, throw us a build at Cloudant with ebtree maps and we'll > > see > > >> if it comes close to the crazy fast KV map queries. > > >> > > >> Kyle > > >> > > >> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <rnew...@apache.org> > > >> wrote: > > >> > > >>> I (perhaps obviously) don't agree that I'm tying myself to old CouchDB > > or > > >>> failing to embrace FDB. FDB is a toolkit and does not, to my mind, > > force > > >> us > > >>> down any particular path. > > >>> > > >>> I haven't sat down to modify couch_views in the manner I've suggested > > >>> (where ebtree is used as designed; being fed the output of the emit() > > >> calls > > >>> and calculating reductions as it does so) but I think it's a worthwhile > > >>> exercise. I'd be surprised if performance of map-only traversals would > > be > > >>> disappointing but who knows? I also expect it would allow for > > significant > > >>> simplification of the code, which is one of the higher virtues. > > >>> > > >>> Adam, can you describe in a little more detail how you picture "b+tree > > is > > >>> only used for incremental aggregations,"? It's implied in your reply > > that > > >>> it would preserve the "interesting property" of keeping user data out > > of > > >>> FDB Keys (for casual readers: the new native database encryption, > > called > > >>> "aegis", only encrypts the FDB value. It can't encrypt the key as this > > >>> would change the order of keys, which the current code depends on). > > Did I > > >>> misread you? > > >>> > > >>> B. > > >>> > > >>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <kocol...@apache.org> wrote: > > >>>> > > >>>> OK thanks for the clarification. As I said I wasn’t all that confident > > >> I > > >>> understood the design :) > > >>>> > > >>>> I like the idea that the b+tree is only used for incremental > > >>> aggregations, rather than storing the entire materialized view, for the > > >>> same reasons that Garren stated. > > >>>> > > >>>> An index maintained entirely in ebtree has the interesting property > > >> that > > >>> it does not leak any user data into FDB Keys, which could be attractive > > >> for > > >>> security reasons. > > >>>> > > >>>> Adam > > >>>> > > >>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <gar...@apache.org> wrote: > > >>>>> > > >>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis < > > >> paul.joseph.da...@gmail.com > > >>>> > > >>>>> wrote: > > >>>>> > > >>>>>>> I would like to keep ebtree to use just for the reduce index. > > >>>>>> > > >>>>>> Could you expand on your reasoning here, Garren? I haven't done any > > >>>>>> experiments on my own to understand if I'm missing something > > >>>>>> important. My initial reaction is to not diverge too far from the > > >>>>>> previous shape of the implementation since we have a decent idea of > > >>>>>> how that behaves already but perhaps you've seen or measured > > >> something > > >>>>>> I'm not thinking of? > > >>>>>> > > >>>>> > > >>>>> I think this must have been a misunderstanding on my part. I always > > >>> thought > > >>>>> of using ebtree to solve reduce and wasn't planning to use it for the > > >>> map > > >>>>> index. > > >>>>> I don't like the idea that we have ordered distributed key/value > > store > > >>> and > > >>>>> then implementing a b-tree on top of that for indexing. Especially > > >>> since we > > >>>>> know that the current map index is fast, > > >>>>> easy to use, and follows recommended practices in the FDB community > > on > > >>> how > > >>>>> to do indexing. ebtree makes sense for reduce and is a means to an > > end > > >>> to > > >>>>> give us CouchDB's reduce api, which is heavily reliant on a b-tree, > > >> with > > >>>>> CouchDB on FDB. This feels like a step backward and I worry we are > > >> tying > > >>>>> ourselves heavily to old CouchDB instead of using the fact that we > > >>> moving > > >>>>> to FDB which then allows us to design new api's and functionality. > > >>>> > > >>> > > >>> > > >> > > > > >