Re: [DISCUSS] Reduce on FDB take 3

Robert Newson Fri, 24 Jul 2020 06:50:30 -0700

Hi,

It’s not as unknown as you think but certainly we need empirical data to guide 
us on the reduce side. I’m also fine with continuing with the map-only code as 
it stands today until such time as we demonstrate ebtree meets or exceeds our 
needs (and I freely accept the possibility that it might not).


I think the principal enhancement to ebtree that would address most concerns is 
if it could store the leaf entries vertically (as they are currently). I have 
some thoughts which I’ll try to realise as working code. 

I’ve confirmed that I do create spurious conflicts and will have a PR up today 
to fix that. 

-- 
  Robert Samuel Newson
  [email protected]

On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote:
> Hi Bob,
> 
> Thanks for that explanation, that is really helpful and it is good we have
> some options but it also does highlight a lot of unknowns. Whereas our
> current map indexes are really simple and we know its behaviour. There are
> no real unknowns. Adding ebtree here could make map indexes a fair bit more
> complicated since we don't know the effect of managing different node
> sizes, concurrent doc updates, and querying performance.
> 
> Could we do a compromise here, could we look at using ebtree only for
> reduces now? That means that we can have reduce indexes working quite soon
> on CouchDB on FDB. At the same time, we can work on ebtree and run
> performance tests on ebtree for map indexes. Some interesting tests we can
> do is see if a user emits KV's near the limits (8KB for keys and 50KB for
> values) how does ebtree handle that? How it handles being updated in the
> doc update transaction. And general query performance. What do you think?
> 
> Cheers
> Garren
> 
> 
> On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson <[email protected]>
> wrote:
> 
> > Hi,
> >
> > A short preface at this stage is needed I think:
> >
> > My goal with ebtree was to implement a complete and correct b+tree that
> > also calculated and maintained inner reductions. I consciously chose not to
> > go further than that before presenting it to the group for wider debate
> > (indeed, the very debate we're having in this thread).
> >
> > --
> >
> > Ebtree is not single writer, at least not inherently. Two updates to the
> > same ebtree should both succeed as long as they don't modify the same nodes.
> >
> > Modifying the same node is likely where there's a reduce function, though
> > only if the reduction value actually changes, as that percolates up the
> > tree. Where the reduce value does not change and when neither transaction
> > causes a split, rebalance, or merge that affects the other, they should
> > both commit without conflict. There is a rich history of optimizations in
> > this space specifically around btrees. ebtree might cause spurious
> > conflicts today, I will investigate and propose fixes if so (e.g, I think I
> > probably call erlfdb:set on nodes that have not changed).
> >
> > I certainly envisaged updating ebtree within the same txn as a doc update,
> > which is why the first argument to all the public functions can be either
> > an erlfdb Db or open Tx.
> >
> > Parallelising the initial build is more difficult with ebtree than
> > parallelising the existing couch_views map-only code, though it's worth
> > noting that ebtree:insert benefits a great deal from batching (multiple
> > calls to :insert in the same transaction). For an offline build (i.e, an
> > index that the client cannot see until the entire build is complete) the
> > batch size can be maximised. That is still a serial process in that there
> > is only one transaction at a time updating the ebtree. I can't say offhand
> > how fast that is in practice, but it is clearly less powerful than a fully
> > parallelisable approach could be.
> >
> > Any parallel build would require a way to divide the database into
> > non-overlapping subsets of emitted keys. This is easy and natural if the
> > fdb key is the emitted key, which is the case for the couch_views map-only
> > code. For ebtree it might be enough to simply grab a large chunk of
> > documents, perform the map transform, and then issues multiple transactions
> > on subsets of those.
> >
> > Another common technique for btrees is bulk loading (more or less
> > literally constructing the btree nodes directly from the source, as long as
> > you can sort it), which might be an option as well.
> >
> > Parallelising a build _with_ a reduce function seems hard however we do
> > it. The non-ebtree approach is parallelisable by virtue of paring down the
> > reduce functionality itself (only whole key groups, only those functions
> > that fdb has atomic operations for).
> >
> > I will first of all verify the multi-writer nature of ebtree as it stands
> > today and make a PR which fixes any spurious conflicts, and then ponder
> > further  how true parallel builds might be possible.
> >
> >
> > > On 24 Jul 2020, at 07:30, Garren Smith <[email protected]> wrote:
> > >
> > > We haven't spoken much about updates with ebtree.
> > > From my understanding of ebtree it can only do single writer, is that
> > > correct?. If that is true it means we would not be able to parallelize
> > the
> > > background building of views to speed up view builds.
> > > The other thing that would mean is that we cannot use it for mango where
> > we
> > > update the view in the doc transaction.
> > >
> > > Cheers
> > > Garren
> > >
> > > On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <[email protected]>
> > wrote:
> > >
> > >> When in doubt, throw us a build at Cloudant with ebtree maps and we'll
> > see
> > >> if it comes close to the crazy fast KV map queries.
> > >>
> > >> Kyle
> > >>
> > >> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <[email protected]>
> > >> wrote:
> > >>
> > >>> I (perhaps obviously) don't agree that I'm tying myself to old CouchDB
> > or
> > >>> failing to embrace FDB. FDB is a toolkit and does not, to my mind,
> > force
> > >> us
> > >>> down any particular path.
> > >>>
> > >>> I haven't sat down to modify couch_views in the manner I've suggested
> > >>> (where ebtree is used as designed; being fed the output of the emit()
> > >> calls
> > >>> and calculating reductions as it does so) but I think it's a worthwhile
> > >>> exercise. I'd be surprised if performance of map-only traversals would
> > be
> > >>> disappointing but who knows? I also expect it would allow for
> > significant
> > >>> simplification of the code, which is one of the higher virtues.
> > >>>
> > >>> Adam, can you describe in a little more detail how you picture "b+tree
> > is
> > >>> only used for incremental aggregations,"? It's implied in your reply
> > that
> > >>> it would preserve the "interesting property" of keeping user data out
> > of
> > >>> FDB Keys (for casual readers: the new native database encryption,
> > called
> > >>> "aegis", only encrypts the FDB value. It can't encrypt the key as this
> > >>> would change the order of keys, which the current code depends on).
> > Did I
> > >>> misread you?
> > >>>
> > >>> B.
> > >>>
> > >>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <[email protected]> wrote:
> > >>>>
> > >>>> OK thanks for the clarification. As I said I wasn’t all that confident
> > >> I
> > >>> understood the design :)
> > >>>>
> > >>>> I like the idea that the b+tree is only used for incremental
> > >>> aggregations, rather than storing the entire materialized view, for the
> > >>> same reasons that Garren stated.
> > >>>>
> > >>>> An index maintained entirely in ebtree has the interesting property
> > >> that
> > >>> it does not leak any user data into FDB Keys, which could be attractive
> > >> for
> > >>> security reasons.
> > >>>>
> > >>>> Adam
> > >>>>
> > >>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <[email protected]> wrote:
> > >>>>>
> > >>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
> > >> [email protected]
> > >>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>>> I would like to keep ebtree to use just for the reduce index.
> > >>>>>>
> > >>>>>> Could you expand on your reasoning here, Garren? I haven't done any
> > >>>>>> experiments on my own to understand if I'm missing something
> > >>>>>> important. My initial reaction is to not diverge too far from the
> > >>>>>> previous shape of the implementation since we have a decent idea of
> > >>>>>> how that behaves already but perhaps you've seen or measured
> > >> something
> > >>>>>> I'm not thinking of?
> > >>>>>>
> > >>>>>
> > >>>>> I think this must have been a misunderstanding on my part. I always
> > >>> thought
> > >>>>> of using ebtree to solve reduce and wasn't planning to use it for the
> > >>> map
> > >>>>> index.
> > >>>>> I don't like the idea that we have ordered distributed key/value
> > store
> > >>> and
> > >>>>> then implementing a b-tree on top of that for indexing. Especially
> > >>> since we
> > >>>>> know that the current map index is fast,
> > >>>>> easy to use, and follows recommended practices in the FDB community
> > on
> > >>> how
> > >>>>> to do indexing. ebtree makes sense for reduce and is a means to an
> > end
> > >>> to
> > >>>>> give us CouchDB's reduce api, which is heavily reliant on a b-tree,
> > >> with
> > >>>>> CouchDB on FDB. This feels like a step backward and I worry we are
> > >> tying
> > >>>>> ourselves heavily to old CouchDB instead of using the fact that we
> > >>> moving
> > >>>>> to FDB which then allows us to design new api's and functionality.
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>

Re: [DISCUSS] Reduce on FDB take 3

Reply via email to