Re: [DISCUSS] Reduce on FDB take 3

Paul Davis Mon, 27 Jul 2020 07:35:58 -0700

In terms of collation there will be differences in some edge cases.
Numbers in particular are all converted to floats which means that
extreme values that don't fit into a float (though Erlang floats are
doubles) will have a truncation that reduces resolution. Other than
that I'm pretty sure it'd be the same. There might be some issues
around the ICU library used for sort keys although both
implementations would suffer from issues there, but I think it'd be
slightly different issues.


Paul

On Mon, Jul 27, 2020 at 2:28 AM Garren Smith <[email protected]> wrote:
>
> Hi Bob,
>
> That makes sense.
>
> Cheers
> Garren
>
> On Sun, Jul 26, 2020 at 2:57 PM Robert Newson <[email protected]> wrote:
>
> >
> > Both approaches will collate keys the same way, it’s a showstopping bug if
> > they don’t match existing couch collation order.
> >
> > I’d probably not even document that right now. We can document what aegis
> > does and doesn’t cover when we make the first 4.0 RC.
> > --
> >   Robert Samuel Newson
> >   [email protected]
> >
> > On Sun, 26 Jul 2020, at 13:27, Jan Lehnardt wrote:
> > > Heya Bob,
> > >
> > > this sounds like a good path forward that allows us to validate the
> > > remaining pieces of this.
> > >
> > > Just two side notes:
> > >
> > > - do we expect that the map-keys-in-fdb approach would sort differently
> > > from map-keys-in-ebtree? (I assume no)
> > > - I further assume that we’ll find a nicer UX for “make sure my views
> > > are encrypted” than “make sure your view has a reduce fun”, no details
> > > or drafts necessary at this point.
> > >
> > > Best
> > > Jan
> > > —
> > >
> > > > On 26. Jul 2020, at 14:13, Robert Samuel Newson <[email protected]>
> > wrote:
> > > >
> > > > I'd like to find consensus on a path forward and, with that in mind, I
> > have a suggestion.
> > > >
> > > > Before that, I think it's useful to lay out the points of contention,
> > which I've ranked in I hope an uncontroversial order;
> > > >
> > > > The principal concern with using ebtree for map as well as reduce is
> > that it precludes the use of foundationdb's ordering, on which much
> > performance and scalability likely hinges. I agree with this on principle
> > (that is, its demonstrably true) though none of us yet know the extent to
> > which it matters in practice, or whether any gap could be closed with
> > optimization.
> > > >
> > > > The secondary concern is the proposed use of ebtree for reduce-only
> > where it receives only the reduce values from a document. I understand
> > garren's stated desire to avoid duplication with the map side but I don't
> > think it works out. Users of the reduce function are using it to get the
> > reductions _across_ documents. It is rare (that is, I don't recall seeing
> > an example by a cloudant customer) to emit multiple rows in a map function
> > for the same key. In the cases where it doesn't, the value inserted into
> > ebtree is the same as the map value anyway. Where it does happen, we're
> > still duplicating the key and _a_ document level value. It seems a very
> > modest saving of space for a more complicated design (two separate levels
> > of reduce calculated in two separate modules).
> > > >
> > > > The tertiary concern is that the map-only indexes place user data
> > outside of aegis (native database encryption) as, necessarily, the order of
> > keys is important and a straightforward encryption there would not preserve
> > it.
> > > >
> > > > Looking at all the options available I don't think there is a single
> > path that addresses everything. There are also significant unknowns around
> > practical performance for all of this.
> > > >
> > > > So, I suggest that the map-only code (specifically, the code we will
> > use for an index definition that does not specify a reduce function) uses
> > the existing couch_views approach, where the fdb keys are suffixed with the
> > emitted key from the map function, and the fdb values are the emitted
> > values. This vertical index should support the fastest ?key= and
> > startkey/endkey= queries that fdb is capable of. There might be
> > simplifications we can make to this code if we accept that it only needs to
> > handle map-only indexes.
> > > >
> > > > For index definitions that _do_ specify a reduce function, I suggest
> > this is done entirely with ebtree.We would call ebtree:insert(Db, Tree,
> > Key, Value) where Key and Value are the emitted key and value from the map
> > function. ebtree:open is handed a reduce_fun which handles the reduce and
> > rereduce for the users reduction and, like couchdb 1/2/3, we also mix in a
> > system reduction that calculates the view size (paul's diff above appears
> > to do just this portion and looks broadly right).
> > > >
> > > > This avoids the duplication that garren's idea of storing the reduces
> > in ebtree was trying to avoid, but does so in all cases. This approach
> > allows us to compare the performance of map queries against map-only and
> > map-reduce indexes, it allows users to opt in or out of encrypting their
> > emitted view keys, and it doesn't prevent anyone from choosing to do both
> > (You can define a view with map function A and another view with map
> > function A and reduce function B, the former will be the couch_views
> > vertical index, the latter an ebtree "horizontal" index).
> > > >
> > > > As we learn more over time we could enhance one type of index or the
> > other and reach a point where we eliminate one, or the two could coexist
> > indefinitely.
> > > >
> > > >> On 24 Jul 2020, at 20:00, Paul Davis <[email protected]>
> > wrote:
> > > >>
> > > >> FWIW, a first pass at views entirely on ebtree turned out to be fairly
> > > >> straightforward. Almost surprisingly simple in some cases.
> > > >>
> > > >>
> > https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-ebtree-views
> > > >>
> > > >> Its currently passing all tests in `couch_views_map_test.erl` but is
> > > >> failing on other suites. I only did a quick skim on the failures but
> > > >> they all look superficial around some APIs I changed.
> > > >>
> > > >> I haven't added the APIs to query reduce functions via HTTP but the
> > > >> reduce functions are being executed to calculate row counts and KV
> > > >> sizes. Adding the builtin reduce functions and extending those to user
> > > >> defined reduce functions should be straightforward.
> > > >>
> > > >> On Fri, Jul 24, 2020 at 9:39 AM Robert Newson <[email protected]>
> > wrote:
> > > >>>
> > > >>>
> > > >>> I’m happy to restrict my PR comments to the actual diff, yes. So I’m
> > not +1 yet.
> > > >>>
> > > >>> I fixed the spurious conflicts at
> > https://github.com/apache/couchdb/pull/3033.
> > > >>>
> > > >>> --
> > > >>> Robert Samuel Newson
> > > >>> [email protected]
> > > >>>
> > > >>> On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote:
> > > >>>> Ok so just to confirm, we keep my PR as-is with ebtree only for
> > reduce. We
> > > >>>> can get that ready to merge into fdb master. We can then use that
> > to battle
> > > >>>> test ebtree and then look at using it for the map side as well. At
> > that
> > > >>>> point we would combine the reduce and map index into a ebtree
> > index. Are
> > > >>>> you happy with that?
> > > >>>>
> > > >>>> Cheers
> > > >>>> Garren
> > > >>>>
> > > >>>> On Fri, Jul 24, 2020 at 3:48 PM Robert Newson <[email protected]>
> > wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> It’s not as unknown as you think but certainly we need empirical
> > data to
> > > >>>>> guide us on the reduce side. I’m also fine with continuing with the
> > > >>>>> map-only code as it stands today until such time as we demonstrate
> > ebtree
> > > >>>>> meets or exceeds our needs (and I freely accept the possibility
> > that it
> > > >>>>> might not).
> > > >>>>>
> > > >>>>> I think the principal enhancement to ebtree that would address most
> > > >>>>> concerns is if it could store the leaf entries vertically (as they
> > are
> > > >>>>> currently). I have some thoughts which I’ll try to realise as
> > working code.
> > > >>>>>
> > > >>>>> I’ve confirmed that I do create spurious conflicts and will have a
> > PR up
> > > >>>>> today to fix that.
> > > >>>>>
> > > >>>>> --
> > > >>>>> Robert Samuel Newson
> > > >>>>> [email protected]
> > > >>>>>
> > > >>>>> On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote:
> > > >>>>>> Hi Bob,
> > > >>>>>>
> > > >>>>>> Thanks for that explanation, that is really helpful and it is
> > good we
> > > >>>>> have
> > > >>>>>> some options but it also does highlight a lot of unknowns.
> > Whereas our
> > > >>>>>> current map indexes are really simple and we know its behaviour.
> > There
> > > >>>>> are
> > > >>>>>> no real unknowns. Adding ebtree here could make map indexes a
> > fair bit
> > > >>>>> more
> > > >>>>>> complicated since we don't know the effect of managing different
> > node
> > > >>>>>> sizes, concurrent doc updates, and querying performance.
> > > >>>>>>
> > > >>>>>> Could we do a compromise here, could we look at using ebtree only
> > for
> > > >>>>>> reduces now? That means that we can have reduce indexes working
> > quite
> > > >>>>> soon
> > > >>>>>> on CouchDB on FDB. At the same time, we can work on ebtree and run
> > > >>>>>> performance tests on ebtree for map indexes. Some interesting
> > tests we
> > > >>>>> can
> > > >>>>>> do is see if a user emits KV's near the limits (8KB for keys and
> > 50KB for
> > > >>>>>> values) how does ebtree handle that? How it handles being updated
> > in the
> > > >>>>>> doc update transaction. And general query performance. What do
> > you think?
> > > >>>>>>
> > > >>>>>> Cheers
> > > >>>>>> Garren
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson <
> > > >>>>> [email protected]>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> A short preface at this stage is needed I think:
> > > >>>>>>>
> > > >>>>>>> My goal with ebtree was to implement a complete and correct
> > b+tree that
> > > >>>>>>> also calculated and maintained inner reductions. I consciously
> > chose
> > > >>>>> not to
> > > >>>>>>> go further than that before presenting it to the group for wider
> > debate
> > > >>>>>>> (indeed, the very debate we're having in this thread).
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>>
> > > >>>>>>> Ebtree is not single writer, at least not inherently. Two
> > updates to
> > > >>>>> the
> > > >>>>>>> same ebtree should both succeed as long as they don't modify the
> > same
> > > >>>>> nodes.
> > > >>>>>>>
> > > >>>>>>> Modifying the same node is likely where there's a reduce
> > function,
> > > >>>>> though
> > > >>>>>>> only if the reduction value actually changes, as that percolates
> > up the
> > > >>>>>>> tree. Where the reduce value does not change and when neither
> > > >>>>> transaction
> > > >>>>>>> causes a split, rebalance, or merge that affects the other, they
> > should
> > > >>>>>>> both commit without conflict. There is a rich history of
> > optimizations
> > > >>>>> in
> > > >>>>>>> this space specifically around btrees. ebtree might cause
> > spurious
> > > >>>>>>> conflicts today, I will investigate and propose fixes if so
> > (e.g, I
> > > >>>>> think I
> > > >>>>>>> probably call erlfdb:set on nodes that have not changed).
> > > >>>>>>>
> > > >>>>>>> I certainly envisaged updating ebtree within the same txn as a
> > doc
> > > >>>>> update,
> > > >>>>>>> which is why the first argument to all the public functions can
> > be
> > > >>>>> either
> > > >>>>>>> an erlfdb Db or open Tx.
> > > >>>>>>>
> > > >>>>>>> Parallelising the initial build is more difficult with ebtree
> > than
> > > >>>>>>> parallelising the existing couch_views map-only code, though
> > it's worth
> > > >>>>>>> noting that ebtree:insert benefits a great deal from batching
> > (multiple
> > > >>>>>>> calls to :insert in the same transaction). For an offline build
> > (i.e,
> > > >>>>> an
> > > >>>>>>> index that the client cannot see until the entire build is
> > complete)
> > > >>>>> the
> > > >>>>>>> batch size can be maximised. That is still a serial process in
> > that
> > > >>>>> there
> > > >>>>>>> is only one transaction at a time updating the ebtree. I can't
> > say
> > > >>>>> offhand
> > > >>>>>>> how fast that is in practice, but it is clearly less powerful
> > than a
> > > >>>>> fully
> > > >>>>>>> parallelisable approach could be.
> > > >>>>>>>
> > > >>>>>>> Any parallel build would require a way to divide the database
> > into
> > > >>>>>>> non-overlapping subsets of emitted keys. This is easy and
> > natural if
> > > >>>>> the
> > > >>>>>>> fdb key is the emitted key, which is the case for the couch_views
> > > >>>>> map-only
> > > >>>>>>> code. For ebtree it might be enough to simply grab a large chunk
> > of
> > > >>>>>>> documents, perform the map transform, and then issues multiple
> > > >>>>> transactions
> > > >>>>>>> on subsets of those.
> > > >>>>>>>
> > > >>>>>>> Another common technique for btrees is bulk loading (more or less
> > > >>>>>>> literally constructing the btree nodes directly from the source,
> > as
> > > >>>>> long as
> > > >>>>>>> you can sort it), which might be an option as well.
> > > >>>>>>>
> > > >>>>>>> Parallelising a build _with_ a reduce function seems hard
> > however we do
> > > >>>>>>> it. The non-ebtree approach is parallelisable by virtue of
> > paring down
> > > >>>>> the
> > > >>>>>>> reduce functionality itself (only whole key groups, only those
> > > >>>>> functions
> > > >>>>>>> that fdb has atomic operations for).
> > > >>>>>>>
> > > >>>>>>> I will first of all verify the multi-writer nature of ebtree as
> > it
> > > >>>>> stands
> > > >>>>>>> today and make a PR which fixes any spurious conflicts, and then
> > ponder
> > > >>>>>>> further  how true parallel builds might be possible.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> On 24 Jul 2020, at 07:30, Garren Smith <[email protected]>
> > wrote:
> > > >>>>>>>>
> > > >>>>>>>> We haven't spoken much about updates with ebtree.
> > > >>>>>>>> From my understanding of ebtree it can only do single writer,
> > is that
> > > >>>>>>>> correct?. If that is true it means we would not be able to
> > > >>>>> parallelize
> > > >>>>>>> the
> > > >>>>>>>> background building of views to speed up view builds.
> > > >>>>>>>> The other thing that would mean is that we cannot use it for
> > mango
> > > >>>>> where
> > > >>>>>>> we
> > > >>>>>>>> update the view in the doc transaction.
> > > >>>>>>>>
> > > >>>>>>>> Cheers
> > > >>>>>>>> Garren
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <
> > [email protected]>
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> When in doubt, throw us a build at Cloudant with ebtree maps
> > and
> > > >>>>> we'll
> > > >>>>>>> see
> > > >>>>>>>>> if it comes close to the crazy fast KV map queries.
> > > >>>>>>>>>
> > > >>>>>>>>> Kyle
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <
> > > >>>>> [email protected]>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> I (perhaps obviously) don't agree that I'm tying myself to old
> > > >>>>> CouchDB
> > > >>>>>>> or
> > > >>>>>>>>>> failing to embrace FDB. FDB is a toolkit and does not, to my
> > mind,
> > > >>>>>>> force
> > > >>>>>>>>> us
> > > >>>>>>>>>> down any particular path.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I haven't sat down to modify couch_views in the manner I've
> > > >>>>> suggested
> > > >>>>>>>>>> (where ebtree is used as designed; being fed the output of the
> > > >>>>> emit()
> > > >>>>>>>>> calls
> > > >>>>>>>>>> and calculating reductions as it does so) but I think it's a
> > > >>>>> worthwhile
> > > >>>>>>>>>> exercise. I'd be surprised if performance of map-only
> > traversals
> > > >>>>> would
> > > >>>>>>> be
> > > >>>>>>>>>> disappointing but who knows? I also expect it would allow for
> > > >>>>>>> significant
> > > >>>>>>>>>> simplification of the code, which is one of the higher
> > virtues.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Adam, can you describe in a little more detail how you picture
> > > >>>>> "b+tree
> > > >>>>>>> is
> > > >>>>>>>>>> only used for incremental aggregations,"? It's implied in your
> > > >>>>> reply
> > > >>>>>>> that
> > > >>>>>>>>>> it would preserve the "interesting property" of keeping user
> > data
> > > >>>>> out
> > > >>>>>>> of
> > > >>>>>>>>>> FDB Keys (for casual readers: the new native database
> > encryption,
> > > >>>>>>> called
> > > >>>>>>>>>> "aegis", only encrypts the FDB value. It can't encrypt the
> > key as
> > > >>>>> this
> > > >>>>>>>>>> would change the order of keys, which the current code
> > depends on).
> > > >>>>>>> Did I
> > > >>>>>>>>>> misread you?
> > > >>>>>>>>>>
> > > >>>>>>>>>> B.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <
> > [email protected]>
> > > >>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> OK thanks for the clarification. As I said I wasn’t all that
> > > >>>>> confident
> > > >>>>>>>>> I
> > > >>>>>>>>>> understood the design :)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I like the idea that the b+tree is only used for incremental
> > > >>>>>>>>>> aggregations, rather than storing the entire materialized
> > view,
> > > >>>>> for the
> > > >>>>>>>>>> same reasons that Garren stated.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> An index maintained entirely in ebtree has the interesting
> > > >>>>> property
> > > >>>>>>>>> that
> > > >>>>>>>>>> it does not leak any user data into FDB Keys, which could be
> > > >>>>> attractive
> > > >>>>>>>>> for
> > > >>>>>>>>>> security reasons.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Adam
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <
> > [email protected]>
> > > >>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
> > > >>>>>>>>> [email protected]
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I would like to keep ebtree to use just for the reduce
> > index.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Could you expand on your reasoning here, Garren? I haven't
> > done
> > > >>>>> any
> > > >>>>>>>>>>>>> experiments on my own to understand if I'm missing
> > something
> > > >>>>>>>>>>>>> important. My initial reaction is to not diverge too far
> > from
> > > >>>>> the
> > > >>>>>>>>>>>>> previous shape of the implementation since we have a decent
> > > >>>>> idea of
> > > >>>>>>>>>>>>> how that behaves already but perhaps you've seen or
> > measured
> > > >>>>>>>>> something
> > > >>>>>>>>>>>>> I'm not thinking of?
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I think this must have been a misunderstanding on my part. I
> > > >>>>> always
> > > >>>>>>>>>> thought
> > > >>>>>>>>>>>> of using ebtree to solve reduce and wasn't planning to use
> > it
> > > >>>>> for the
> > > >>>>>>>>>> map
> > > >>>>>>>>>>>> index.
> > > >>>>>>>>>>>> I don't like the idea that we have ordered distributed
> > key/value
> > > >>>>>>> store
> > > >>>>>>>>>> and
> > > >>>>>>>>>>>> then implementing a b-tree on top of that for indexing.
> > > >>>>> Especially
> > > >>>>>>>>>> since we
> > > >>>>>>>>>>>> know that the current map index is fast,
> > > >>>>>>>>>>>> easy to use, and follows recommended practices in the FDB
> > > >>>>> community
> > > >>>>>>> on
> > > >>>>>>>>>> how
> > > >>>>>>>>>>>> to do indexing. ebtree makes sense for reduce and is a
> > means to
> > > >>>>> an
> > > >>>>>>> end
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> give us CouchDB's reduce api, which is heavily reliant on a
> > > >>>>> b-tree,
> > > >>>>>>>>> with
> > > >>>>>>>>>>>> CouchDB on FDB. This feels like a step backward and I worry
> > we
> > > >>>>> are
> > > >>>>>>>>> tying
> > > >>>>>>>>>>>> ourselves heavily to old CouchDB instead of using the fact
> > that
> > > >>>>> we
> > > >>>>>>>>>> moving
> > > >>>>>>>>>>>> to FDB which then allows us to design new api's and
> > > >>>>> functionality.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >
> > >
> > >
> >

Re: [DISCUSS] Reduce on FDB take 3

Reply via email to