Re: [DISCUSS] Reduce on FDB take 3

Garren Smith Mon, 27 Jul 2020 00:26:22 -0700

Hi Bob,

That makes sense.


Cheers
Garren

On Sun, Jul 26, 2020 at 2:57 PM Robert Newson <rnew...@apache.org> wrote:

>
> Both approaches will collate keys the same way, it’s a showstopping bug if
> they don’t match existing couch collation order.
>
> I’d probably not even document that right now. We can document what aegis
> does and doesn’t cover when we make the first 4.0 RC.
> --
>   Robert Samuel Newson
>   rnew...@apache.org
>
> On Sun, 26 Jul 2020, at 13:27, Jan Lehnardt wrote:
> > Heya Bob,
> >
> > this sounds like a good path forward that allows us to validate the
> > remaining pieces of this.
> >
> > Just two side notes:
> >
> > - do we expect that the map-keys-in-fdb approach would sort differently
> > from map-keys-in-ebtree? (I assume no)
> > - I further assume that we’ll find a nicer UX for “make sure my views
> > are encrypted” than “make sure your view has a reduce fun”, no details
> > or drafts necessary at this point.
> >
> > Best
> > Jan
> > —
> >
> > > On 26. Jul 2020, at 14:13, Robert Samuel Newson <rnew...@apache.org>
> wrote:
> > >
> > > I'd like to find consensus on a path forward and, with that in mind, I
> have a suggestion.
> > >
> > > Before that, I think it's useful to lay out the points of contention,
> which I've ranked in I hope an uncontroversial order;
> > >
> > > The principal concern with using ebtree for map as well as reduce is
> that it precludes the use of foundationdb's ordering, on which much
> performance and scalability likely hinges. I agree with this on principle
> (that is, its demonstrably true) though none of us yet know the extent to
> which it matters in practice, or whether any gap could be closed with
> optimization.
> > >
> > > The secondary concern is the proposed use of ebtree for reduce-only
> where it receives only the reduce values from a document. I understand
> garren's stated desire to avoid duplication with the map side but I don't
> think it works out. Users of the reduce function are using it to get the
> reductions _across_ documents. It is rare (that is, I don't recall seeing
> an example by a cloudant customer) to emit multiple rows in a map function
> for the same key. In the cases where it doesn't, the value inserted into
> ebtree is the same as the map value anyway. Where it does happen, we're
> still duplicating the key and _a_ document level value. It seems a very
> modest saving of space for a more complicated design (two separate levels
> of reduce calculated in two separate modules).
> > >
> > > The tertiary concern is that the map-only indexes place user data
> outside of aegis (native database encryption) as, necessarily, the order of
> keys is important and a straightforward encryption there would not preserve
> it.
> > >
> > > Looking at all the options available I don't think there is a single
> path that addresses everything. There are also significant unknowns around
> practical performance for all of this.
> > >
> > > So, I suggest that the map-only code (specifically, the code we will
> use for an index definition that does not specify a reduce function) uses
> the existing couch_views approach, where the fdb keys are suffixed with the
> emitted key from the map function, and the fdb values are the emitted
> values. This vertical index should support the fastest ?key= and
> startkey/endkey= queries that fdb is capable of. There might be
> simplifications we can make to this code if we accept that it only needs to
> handle map-only indexes.
> > >
> > > For index definitions that _do_ specify a reduce function, I suggest
> this is done entirely with ebtree.We would call ebtree:insert(Db, Tree,
> Key, Value) where Key and Value are the emitted key and value from the map
> function. ebtree:open is handed a reduce_fun which handles the reduce and
> rereduce for the users reduction and, like couchdb 1/2/3, we also mix in a
> system reduction that calculates the view size (paul's diff above appears
> to do just this portion and looks broadly right).
> > >
> > > This avoids the duplication that garren's idea of storing the reduces
> in ebtree was trying to avoid, but does so in all cases. This approach
> allows us to compare the performance of map queries against map-only and
> map-reduce indexes, it allows users to opt in or out of encrypting their
> emitted view keys, and it doesn't prevent anyone from choosing to do both
> (You can define a view with map function A and another view with map
> function A and reduce function B, the former will be the couch_views
> vertical index, the latter an ebtree "horizontal" index).
> > >
> > > As we learn more over time we could enhance one type of index or the
> other and reach a point where we eliminate one, or the two could coexist
> indefinitely.
> > >
> > >> On 24 Jul 2020, at 20:00, Paul Davis <paul.joseph.da...@gmail.com>
> wrote:
> > >>
> > >> FWIW, a first pass at views entirely on ebtree turned out to be fairly
> > >> straightforward. Almost surprisingly simple in some cases.
> > >>
> > >>
> https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-ebtree-views
> > >>
> > >> Its currently passing all tests in `couch_views_map_test.erl` but is
> > >> failing on other suites. I only did a quick skim on the failures but
> > >> they all look superficial around some APIs I changed.
> > >>
> > >> I haven't added the APIs to query reduce functions via HTTP but the
> > >> reduce functions are being executed to calculate row counts and KV
> > >> sizes. Adding the builtin reduce functions and extending those to user
> > >> defined reduce functions should be straightforward.
> > >>
> > >> On Fri, Jul 24, 2020 at 9:39 AM Robert Newson <rnew...@apache.org>
> wrote:
> > >>>
> > >>>
> > >>> I’m happy to restrict my PR comments to the actual diff, yes. So I’m
> not +1 yet.
> > >>>
> > >>> I fixed the spurious conflicts at
> https://github.com/apache/couchdb/pull/3033.
> > >>>
> > >>> --
> > >>> Robert Samuel Newson
> > >>> rnew...@apache.org
> > >>>
> > >>> On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote:
> > >>>> Ok so just to confirm, we keep my PR as-is with ebtree only for
> reduce. We
> > >>>> can get that ready to merge into fdb master. We can then use that
> to battle
> > >>>> test ebtree and then look at using it for the map side as well. At
> that
> > >>>> point we would combine the reduce and map index into a ebtree
> index. Are
> > >>>> you happy with that?
> > >>>>
> > >>>> Cheers
> > >>>> Garren
> > >>>>
> > >>>> On Fri, Jul 24, 2020 at 3:48 PM Robert Newson <rnew...@apache.org>
> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> It’s not as unknown as you think but certainly we need empirical
> data to
> > >>>>> guide us on the reduce side. I’m also fine with continuing with the
> > >>>>> map-only code as it stands today until such time as we demonstrate
> ebtree
> > >>>>> meets or exceeds our needs (and I freely accept the possibility
> that it
> > >>>>> might not).
> > >>>>>
> > >>>>> I think the principal enhancement to ebtree that would address most
> > >>>>> concerns is if it could store the leaf entries vertically (as they
> are
> > >>>>> currently). I have some thoughts which I’ll try to realise as
> working code.
> > >>>>>
> > >>>>> I’ve confirmed that I do create spurious conflicts and will have a
> PR up
> > >>>>> today to fix that.
> > >>>>>
> > >>>>> --
> > >>>>> Robert Samuel Newson
> > >>>>> rnew...@apache.org
> > >>>>>
> > >>>>> On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote:
> > >>>>>> Hi Bob,
> > >>>>>>
> > >>>>>> Thanks for that explanation, that is really helpful and it is
> good we
> > >>>>> have
> > >>>>>> some options but it also does highlight a lot of unknowns.
> Whereas our
> > >>>>>> current map indexes are really simple and we know its behaviour.
> There
> > >>>>> are
> > >>>>>> no real unknowns. Adding ebtree here could make map indexes a
> fair bit
> > >>>>> more
> > >>>>>> complicated since we don't know the effect of managing different
> node
> > >>>>>> sizes, concurrent doc updates, and querying performance.
> > >>>>>>
> > >>>>>> Could we do a compromise here, could we look at using ebtree only
> for
> > >>>>>> reduces now? That means that we can have reduce indexes working
> quite
> > >>>>> soon
> > >>>>>> on CouchDB on FDB. At the same time, we can work on ebtree and run
> > >>>>>> performance tests on ebtree for map indexes. Some interesting
> tests we
> > >>>>> can
> > >>>>>> do is see if a user emits KV's near the limits (8KB for keys and
> 50KB for
> > >>>>>> values) how does ebtree handle that? How it handles being updated
> in the
> > >>>>>> doc update transaction. And general query performance. What do
> you think?
> > >>>>>>
> > >>>>>> Cheers
> > >>>>>> Garren
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson <
> > >>>>> rnew...@apache.org>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> A short preface at this stage is needed I think:
> > >>>>>>>
> > >>>>>>> My goal with ebtree was to implement a complete and correct
> b+tree that
> > >>>>>>> also calculated and maintained inner reductions. I consciously
> chose
> > >>>>> not to
> > >>>>>>> go further than that before presenting it to the group for wider
> debate
> > >>>>>>> (indeed, the very debate we're having in this thread).
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>>
> > >>>>>>> Ebtree is not single writer, at least not inherently. Two
> updates to
> > >>>>> the
> > >>>>>>> same ebtree should both succeed as long as they don't modify the
> same
> > >>>>> nodes.
> > >>>>>>>
> > >>>>>>> Modifying the same node is likely where there's a reduce
> function,
> > >>>>> though
> > >>>>>>> only if the reduction value actually changes, as that percolates
> up the
> > >>>>>>> tree. Where the reduce value does not change and when neither
> > >>>>> transaction
> > >>>>>>> causes a split, rebalance, or merge that affects the other, they
> should
> > >>>>>>> both commit without conflict. There is a rich history of
> optimizations
> > >>>>> in
> > >>>>>>> this space specifically around btrees. ebtree might cause
> spurious
> > >>>>>>> conflicts today, I will investigate and propose fixes if so
> (e.g, I
> > >>>>> think I
> > >>>>>>> probably call erlfdb:set on nodes that have not changed).
> > >>>>>>>
> > >>>>>>> I certainly envisaged updating ebtree within the same txn as a
> doc
> > >>>>> update,
> > >>>>>>> which is why the first argument to all the public functions can
> be
> > >>>>> either
> > >>>>>>> an erlfdb Db or open Tx.
> > >>>>>>>
> > >>>>>>> Parallelising the initial build is more difficult with ebtree
> than
> > >>>>>>> parallelising the existing couch_views map-only code, though
> it's worth
> > >>>>>>> noting that ebtree:insert benefits a great deal from batching
> (multiple
> > >>>>>>> calls to :insert in the same transaction). For an offline build
> (i.e,
> > >>>>> an
> > >>>>>>> index that the client cannot see until the entire build is
> complete)
> > >>>>> the
> > >>>>>>> batch size can be maximised. That is still a serial process in
> that
> > >>>>> there
> > >>>>>>> is only one transaction at a time updating the ebtree. I can't
> say
> > >>>>> offhand
> > >>>>>>> how fast that is in practice, but it is clearly less powerful
> than a
> > >>>>> fully
> > >>>>>>> parallelisable approach could be.
> > >>>>>>>
> > >>>>>>> Any parallel build would require a way to divide the database
> into
> > >>>>>>> non-overlapping subsets of emitted keys. This is easy and
> natural if
> > >>>>> the
> > >>>>>>> fdb key is the emitted key, which is the case for the couch_views
> > >>>>> map-only
> > >>>>>>> code. For ebtree it might be enough to simply grab a large chunk
> of
> > >>>>>>> documents, perform the map transform, and then issues multiple
> > >>>>> transactions
> > >>>>>>> on subsets of those.
> > >>>>>>>
> > >>>>>>> Another common technique for btrees is bulk loading (more or less
> > >>>>>>> literally constructing the btree nodes directly from the source,
> as
> > >>>>> long as
> > >>>>>>> you can sort it), which might be an option as well.
> > >>>>>>>
> > >>>>>>> Parallelising a build _with_ a reduce function seems hard
> however we do
> > >>>>>>> it. The non-ebtree approach is parallelisable by virtue of
> paring down
> > >>>>> the
> > >>>>>>> reduce functionality itself (only whole key groups, only those
> > >>>>> functions
> > >>>>>>> that fdb has atomic operations for).
> > >>>>>>>
> > >>>>>>> I will first of all verify the multi-writer nature of ebtree as
> it
> > >>>>> stands
> > >>>>>>> today and make a PR which fixes any spurious conflicts, and then
> ponder
> > >>>>>>> further  how true parallel builds might be possible.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On 24 Jul 2020, at 07:30, Garren Smith <gar...@apache.org>
> wrote:
> > >>>>>>>>
> > >>>>>>>> We haven't spoken much about updates with ebtree.
> > >>>>>>>> From my understanding of ebtree it can only do single writer,
> is that
> > >>>>>>>> correct?. If that is true it means we would not be able to
> > >>>>> parallelize
> > >>>>>>> the
> > >>>>>>>> background building of views to speed up view builds.
> > >>>>>>>> The other thing that would mean is that we cannot use it for
> mango
> > >>>>> where
> > >>>>>>> we
> > >>>>>>>> update the view in the doc transaction.
> > >>>>>>>>
> > >>>>>>>> Cheers
> > >>>>>>>> Garren
> > >>>>>>>>
> > >>>>>>>> On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <
> kjsnav...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> When in doubt, throw us a build at Cloudant with ebtree maps
> and
> > >>>>> we'll
> > >>>>>>> see
> > >>>>>>>>> if it comes close to the crazy fast KV map queries.
> > >>>>>>>>>
> > >>>>>>>>> Kyle
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <
> > >>>>> rnew...@apache.org>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I (perhaps obviously) don't agree that I'm tying myself to old
> > >>>>> CouchDB
> > >>>>>>> or
> > >>>>>>>>>> failing to embrace FDB. FDB is a toolkit and does not, to my
> mind,
> > >>>>>>> force
> > >>>>>>>>> us
> > >>>>>>>>>> down any particular path.
> > >>>>>>>>>>
> > >>>>>>>>>> I haven't sat down to modify couch_views in the manner I've
> > >>>>> suggested
> > >>>>>>>>>> (where ebtree is used as designed; being fed the output of the
> > >>>>> emit()
> > >>>>>>>>> calls
> > >>>>>>>>>> and calculating reductions as it does so) but I think it's a
> > >>>>> worthwhile
> > >>>>>>>>>> exercise. I'd be surprised if performance of map-only
> traversals
> > >>>>> would
> > >>>>>>> be
> > >>>>>>>>>> disappointing but who knows? I also expect it would allow for
> > >>>>>>> significant
> > >>>>>>>>>> simplification of the code, which is one of the higher
> virtues.
> > >>>>>>>>>>
> > >>>>>>>>>> Adam, can you describe in a little more detail how you picture
> > >>>>> "b+tree
> > >>>>>>> is
> > >>>>>>>>>> only used for incremental aggregations,"? It's implied in your
> > >>>>> reply
> > >>>>>>> that
> > >>>>>>>>>> it would preserve the "interesting property" of keeping user
> data
> > >>>>> out
> > >>>>>>> of
> > >>>>>>>>>> FDB Keys (for casual readers: the new native database
> encryption,
> > >>>>>>> called
> > >>>>>>>>>> "aegis", only encrypts the FDB value. It can't encrypt the
> key as
> > >>>>> this
> > >>>>>>>>>> would change the order of keys, which the current code
> depends on).
> > >>>>>>> Did I
> > >>>>>>>>>> misread you?
> > >>>>>>>>>>
> > >>>>>>>>>> B.
> > >>>>>>>>>>
> > >>>>>>>>>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <
> kocol...@apache.org>
> > >>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> OK thanks for the clarification. As I said I wasn’t all that
> > >>>>> confident
> > >>>>>>>>> I
> > >>>>>>>>>> understood the design :)
> > >>>>>>>>>>>
> > >>>>>>>>>>> I like the idea that the b+tree is only used for incremental
> > >>>>>>>>>> aggregations, rather than storing the entire materialized
> view,
> > >>>>> for the
> > >>>>>>>>>> same reasons that Garren stated.
> > >>>>>>>>>>>
> > >>>>>>>>>>> An index maintained entirely in ebtree has the interesting
> > >>>>> property
> > >>>>>>>>> that
> > >>>>>>>>>> it does not leak any user data into FDB Keys, which could be
> > >>>>> attractive
> > >>>>>>>>> for
> > >>>>>>>>>> security reasons.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Adam
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <
> gar...@apache.org>
> > >>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
> > >>>>>>>>> paul.joseph.da...@gmail.com
> > >>>>>>>>>>>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>> I would like to keep ebtree to use just for the reduce
> index.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Could you expand on your reasoning here, Garren? I haven't
> done
> > >>>>> any
> > >>>>>>>>>>>>> experiments on my own to understand if I'm missing
> something
> > >>>>>>>>>>>>> important. My initial reaction is to not diverge too far
> from
> > >>>>> the
> > >>>>>>>>>>>>> previous shape of the implementation since we have a decent
> > >>>>> idea of
> > >>>>>>>>>>>>> how that behaves already but perhaps you've seen or
> measured
> > >>>>>>>>> something
> > >>>>>>>>>>>>> I'm not thinking of?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think this must have been a misunderstanding on my part. I
> > >>>>> always
> > >>>>>>>>>> thought
> > >>>>>>>>>>>> of using ebtree to solve reduce and wasn't planning to use
> it
> > >>>>> for the
> > >>>>>>>>>> map
> > >>>>>>>>>>>> index.
> > >>>>>>>>>>>> I don't like the idea that we have ordered distributed
> key/value
> > >>>>>>> store
> > >>>>>>>>>> and
> > >>>>>>>>>>>> then implementing a b-tree on top of that for indexing.
> > >>>>> Especially
> > >>>>>>>>>> since we
> > >>>>>>>>>>>> know that the current map index is fast,
> > >>>>>>>>>>>> easy to use, and follows recommended practices in the FDB
> > >>>>> community
> > >>>>>>> on
> > >>>>>>>>>> how
> > >>>>>>>>>>>> to do indexing. ebtree makes sense for reduce and is a
> means to
> > >>>>> an
> > >>>>>>> end
> > >>>>>>>>>> to
> > >>>>>>>>>>>> give us CouchDB's reduce api, which is heavily reliant on a
> > >>>>> b-tree,
> > >>>>>>>>> with
> > >>>>>>>>>>>> CouchDB on FDB. This feels like a step backward and I worry
> we
> > >>>>> are
> > >>>>>>>>> tying
> > >>>>>>>>>>>> ourselves heavily to old CouchDB instead of using the fact
> that
> > >>>>> we
> > >>>>>>>>>> moving
> > >>>>>>>>>>>> to FDB which then allows us to design new api's and
> > >>>>> functionality.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >
> >
> >
>

Re: [DISCUSS] Reduce on FDB take 3

Reply via email to