Re: [DISCUSS] Reduce on FDB take 3

Robert Samuel Newson Fri, 24 Jul 2020 01:07:25 -0700

Hi,

A short preface at this stage is needed I think:

My goal with ebtree was to implement a complete and correct b+tree that also 
calculated and maintained inner reductions. I consciously chose not to go 
further than that before presenting it to the group for wider debate (indeed, 
the very debate we're having in this thread).

--

Ebtree is not single writer, at least not inherently. Two updates to the same 
ebtree should both succeed as long as they don't modify the same nodes.

Modifying the same node is likely where there's a reduce function, though only 
if the reduction value actually changes, as that percolates up the tree. Where 
the reduce value does not change and when neither transaction causes a split, 
rebalance, or merge that affects the other, they should both commit without 
conflict. There is a rich history of optimizations in this space specifically 
around btrees. ebtree might cause spurious conflicts today, I will investigate 
and propose fixes if so (e.g, I think I probably call erlfdb:set on nodes that 
have not changed).

I certainly envisaged updating ebtree within the same txn as a doc update, 
which is why the first argument to all the public functions can be either an 
erlfdb Db or open Tx.

Parallelising the initial build is more difficult with ebtree than 
parallelising the existing couch_views map-only code, though it's worth noting 
that ebtree:insert benefits a great deal from batching (multiple calls to 
:insert in the same transaction). For an offline build (i.e, an index that the 
client cannot see until the entire build is complete) the batch size can be 
maximised. That is still a serial process in that there is only one transaction 
at a time updating the ebtree. I can't say offhand how fast that is in 
practice, but it is clearly less powerful than a fully parallelisable approach 
could be.

Any parallel build would require a way to divide the database into 
non-overlapping subsets of emitted keys. This is easy and natural if the fdb 
key is the emitted key, which is the case for the couch_views map-only code. 
For ebtree it might be enough to simply grab a large chunk of documents, 
perform the map transform, and then issues multiple transactions on subsets of 
those.

Another common technique for btrees is bulk loading (more or less literally 
constructing the btree nodes directly from the source, as long as you can sort 
it), which might be an option as well.

Parallelising a build _with_ a reduce function seems hard however we do it. The 
non-ebtree approach is parallelisable by virtue of paring down the reduce 
functionality itself (only whole key groups, only those functions that fdb has 
atomic operations for).

I will first of all verify the multi-writer nature of ebtree as it stands today 
and make a PR which fixes any spurious conflicts, and then ponder further  how 
true parallel builds might be possible.

> On 24 Jul 2020, at 07:30, Garren Smith <gar...@apache.org> wrote:
> 
> We haven't spoken much about updates with ebtree.
> From my understanding of ebtree it can only do single writer, is that
> correct?. If that is true it means we would not be able to parallelize the
> background building of views to speed up view builds.
> The other thing that would mean is that we cannot use it for mango where we
> update the view in the doc transaction.
> 
> Cheers
> Garren
> 
> On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <kjsnav...@gmail.com> wrote:
> 
>> When in doubt, throw us a build at Cloudant with ebtree maps and we'll see
>> if it comes close to the crazy fast KV map queries.
>> 
>> Kyle
>> 
>> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <rnew...@apache.org>
>> wrote:
>> 
>>> I (perhaps obviously) don't agree that I'm tying myself to old CouchDB or
>>> failing to embrace FDB. FDB is a toolkit and does not, to my mind, force
>> us
>>> down any particular path.
>>> 
>>> I haven't sat down to modify couch_views in the manner I've suggested
>>> (where ebtree is used as designed; being fed the output of the emit()
>> calls
>>> and calculating reductions as it does so) but I think it's a worthwhile
>>> exercise. I'd be surprised if performance of map-only traversals would be
>>> disappointing but who knows? I also expect it would allow for significant
>>> simplification of the code, which is one of the higher virtues.
>>> 
>>> Adam, can you describe in a little more detail how you picture "b+tree is
>>> only used for incremental aggregations,"? It's implied in your reply that
>>> it would preserve the "interesting property" of keeping user data out of
>>> FDB Keys (for casual readers: the new native database encryption, called
>>> "aegis", only encrypts the FDB value. It can't encrypt the key as this
>>> would change the order of keys, which the current code depends on). Did I
>>> misread you?
>>> 
>>> B.
>>> 
>>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <kocol...@apache.org> wrote:
>>>> 
>>>> OK thanks for the clarification. As I said I wasn’t all that confident
>> I
>>> understood the design :)
>>>> 
>>>> I like the idea that the b+tree is only used for incremental
>>> aggregations, rather than storing the entire materialized view, for the
>>> same reasons that Garren stated.
>>>> 
>>>> An index maintained entirely in ebtree has the interesting property
>> that
>>> it does not leak any user data into FDB Keys, which could be attractive
>> for
>>> security reasons.
>>>> 
>>>> Adam
>>>> 
>>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <gar...@apache.org> wrote:
>>>>> 
>>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
>> paul.joseph.da...@gmail.com
>>>> 
>>>>> wrote:
>>>>> 
>>>>>>> I would like to keep ebtree to use just for the reduce index.
>>>>>> 
>>>>>> Could you expand on your reasoning here, Garren? I haven't done any
>>>>>> experiments on my own to understand if I'm missing something
>>>>>> important. My initial reaction is to not diverge too far from the
>>>>>> previous shape of the implementation since we have a decent idea of
>>>>>> how that behaves already but perhaps you've seen or measured
>> something
>>>>>> I'm not thinking of?
>>>>>> 
>>>>> 
>>>>> I think this must have been a misunderstanding on my part. I always
>>> thought
>>>>> of using ebtree to solve reduce and wasn't planning to use it for the
>>> map
>>>>> index.
>>>>> I don't like the idea that we have ordered distributed key/value store
>>> and
>>>>> then implementing a b-tree on top of that for indexing. Especially
>>> since we
>>>>> know that the current map index is fast,
>>>>> easy to use, and follows recommended practices in the FDB community on
>>> how
>>>>> to do indexing. ebtree makes sense for reduce and is a means to an end
>>> to
>>>>> give us CouchDB's reduce api, which is heavily reliant on a b-tree,
>> with
>>>>> CouchDB on FDB. This feels like a step backward and I worry we are
>> tying
>>>>> ourselves heavily to old CouchDB instead of using the fact that we
>>> moving
>>>>> to FDB which then allows us to design new api's and functionality.
>>>> 
>>> 
>>> 
>>

Re: [DISCUSS] Reduce on FDB take 3

Reply via email to