This seems like an exciting development. Could it build on the recent pluggable storage engine work?
Speaking as a user, losing efficient reduce results queries would be a
*very* big deal for me. But I'm naively guessing a workaround might be to
create a tree just for reductions, updated in the same transaction as the
map emits? This could be opt-in (or opt-out), only if you actually needed
it. E.g. perhaps use tuples => values analogous to:

("db", "_design/foo", "_view/bar", "_btree_reduce") => {"val": ..., childkeys: ["a", "f", "k", "p", "u", "z"]}
("db", "_design/foo", "_view/bar", "_btree_reduce", 0) => {"val": ..., childkeys: ["a", "b", "c", "d", "e"]}
("db", "_design/foo", "_view/bar", "_btree_reduce", 0, 0) => {"val": ..., childkeys: ["a", "aa", "ab", "ac", "ad", "ae"]}
("db", "_design/foo", "_view/bar", "_btree_reduce", 0, 1) => {"val": ..., childkeys: ["b", "ba", "bb", "bc", "bd", "be"]}
etc...

Since 90% of my reduce queries use group=exact or group_level (some views
are *always* queried with group=exact), it might also make sense to
(opt-in) store reductions per view key at every group_level (or only for
group=exact). Generation could piggyback on the btree reductions (i.e. when
there are tens of thousands of emits per key). For a little extra disk
space, this would allow *never* running reduce fns during
update={false,lazy} queries (only during updates).

("db", "_design/foo", "_view/bar", "_group_reduce", "a") => {"val": ...}
("db", "_design/foo", "_view/bar", "_group_reduce", "a", 0) => {"val": ...}
("db", "_design/foo", "_view/bar", "_group_reduce", "a", 0, "foo") => {"val": ...}
etc...

It seems to me, a similar approach could maybe be used to (finally!?)
create an opt-in view changes feed? I would love a view changes feed that
enabled a quick and easy "chained" map reduce pipeline, or "replicating"
group-reduced results into a new DB (like Cloudant's deprecated dbcopy
feature), or something like RethinkDB's streaming results.

("db", "_design/foo", "_view/bar", "_seq", 1) => {"add": ["key1", "key2", "key3"]}
("db", "_design/foo", "_view/bar", "_seq", 2) => {"add": ["key4"], "del": ["key2"]}
etc...
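Going back to the "_group_reduce" part, here's a very rough sketch using
the FoundationDB Python bindings, just to make the keyspace idea concrete.
Everything in it (the "_map"/"_group_reduce" subspace names, the built-in
_sum as the reduce, naively re-reducing the whole key range on every emit)
is made up purely for illustration, not a proposal for the actual layer
code:

import fdb

fdb.api_version(600)
db = fdb.open()

view = fdb.Subspace(("db", "_design/foo", "_view/bar"))

@fdb.transactional
def emit(tr, key_path, value):
    # Write the map row, then refresh the stored reduction for every
    # group_level prefix of the key within the same transaction.
    tr[view["_map"].pack(key_path)] = fdb.tuple.pack((value,))
    for level in range(len(key_path) + 1):
        prefix = key_path[:level]
        rows = tr[view["_map"].range(prefix)]
        total = sum(fdb.tuple.unpack(v)[0] for _, v in rows)  # naive _sum
        tr[view["_group_reduce"].pack(prefix)] = fdb.tuple.pack((total,))

@fdb.transactional
def group_reduce(tr, key_path):
    # group=exact / group_level read: one point read, no reduce fn executed.
    row = tr[view["_group_reduce"].pack(key_path)]
    return fdb.tuple.unpack(row)[0] if row.present() else None

emit(db, ("a", 0, "foo"), 1)
emit(db, ("a", 0, "bar"), 2)
print(group_reduce(db, ("a", 0)))  # -> 3, without running any reduce fn

A real implementation would obviously re-reduce incrementally (and mind
FDB's transaction and value size limits) rather than re-reading the whole
range per emit, but hopefully it shows the shape I have in mind.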
--
Nick Evans

On Sat, Jan 26, 2019 at 1:25 PM Robert Newson <rnew...@apache.org> wrote:
>
> Hi,
>
> It’s only arbitrary start/end key of the reduce values we have no
> solution for yet. For map-only, we can and would supply arbitrary
> start/end keys.
>
> More explicitly, it’s only _efficient_ reduce results that are lost,
> because we can’t store the intermediate reduce values on inner btree
> nodes. We could calculate the reduce value dynamically by reading the
> entire range of selected keys and calculating it each time.
>
> Finally, this is a known gap in our investigation. It doesn’t mean there
> isn’t an answer to be discovered.
>
> B
>
> > On 26 Jan 2019, at 17:20, Reddy B. <redd...@live.fr> wrote:
> >
> > Hello,
> >
> > Just to add the modest perspective of a user. I appreciate the benefits
> > of taking advantage of the infrastructure provided by FDB, both from a
> > quality perspective and from a maintenance and ease-of-expansion
> > perspective.
> >
> > However, this development makes me really worried about being burned as
> > a user, so to speak. Losing arbitrary reduce functions would be a big
> > concern. But losing arbitrary startkey and endkey would be an even
> > bigger concern.
> >
> > This is not about the inconvenience of updating our codebase, this is
> > about losing the ability to do quite significant things, losing
> > expressiveness so to speak. We make extensive use of startkeys/endkeys
> > for things ranging from geoqueries using a simple application-based
> > geohash implementation, all the way to matching the documents belonging
> > to a tenant using complex keys. So if this is indeed the feature we'd
> > be losing, this is quite a big deal in my opinion. I think all our data
> > access layers would need to be rewritten, but I do not even know how.
> >
> > I do not know if we are representative, but we intentionally stay away
> > from Mango to leverage the performance benefits of using precomputed
> > views, which we see as a key feature of CouchDB. Mango is quite
> > non-deterministic when it comes to performance (defining the right
> > indexes is cumbersome compared to using views, and it's difficult to
> > know if a query will be completed by doing in-memory filtering). And
> > people keep reporting a number of troubling bugs. So moving people to
> > Mango is not only about updating applications, it also means losing
> > quite substantial features.
> >
> > All in all, my point is that with the changes I hear about, I feel like
> > a lot of the technical assumptions we made when we settled on CouchDB
> > would no longer hold. There are patterns we wouldn't be able to use,
> > and I don't even think it would still be possible to develop real-world
> > applications relying solely on the view/reduce pipeline if we do not
> > have the level of expressiveness provided by custom reduce and
> > arbitrary startkeys/endkeys. Without these two constructs, we will be
> > burned in certain situations.
> >
> > Just wanted to voice this concern to highlight that there are folks
> > like us for whom the API of the view/reduce pipeline is central, so
> > that hopefully this can be taken into account as the merits of this
> > proposal are being reviewed.
> >
> > ________________________________
> > From: Dave Cottlehuber <d...@skunkwerks.at>
> > Sent: Saturday, 26 January 2019 15:31:24
> > To: dev@couchdb.apache.org
> > Subject: Re: [DISCUSS] Rebase CouchDB on top of FoundationDB
> >
> >> On Fri, 25 Jan 2019, at 09:58, Robert Samuel Newson wrote:
> >>
> >
> > Thanks for sharing this Bob, and also thanks to everybody who shared
> > their thoughts too.
> >
> > I'm super excited, partly because we get to keep all our Couchy
> > goodness, and also because FDB brings some really interesting
> > operational capabilities to the table that you'd normally spend a
> > decade trying to build from scratch. The level of testing that has gone
> > into FDB is astounding[1].
> >
> > Things like seamless data migration, expanding storage, and rebalancing
> > shards and nodes are Hard Problems today, as anybody who's dealt with
> > large or long-lived couchdb clusters knows.
> >
> > There's clearly a lot of work to be done -- it's early days -- and it
> > changes a lot of non-visible things like packaging, dependencies,
> > cross-platform support, and a markedly different operations model --
> > but I'm most excited about the opportunities here at the storage layer
> > for us.
> >
> > Handling larger k/v items than what fdb can handle is covered in the
> > forums already[2], and is similar to how we'd query multiple docs from
> > a couchdb view today using an array-based complex/compound key:
> >
> > [0, ..] would give you all the docs in that view under key 0
> >
> > except that in FDB that query would happen for a single couchdb doc,
> > using a range query to achieve that.
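> >
> > Purely as a strawman (the key layout and chunk size here are made up,
> > and this glosses over fdb's transaction size limits), in Python with
> > the fdb bindings that might look something like:
> >
> > import fdb
> >
> > fdb.api_version(600)
> > db = fdb.open()
> >
> > docs = fdb.Subspace(("db", "docs"))
> > CHUNK = 100000  # stay under fdb's ~100KB value size limit
> >
> > @fdb.transactional
> > def write_doc(tr, docid, body):
> >     # one couchdb doc body becomes several k/v pairs:
> >     # ("db", "docs", docid, 0), ("db", "docs", docid, 1), ...
> >     tr.clear_range_startswith(docs.pack((docid,)))
> >     for i in range(0, len(body), CHUNK):
> >         tr[docs.pack((docid, i // CHUNK))] = body[i:i + CHUNK]
> >
> > @fdb.transactional
> > def read_doc(tr, docid):
> >     # ...and reading it back is a single range query over that prefix
> >     return b"".join(v for _, v in tr[docs.range((docid,))])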
> > Similar to multiple docs, there are some traps around managing that in
> > an atomic fashion at the higher layer.
> >
> > I'm sure there are many more things like this we'll need to wrap our
> > heads around!
> >
> > Especial thanks to the dual-hat-wearing IBM folk who have engaged with
> > the community so early in the process -- basically at the napkin
> > stage[3].
> >
> > [1]: https://www.youtube.com/watch?v=4fFDFbi3toc
> > [2]: https://forums.foundationdb.org/t/intent-roadmap-to-handle-larger-value-sizes/126
> > [3]: https://www.computerhistory.org/atchm/the-two-napkin-protocol/ --
> > the famous napkin where BGP, the modern internet's backbone routing
> > protocol, was described.
> >
> > A+
> > Dave
>