This seems like an exciting development. Could it build on the recent pluggable storage engine work?
Speaking as a user, losing efficient reduce results queries would be a
*very* big deal for me. But I'm naively guessing a workaround might be to
create a tree just for reductions, updated in the same transaction as the
map emits? This could be opt-in (or opt-out), only if you actually needed
it. E.g. perhaps use tuples => values analogous to:

("db", "_design/foo", "_view/bar", "_btree_reduce") => {"val": ..., childkeys: ["a", "f", "k", "p", "u", "z"]}
("db", "_design/foo", "_view/bar", "_btree_reduce", 0) => {"val": ..., childkeys: ["a", "b", "c", "d", "e"]}
("db", "_design/foo", "_view/bar", "_btree_reduce", 0, 0) => {"val": ..., childkeys: ["a", "aa", "ab", "ac", "ad", "ae"]}
("db", "_design/foo", "_view/bar", "_btree_reduce", 0, 1) => {"val": ..., childkeys: ["b", "ba", "bb", "bc", "bd", "be"]}
etc...

Since 90% of my reduce queries use group=exact or group_level (some views
are *always* queried with group=exact), it might also make sense to
(opt-in) store reductions per view key at every group_level (or only for
group=exact). Generation could piggyback on the btree reductions (i.e. when
there are tens of thousands of emits per key). For a little extra disk
space, this would allow *never* running reduce fns during
update={false,lazy} queries (only during updates).

("db", "_design/foo", "_view/bar", "_group_reduce", "a") => {"val": ...}
("db", "_design/foo", "_view/bar", "_group_reduce", "a", 0) => {"val": ...}
("db", "_design/foo", "_view/bar", "_group_reduce", "a", 0, "foo") => {"val": ...}
etc...

It seems to me, a similar approach could maybe be used to (finally!?)
create an opt-in view changes feed? I would love a view changes feed that
enabled a quick and easy "chained" map reduce pipeline, or "replicating"
group-reduced results into a new DB (like Cloudant's deprecated dbcopy
feature), or something like RethinkDB's streaming results.

("db", "_design/foo", "_view/bar", "_seq", 1) => {"add": ["key1", "key2", "key3"]}
("db", "_design/foo", "_view/bar", "_seq", 2) => {"add": ["key4"], "del": ["key2"]}
etc...
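Going back to the "_group_reduce" part, here's a very rough sketch using
the FoundationDB Python bindings, just to make the keyspace idea concrete.
Everything in it (the "_map"/"_group_reduce" subspace names, the built-in
_sum as the reduce, naively re-reducing the whole key range on every emit)
is made up purely for illustration, not a proposal for the actual layer
code:

import fdb

fdb.api_version(600)
db = fdb.open()

view = fdb.Subspace(("db", "_design/foo", "_view/bar"))

@fdb.transactional
def emit(tr, key_path, value):
    # Write the map row, then refresh the stored reduction for every
    # group_level prefix of the key within the same transaction.
    tr[view["_map"].pack(key_path)] = fdb.tuple.pack((value,))
    for level in range(len(key_path) + 1):
        prefix = key_path[:level]
        rows = tr[view["_map"].range(prefix)]
        total = sum(fdb.tuple.unpack(v)[0] for _, v in rows)  # naive _sum
        tr[view["_group_reduce"].pack(prefix)] = fdb.tuple.pack((total,))

@fdb.transactional
def group_reduce(tr, key_path):
    # group=exact / group_level read: one point read, no reduce fn executed.
    row = tr[view["_group_reduce"].pack(key_path)]
    return fdb.tuple.unpack(row)[0] if row.present() else None

emit(db, ("a", 0, "foo"), 1)
emit(db, ("a", 0, "bar"), 2)
print(group_reduce(db, ("a", 0)))  # -> 3, without running any reduce fn

A real implementation would obviously re-reduce incrementally (and mind
FDB's transaction and value size limits) rather than re-reading the whole
range per emit, but hopefully it shows the shape I have in mind.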
--
Nick Evans

On Sat, Jan 26, 2019 at 1:25 PM Robert Newson <rnew...@apache.org> wrote:
>
> Hi,
>
> It’s only arbitrary start/end key of the reduce values we have no
> solution for yet. For map-only, we can and would supply arbitrary
> start/end keys.
>
> More explicitly, it’s only _efficient_ reduce results that are lost,
> because we can’t store the intermediate reduce values on inner btree
> nodes. We could calculate the reduce value dynamically by reading the
> entire range of selected keys and calculating it each time.
>
> Finally, this is a known gap in our investigation. It doesn’t mean there
> isn’t an answer to be discovered.
>
> B
>
> > On 26 Jan 2019, at 17:20, Reddy B. <redd...@live.fr> wrote:
> >
> > Hello,
> >
> > Just to add the modest perspective of a user. I appreciate the benefits
> > of taking advantage of the infrastructure provided by FDB, both from a
> > quality perspective and from a maintenance and ease-of-expansion
> > perspective.
> >
> > However, this development makes me really worried about being burned as
> > a user, so to speak. Losing arbitrary reduce functions would be a big
> > concern. But losing arbitrary startkey and endkey would be an even
> > bigger concern.
> >
> > This is not about the inconvenience of updating our codebase, this is
> > about losing the ability to do quite significant things, losing
> > expressiveness so to speak. We make extensive use of startkeys/endkeys
> > for things ranging from geoqueries using a simple application-based
> > geohash implementation, all the way to matching the documents belonging
> > to a tenant using complex keys. So if this is indeed the feature we'd
> > be losing, this is quite a big deal in my opinion. I think all our data
> > access layers would need to be rewritten, but I do not even know how.
> >
> > I do not know if we are representative, but we intentionally stay away
> > from Mango to leverage the performance benefits of using precomputed
> > views, which we see as a key feature of CouchDB. Mango is quite
> > non-deterministic when it comes to performance (defining the right
> > indexes is cumbersome compared to using views, and it's difficult to
> > know if a query will be completed by doing in-memory filtering). And
> > people keep reporting a number of troubling bugs. So moving people to
> > Mango is not only about updating applications, it also means losing
> > quite substantial features.
> >
> > All in all, my point is that with the changes I hear about, I feel like
> > a lot of the technical assumptions we made when we settled on CouchDB
> > would no longer hold. There are patterns we wouldn't be able to use,
> > and I don't even think it would still be possible to develop real-world
> > applications relying solely on the view/reduce pipeline if we do not
> > have the level of expressiveness provided by custom reduce and
> > arbitrary startkeys/endkeys. Without these two constructs, we will be
> > burned in certain situations.
> >
> > Just wanted to voice this concern to highlight that there are folks
> > like us for whom the API of the view/reduce pipeline is central, so
> > that hopefully this can be taken into account as the merits of this
> > proposal are being reviewed.
> >
> > ________________________________
> > From: Dave Cottlehuber <d...@skunkwerks.at>
> > Sent: Saturday, 26 January 2019 15:31:24
> > To: dev@couchdb.apache.org
> > Subject: Re: [DISCUSS] Rebase CouchDB on top of FoundationDB
> >
> >> On Fri, 25 Jan 2019, at 09:58, Robert Samuel Newson wrote:
> >>
> >
> > Thanks for sharing this Bob, and also thanks to everybody who shared
> > their thoughts too.
> >
> > I'm super excited, partly because we get to keep all our Couchy
> > goodness, and also because FDB brings some really interesting
> > operational capabilities to the table that you'd normally spend a
> > decade trying to build from scratch. The level of testing that has gone
> > into FDB is astounding[1].
> >
> > Things like seamless data migration, expanding storage, and rebalancing
> > shards and nodes are Hard Problems today, as anybody who's dealt with
> > large or long-lived couchdb clusters knows.
> >
> > There's clearly a lot of work to be done -- it's early days -- and it
> > changes a lot of non-visible things like packaging, dependencies,
> > cross-platform support, and a markedly different operations model --
> > but I'm most excited about the opportunities here at the storage layer
> > for us.
> >
> > Handling larger k/v items than what fdb can handle is covered in the
> > forums already[2], and is similar to how we'd query multiple docs from
> > a couchdb view today using an array-based complex/compound key:
> >
> > [0, ..] would give you all the docs in that view under key 0
> >
> > except that in FDB that query would happen for a single couchdb doc,
> > using a range query to achieve that.
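> >
> > Purely as a strawman (the key layout and chunk size here are made up,
> > and this glosses over fdb's transaction size limits), in Python with
> > the fdb bindings that might look something like:
> >
> > import fdb
> >
> > fdb.api_version(600)
> > db = fdb.open()
> >
> > docs = fdb.Subspace(("db", "docs"))
> > CHUNK = 100000  # stay under fdb's ~100KB value size limit
> >
> > @fdb.transactional
> > def write_doc(tr, docid, body):
> >     # one couchdb doc body becomes several k/v pairs:
> >     # ("db", "docs", docid, 0), ("db", "docs", docid, 1), ...
> >     tr.clear_range_startswith(docs.pack((docid,)))
> >     for i in range(0, len(body), CHUNK):
> >         tr[docs.pack((docid, i // CHUNK))] = body[i:i + CHUNK]
> >
> > @fdb.transactional
> > def read_doc(tr, docid):
> >     # ...and reading it back is a single range query over that prefix
> >     return b"".join(v for _, v in tr[docs.range((docid,))])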
> > Similar to multiple docs, there are some traps around managing that in
> > an atomic fashion at the higher layer.
> >
> > I'm sure there are many more things like this we'll need to wrap our
> > heads around!
> >
> > Especial thanks to the dual-hat-wearing IBM folk who have engaged with
> > the community so early in the process -- basically at the napkin
> > stage[3].
> >
> > [1]: https://www.youtube.com/watch?v=4fFDFbi3toc
> > [2]: https://forums.foundationdb.org/t/intent-roadmap-to-handle-larger-value-sizes/126
> > [3]: https://www.computerhistory.org/atchm/the-two-napkin-protocol/ --
> > the famous napkin where BGP, the modern internet's backbone routing
> > protocol, was described.
> >
> > A+
> > Dave
>