On 13/10/2020 11:48, Robert Samuel Newson wrote:
Hi All,
As part of CouchDB 4.0, which moves the storage tier of CouchDB into
FoundationDB, we have struggled to reproduce the full map/reduce functionality.
Happily this has now happened, and that work is now merged to the couchdb main
branch.
\o/
This functionality includes the use of custom (javascript) reduce functions. It
is my experience that these are very often problematic, in that much more often
than not the functions do not significantly reduce the input parameters into a
smaller result (indeed, sometimes the output is the same or larger than the
input).
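
To illustrate the difference, here is a minimal sketch of a reduce that does not reduce, next to one that does. The function names and sample data are hypothetical; in a design document each would be an anonymous function with the same (keys, values, rereduce) signature.

```javascript
// Anti-pattern: the "reduction" just collects every input value, so the
// output grows with the input instead of shrinking.
function collectAll(keys, values, rereduce) {
  if (rereduce) {
    // On rereduce, values holds the arrays returned by earlier calls.
    return values.reduce(function (acc, v) { return acc.concat(v); }, []);
  }
  return values;
}

// A reduce that actually reduces: one number out, however many go in.
// Summing works the same way on mapped values and on earlier sums.
function total(keys, values, rereduce) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Hypothetical mapped rows: keys are [key, docid] pairs, values are numbers.
var keys = [["a", "doc1"], ["a", "doc2"], ["a", "doc3"], ["a", "doc4"]];
var values = [3, 5, 8, 13];

console.log(collectAll(keys, values, false).length); // 4, same size as the input
console.log(total(keys, values, false));             // 29
```

The first function stores an ever-growing array at every level of the index; the second stores a single number.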
Agreed, it is very rare that I find a well-written custom reduce
function. It happens, though, and the people who write them are also
advanced or expert CouchDB users. They would know how to toggle the default.
To that end, I'm asking if we should deprecate the feature entirely.
and, from the reply to Jonathan:
I also think if custom reduce was disabled by default that we would be
motivated to expand this set of built-in reduce functions.
If deprecation means eventual removal, we need to take additional steps.
What would help inform this decision would be a survey of the community
for custom reduce functions. If this can then inform writing more
built-in _reduces that we ship in various 4.x releases, and remove the
feature in 5.0, that could work.
There needs to be a concerted effort to reach out to users and
understand these use cases, followed by a similar effort to write
replacements and have the community vet them. To date, I can remember us
adding only two built-in enhancements: the HyperLogLog stuff, and the
ability to do _sum / _count / _stats on lists and objects (which was a
Cloudant donation about 6 years ago, IIRC).
Here are some examples of custom reduces I've seen recently that could not
be satisfied by our current built-ins:
* wallet/balance calculation, based on transactional data
* _stats like functionality, but derived from complex documents that
have lists of objects that must be iterated over
* advanced statistical calculation: ANOVA, t-test, linear regression,
bayesian, etc.
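
As a rough sketch of the first case only: the document shape, field names, and function name below are assumptions for illustration, not taken from any real user's code. It assumes the map function emits { amount, at } per transaction, with amount as a signed integer (for example cents) and at as a timestamp in milliseconds.

```javascript
// Hypothetical wallet reduce: running balance plus the time of the most
// recent transaction. Mapped values look like { amount: <int>, at: <millis> }.
function walletReduce(keys, values, rereduce) {
  var out = { balance: 0, lastAt: 0 };
  values.forEach(function (v) {
    // First pass: v is a mapped value. Rereduce: v is an earlier output.
    out.balance += rereduce ? v.balance : v.amount;
    out.lastAt = Math.max(out.lastAt, rereduce ? v.lastAt : v.at);
  });
  return out;
}

// Two batches reduced separately, then combined by a rereduce call.
var first = walletReduce(null, [{ amount: 1000, at: 1 }, { amount: -250, at: 5 }], false);
var second = walletReduce(null, [{ amount: -100, at: 3 }], false);
var combined = walletReduce(null, [first, second], true);

console.log(combined); // { balance: 650, lastAt: 5 }
```

The output stays a fixed-size object no matter how many transactions go in, which is the property a built-in replacement would have to preserve.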
None of these are unsolvable, but they will require effort. I'm ready
to help talk to users if this is the direction we want to go, but I want
to see a firm commitment by other developers to help implement new
built-in reduces brought to the table before +1'ing this decision.
Companies like IBM/Cloudant and Neighbourhoodie have special access
here, and would be key players in helping get this work done.
Let's contrast this with a famous deprecation that didn't go as well:
list/show/rewrites removal. Most of us agree that this functionality is
much better served by parallel servers that have a plethora of
functionality available to them, plus a wide base of support outside of
our own ecosystem. Critically, these functions are purely
transformative: none store new data into the database. I don't think a
similar approach makes sense for custom reduce, since those results
*are* pre-calculated and stored.
One more contrast. Two years ago, I wrote up a spec to introduce VDU and
update handler functionality into Mango[1]. Here's a situation where
there was broad user acceptance, and general agreement on the direction
to move forward. We could arguably deprecate our current approach for
these once this functionality has been built. The problem has been finding
someone willing to develop it -- I don't have the time.
Looking forward to others' thoughts.
-Joan "developers, developers, developers" Touzet
[1]: https://github.com/apache/couchdb/issues/1554
In scope for this thread is the middle ground proposal that Paul Davis has
written up here:
https://github.com/apache/couchdb/pull/3214
Where custom reduces are not allowed by default but can be enabled.
The core _ability_ to do custom reduces will always be maintained; this is
intrinsic to the design of ebtree, the structure we use on top of FoundationDB
to hold and maintain intermediate reduce values.
My view is that we should merge #3214 and disable custom reduces by default.
B.