Hi all,

I’m writing to you about a proposed new feature for CouchDB: declarative validate doc updates (VDUs), a way to validate documents without writing JavaScript code that needs to be evaluated on each doc update. That evaluation cost means that, in practice, we recommend against JavaScript VDUs in all but the lowest-traffic situations.

We have had a general desire for this feature since at least 2018[1] (thanks Diana), and we have recently received (thank you Sovereign Tech Agency & James Coglan) an RFC that covers in depth what a complete solution would look like[2].

[1]: https://github.com/apache/couchdb/issues/1554
[2]: https://github.com/apache/couchdb/pull/5792

We are now looking for community feedback on the plan as outlined below, because we can only anticipate end users’ needs for this feature. Please respond here or, if that is simpler, on the pull request referenced above.

The RFC is concerned with two aspects:

1. Should we extend Mango to support VDU duties, or should we adopt JSON Schema, as it is a larger standard around JSON validation?
2. What would an implementation look like that could express the same validations as current JS VDUs, and should we limit the scope to ease introducing this feature?

The RFC analyses point 1 in depth and strongly recommends extending Mango rather than adopting JSON Schema, as JSON Schema is not quite suited to the way CouchDB needs a validation library to work. As a consequence, the RFC outlines a list of additions to Mango so it can be used as a full replacement for JS VDUs. Expressing declaratively what a general-purpose programming language can do naturally comes with a set of complexities, which makes the RFC a very long document to read and, in turn, makes the proposal hard to discuss.

After initial developer feedback, we came up with a phased approach to implementing this RFC. This allows us to make quick progress on the things we all agree on and to move open discussion points into specific phases so that they do not block progress on other phases. Of course, we also want to make sure we do not make decisions in earlier phases that block options in later ones.

Overall concerns for this whole endeavour are:

1. The resulting additional complexity of using Mango.
   Users should have an easy time picking up the new additions to Mango to express their validation needs. Ideally, any additions we make are also useful, or at least not confusing, in the indexing context.
2. The resulting additional complexity of Mango’s implementation. It should not turn Mango into a maintenance burden.
   - This includes performance concerns during indexing.
3. How far do we need to go in making Mango VDUs as expressive as JS VDUs? Is an 80/20 solution good enough, where folks who need more flexibility can always use JS or Erlang (if performance is a concern)?

As such, the plan as it stands for now is as follows:

- Phase 1: Allow Mango as it exists today to be used as a VDU. This already exists and is a surprisingly small patch[3]. Thanks, past us. There is broad dev consensus that this is a useful addition on its own and could be used as a test balloon in the next release to gather wider community feedback.

  In this phase, a Mango selector evaluates to a boolean that, in the indexing case, decides whether a doc should be indexed or not. Phase 1 Mango VDUs behave the same way: if a doc does not match the selector, its update is rejected with a `{forbidden: "Document is not valid"}` response.

  [3]: https://github.com/apache/couchdb/pull/5839

- Phase 2: The RFC recommends increasing the usefulness of the error response: instead of rejecting the document update with a generic response, it suggests returning a list of all validation failures, so a human seeing the results can fix them in one go rather than one by one over multiple document update round trips (which adds server load and increases the possibility of 409 update conflicts). There is currently no consensus on this feature, for the following reasons:

  1. An implementation of this necessarily requires making the evaluation logic of Mango more complicated.
     At a minimum, it requires evaluating all clauses of a selector, whereas currently the evaluation can stop as soon as a clause doesn’t come out as `true`, which is a performance optimisation during indexing.
  2. It requires tracking errors in a list to return to the caller later. If we wanted the indexer to keep the shortcut behaviour, we’d need an additional boolean option that switches between the two behaviours. While we believe we could make a neat version of this, it does not exist yet and it will be more complex than what we have now.
  3. If we do not keep the shortcut behaviour, a Mango indexer will have to allocate more terms only to throw them away later. This is not a tidy implementation and might even affect indexing performance, if only at large scale.

  This needs to be traded off against feature usefulness and code complexity.

  Another aspect of Phase 2 is a conditional construct (`$if`/`$then`/`$else` in the RFC). If folks don’t like the `$if` terminology, I’m happy to bikeshed this, for example to `$match/$true/$false`. There is wide consensus that this is a useful addition to Mango regardless of the VDU work (think non-uniform docs that get normalised during indexing). Please send your bikeshedding votes for the exact operator names you prefer :)

- Phase 3: authentication. We currently have consensus that nothing in the above will preclude us from adding authn to Mango VDUs, but we’re punting work on this for the moment to get the rest of the implementation solid. This is not a rejection of this part of the RFC, just a deferral. Coincidentally, the most complex addition to Mango (the `$data` operator, which lets you reference the values of arbitrary fields in your input set) was mainly added to the RFC to support authn, so it is very convenient that we can skip this for now.

- Phase 4: optional additions.
  This is a loose collection of additions to Mango that make all of the above more useful but are not in themselves required to provide the base functionality of Mango VDUs, even if that means that the VDU selectors are not as expressive as a corresponding JS function. We can do any of this at any time, so this isn’t really a phase; it is just a place to collect bits that aren’t required for any of the other phases. These optional additions are:

  1. String manipulation functions (e.g. `$concat`). These are not controversial, but we are not sure which operations would be required. This is a great place to submit feedback.
  2. Customisable error reporting (to mirror the current flexible `throw()` option in JS VDUs) so developers can set their own error messages. There is consensus that this is a very optional feature that we should not worry about for now unless users request it. It will be an easy addition, but could add a level of complexity that would slow down adoption. Please let us know what you think.
  3. A `$ref` operator that acts like an include mechanism, so a set of base selectors can be combined without duplicating the individual selector logic (akin to the CommonJS module system we have now). There is no consensus on this feature. Usefulness and language complexity need to be weighed against each other. This is another great place to leave feedback.

I’ll stop here with a call to action: if you’re a CouchDB user, please respond here or in [2] with what you think about this feature.

Thanks for reading, and looking forward to hearing from you :)

Best
Jan
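
For readers who prefer code, the difference between the Phase 1 and Phase 2 behaviour described above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not CouchDB’s implementation: the names `CHECKS`, `failures`, `validate_phase1` and `validate_phase2` are hypothetical, only implicit equality, `$gt` and `$exists` are modelled, and the shape of the Phase 2 error list is a guess, since that part of the proposal has no consensus yet.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Doc = Dict[str, Any]

# Hypothetical, heavily reduced operator table. Real Mango has many more
# operators; an unknown operator simply raises KeyError in this sketch.
CHECKS: Dict[str, Callable[[Doc, str, Any], bool]] = {
    "$eq": lambda doc, field, arg: field in doc and doc[field] == arg,
    "$gt": lambda doc, field, arg: (
        isinstance(doc.get(field), (int, float)) and doc[field] > arg
    ),
    "$exists": lambda doc, field, arg: (field in doc) == arg,
}


def failures(selector: Doc, doc: Doc) -> Iterator[str]:
    """Lazily yield one message per clause of the selector that fails."""
    for field, cond in selector.items():
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # {"type": "x"} is shorthand for $eq
        for op, arg in cond.items():
            if not CHECKS[op](doc, field, arg):
                yield f"{field}: expected {op} {arg!r}"


def validate_phase1(selector: Doc, doc: Doc) -> Optional[Doc]:
    """Phase 1 model: stop at the first failed clause, generic rejection."""
    if next(failures(selector, doc), None) is not None:
        return {"forbidden": "Document is not valid"}
    return None  # update accepted


def validate_phase2(selector: Doc, doc: Doc) -> Optional[Doc]:
    """Phase 2 model: evaluate every clause and report all failures.

    The response shape (a list under "forbidden") is an assumption.
    """
    errors = list(failures(selector, doc))
    return {"forbidden": errors} if errors else None


selector = {
    "type": "invoice",
    "total": {"$gt": 0},
    "customer": {"$exists": True},
}
good = {"type": "invoice", "total": 12.5, "customer": "acme"}
bad = {"type": "invoice", "total": -3}

print(validate_phase1(selector, good))
print(validate_phase1(selector, bad))
print(validate_phase2(selector, bad))
```

Using a lazy generator is one way to get both behaviours out of a single evaluator: the Phase 1 path stops at the first failed clause, while the Phase 2 path drains the generator and keeps every message. Whether something similar is practical in Mango’s actual code is exactly the open question listed under Phase 2.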
