jcoglan opened a new pull request #3912: URL: https://github.com/apache/couchdb/pull/3912

## Overview

This PR represents a prototype of a feature @janl and I have been working on. It lets us build indexes on dynamically computed values inside Mango, which would normally require writing a JS view. It does this by extending the syntax for indexes so that, as well as the `asc` and `desc` sort directions, we allow an _expression_ to define a _virtual field_, for example:

```json
"fields": [
  { "foo_words": { "$jq": ".foo | split(\" \") | .[]" } }
]
```

This definition means that the virtual field `foo_words` is generated by splitting the `foo` property on spaces and emitting each result. So if we have a document like:

```json
{ "foo": "a b c" }
```

then this index lets us find that doc using the `_find` query `"foo_words": "b"`.

This is a prototype we're presenting to see if the functionality is of interest, before we commit any more work to making it more production-ready.

Our reasoning for using `jq` for this is:

- It's a ready-built expression language, so we don't need to build a lot of the same functionality ourselves
- It addresses design issues we faced trying to come up with our own function definition syntax, e.g.:
  - How do we indicate that a function input is taken from a doc property vs a literal value
  - How do we indicate that we want to use an array result as an index key vs using each member of the array as a key
  - How do we support composition of different functions to produce a result
- jq has nice answers to these questions already
- CouchDB users are likely to be familiar with jq, so it's one less thing to learn, and they can experiment with it in their shell while designing their indexes
- It's very concise: compare `.foo | split(" ") | .[]` to our `{ "$explode": { "$field": "foo", "$separator": " " } }`, which doesn't address the array vs elements problem

That said, there is risk with adopting a native dependency and we fully understand if that's not a path others think we should go down.
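
To make the virtual-field semantics above concrete, here is a minimal Python sketch that emulates the two jq expressions by hand and builds a toy in-memory index from them. It does not call jq or CouchDB. The helper names (`element_keys`, `array_key`, `build_index`, `find`) are hypothetical, and skipping documents without a string `foo` is an assumption rather than a description of what the prototype does.

```python
import json
from collections import defaultdict


def element_keys(doc):
    """Emulate `.foo | split(" ") | .[]`: one index key per word."""
    value = doc.get("foo")
    if not isinstance(value, str):
        # Assumption: docs without a string `foo` contribute no keys.
        return []
    return value.split(" ")


def array_key(doc):
    """Emulate `.foo | split(" ")` without `.[]`: the whole array is one key."""
    value = doc.get("foo")
    if not isinstance(value, str):
        return []
    return [value.split(" ")]


def build_index(docs, key_fn):
    """Map each emitted key (JSON-encoded so arrays are hashable) to doc ids."""
    index = defaultdict(set)
    for doc in docs:
        for key in key_fn(doc):
            index[json.dumps(key)].add(doc["_id"])
    return index


def find(index, value):
    """Return the ids of docs whose virtual field emitted exactly `value`."""
    return index.get(json.dumps(value), set())


docs = [
    {"_id": "doc1", "foo": "a b c"},
    {"_id": "doc2", "foo": "x y"},
]

by_word = build_index(docs, element_keys)
by_array = build_index(docs, array_key)

print(find(by_word, "b"))               # matches doc1
print(find(by_array, "b"))              # no match: the key is the whole array
print(find(by_array, ["a", "b", "c"]))  # matches doc1
```

The trailing `.[]` is what decides between "each member is a key" and "the array is the key", which is the distinction our own `$explode` syntax had no way to express.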

We're opening this to gauge interest in the idea of indexing on dynamic functions inside Mango, rather than whether we use jq specifically.

## Testing recommendations

The Python test script included in the PR indicates how to use the functionality. You may need to augment the rebar script to add build flags for your environment; this was developed on macOS with `jq` installed via Homebrew.

If we developed this further for production, we would want to add comprehensive unit tests for the `couch_jq` module to make sure it round-trips all JSON values correctly (I have verified this by hand but not written automated tests as such). If we decide to stick with jq then we should also fuzz-test the native code, and we should decide whether to vendor the `jq` codebase or compile against the system copy.

There are also some warts in the implementation such as the addition of the virtual field into results based on the selector, which we'd need to come up with a cleaner solution for.

## Checklist

- [ ] Code is written and works correctly
- [ ] Changes are covered by tests
- [ ] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
- [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation