jcoglan opened a new pull request #3912:
URL: https://github.com/apache/couchdb/pull/3912


   ## Overview
   
   This PR is a prototype of a feature @janl and I have been working 
on. It lets us build indexes on dynamically computed values inside Mango, which 
would otherwise require writing a JS view. It does this by extending the syntax 
for indexes so that, in addition to the `asc` and `desc` sort directions, we allow 
an _expression_ to define a _virtual field_, for example:
   
   ```json
   "fields": [
     { "foo_words": { "$jq": ".foo | split(\" \") | .[]" } }
   ]
   ```
   
   This definition means that the virtual field `foo_words` is generated by 
splitting the `foo` property on spaces and emitting each result. So if we have 
a document like:
   
   ```json
   {
     "foo": "a b c"
   }
   ```
   
   then this index lets us find that doc with a `_find` selector such as 
`{ "foo_words": "b" }`.
   
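As a rough illustration of the example above, the sketch below builds the two request bodies in Python and checks by hand that the sample doc would match. The `/{db}/_index` and `/{db}/_find` body shapes are the standard Mango ones; the `$jq` entry is the prototype syntax from this PR, and `foo_words_keys` is a hypothetical helper that only imitates the jq filter, it is not part of the patch.

```python
import json

JQ_FILTER = '.foo | split(" ") | .[]'

# Body for POST /{db}/_index. The "$jq" entry is the prototype syntax
# proposed in this PR, not something released CouchDB understands.
index_body = {
    "index": {"fields": [{"foo_words": {"$jq": JQ_FILTER}}]},
    "type": "json",
}

# Body for POST /{db}/_find, querying the virtual field.
find_body = {"selector": {"foo_words": "b"}}


def foo_words_keys(doc):
    """Hypothetical stand-in for the jq filter: split `foo` on single spaces."""
    return doc["foo"].split(" ")


doc = {"foo": "a b c"}
keys = foo_words_keys(doc)

# The doc should be found because "b" is one of the emitted values.
matches = find_body["selector"]["foo_words"] in keys

print(json.dumps(index_body))
print(json.dumps(find_body))
print(keys, matches)
```

Nothing here talks to a server; the bodies would be sent with whatever HTTP client you normally use against a build that includes this branch.
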
   This is a prototype we're presenting to see if the functionality is of 
interest, before we commit any more work to making it more production-ready. 
Our reasoning for using `jq` for this is:
   
   - It's a ready-built expression language, so we don't need to build a lot of 
the same functionality ourselves
   - It addresses design issues we faced trying to come up with our own 
function definition syntax, e.g.:
     - How do we indicate that a function input is taken from a doc property vs a 
literal value
     - How do we indicate that we want to use an array result as an index key 
vs using each member of the array as a key
     - How do we support composition of different functions to produce a result
     - jq has nice answers to these questions already
   - CouchDB users are likely to be familiar with jq so it's one less thing to 
learn, and they can experiment with it in their shell while designing their 
indexes
   - It's very concise: compare `.foo | split(" ") | .[]` to our `{ "$explode": 
{ "$field": "foo", "$separator": " " } }`, which doesn't address the array vs 
elements problem
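
To make the "array vs elements" point concrete, here is a small Python sketch of the two readings. It assumes that each output of the jq filter becomes one index key, which is how the description above ("emitting each result") reads; both function names are hypothetical and only imitate the corresponding jq filters.

```python
def split_only(doc):
    """Imitates `.foo | split(" ")`: one output, the whole array."""
    return [doc["foo"].split(" ")]


def split_and_iterate(doc):
    """Imitates `.foo | split(" ") | .[]`: one output per array member."""
    return list(doc["foo"].split(" "))


doc = {"foo": "a b c"}

array_keys = split_only(doc)          # a single key that is itself an array
member_keys = split_and_iterate(doc)  # three separate string keys

print(len(array_keys), len(member_keys))
print("b" in array_keys, "b" in member_keys)
```

With the first filter the doc would be indexed once under the array as a whole; with the trailing `.[]` it would be indexed once per word, which is what a query for a single word needs.
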
   
   That said, there is risk in adopting a native dependency and we fully 
understand if that's not a path others think we should go down. We're opening 
this to gauge interest in the idea of indexing on dynamic functions inside 
Mango, rather than in whether we use jq specifically.
   
   ## Testing recommendations
   
   The Python test script included in the PR indicates how to use the 
functionality. You may need to augment the rebar script to add build flags for 
your environment; this was developed on macOS with `jq` installed via Homebrew.
   
   If we developed this further for production, we would want to add 
comprehensive unit tests for the `couch_jq` module to make sure it round-trips 
all JSON values correctly (I have verified this by hand but not written 
automated tests as such).
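
The shape of such a round-trip check could look like the Python sketch below. It is only an outline of the property, not a test of the actual module: `roundtrip` is a hypothetical stand-in that uses Python's `json` library, where a real test would pass each value through `couch_jq` (for instance with jq's identity filter `.`) and compare the result.

```python
import json

# Representative JSON values, including edge cases that are easy to mangle.
SAMPLES = [
    None, True, False,
    0, -1, 1.5, 2**53,
    "", "a b c", "snowman \u2603",
    [], [1, [2, [3]]],
    {}, {"foo": {"bar": [None, False, ""]}},
]


def roundtrip(value):
    # Hypothetical stand-in for "encode, run through the native code, decode".
    return json.loads(json.dumps(value))


def same_json(a, b):
    """Strict comparison: unlike Python's ==, true is not 1 and 1 is not 1.0."""
    if type(a) is not type(b):
        return False
    if isinstance(a, list):
        return len(a) == len(b) and all(same_json(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(same_json(a[k], b[k]) for k in a)
    return a == b


failures = [v for v in SAMPLES if not same_json(v, roundtrip(v))]
print(failures)
```

The strict comparison matters because a lossy conversion (booleans turned into integers, integers turned into floats) would pass a plain equality check in many languages.
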
   
   If we decide to stick with jq then we should also fuzz-test the native code, 
and we should decide whether to vendor the `jq` codebase or compile against the 
system copy.
   
   There are also some warts in the implementation, such as the way the 
virtual field is added to results based on the selector; we'd need to come up 
with a cleaner solution for that.
   
   ## Checklist
   
   - [ ] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in 
`rel/overlay/etc/default.ini`
   - [ ] A PR for documentation changes has been made in 
https://github.com/apache/couchdb-documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

