jcoglan opened a new pull request, #5858:
URL: https://github.com/apache/couchdb/pull/5858

   ## Overview
   
   This PR represents work so far on a version of `mango_selector:match/2` than 
can return a representation of how the input fails to match the selector, 
rather than just a boolean. There are a few commits of developing this 
behaviour incrementally before everything "snaps into place" to get us code 
paths that can produce failure descriptions, _and_ retain the original boolean 
behaviour that avoids creating a lot of ephemeral `failure` lists, and can 
short-circuit on compound operators.
   
   The rough steps here are as follows; I'm retaining all these commits for now 
in case we want to compare different designs for code complexity and 
performance.
   
   - Add a lot of unit tests to `mango_selector` that check every match 
operator and its negation. This is necessary since various compound operators 
like `$or` and `$allMatch` have surprising edge case behaviour on empty lists 
and we need to make sure this is not broken. Mostly these tests check that if 
selector `S` returns `true` on an input then `{ "$not": S }` returns `false` 
and vice versa. There are some exceptions to this due to how `$or` works and 
how `$and` and `$or` are normalised.
   - Replace the existing implementation with one where every operator returns 
a possibly-empty list of failures, instead of a boolean. `match_int` then 
converts this to a bool on the way out.
   - Replace the `Cmp` argument to everything with a `ctx` record that contains 
`cmp`, as well as other things needed for failure generation, e.g. the path to 
the current value, whether matching is currently negated, etc.
   - Try to fix `$allMatch` and `$elemMatch` normalisation to avoid the 
complexity that comes from not being able to apply DeMorgan to them, due to 
`$allMatch` being defined to return false for empty lists. This is a breaking 
change that is reverted later since we decided to retain existing behaviour 
above all.
   - Implement negation handling, where the presence of a `$not` has to be 
communicated to nodes lower down the tree in order to produce good failure 
messages. Not all negation can be normalised out of the tree and so all 
operators need to handle being negated during evaluation.
   - Finally, collapse all the complexity into an implementation that supports 
_both_ the old and new behaviours.
   
   The idea in the design I've ended up with that tries to minimise both 
complexity and runtime cost is:
   
   - Add `#ctx.verbose` which indicates whether a detailed failure description 
is wanted.
   - Keep the original implementations of all operators as the response when 
passed `#ctx{verbose=false}`, i.e. when only a boolean result is needed.
   - For all leaf operators, the `#ctx{verbose=true}` case can be implemented 
by calling the `#ctx{verbose=false}` case, and creating a `#failure` record if 
this returns false.
   - For compound operators, special code is needed to gather up the failures 
from internal selectors and deal with edge cases in a way that's consistent 
with the original implementation.
   - `#ctx.path` is only updated in `#ctx{verbose=true}` code paths so this 
expense is avoided in non-verbose mode. Path items are added to the front of 
this list as that's cheaper than doing `Path ++ [Item]`; we would reverse these 
before returning to a client.
   - `#failure` records retain a `#ctx` from where they can access the path and 
negation state, in order to generate a good human-readable error message later 
on.
   - The tests are updated to make sure that both `verbose` modes return 
consistent results, i.e. if `verbose=false` returns `true`, then `verbose=true` 
returns `[]`, and if the former returns `false`, the latter gives a non-empty 
list. These are all passing.
   
   ## Testing recommendations
   
   We should benchmark this in its current version, and both `verbose` modes of 
this version, against a substantial indexing workload to look for performance 
regressions. Or, if performance is equivalent in both `verbose` modes, we can 
remove a lot of redundancy by removing the `verbose` flag entirely.
   
   ## Related Issues or Pull Requests
   
   - https://github.com/apache/couchdb/pull/5792
   - 
   ## Checklist
   
   - [ ] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in 
`rel/overlay/etc/default.ini`
   - [ ] Documentation changes were made in the `src/docs` folder
   - [ ] Documentation changes were backported (separated PR) to affected 
branches
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to