jcoglan opened a new pull request, #5858:
URL: https://github.com/apache/couchdb/pull/5858
## Overview
This PR represents work so far on a version of `mango_selector:match/2` than
can return a representation of how the input fails to match the selector,
rather than just a boolean. There are a few commits of developing this
behaviour incrementally before everything "snaps into place" to get us code
paths that can produce failure descriptions, _and_ retain the original boolean
behaviour that avoids creating a lot of ephemeral `failure` lists, and can
short-circuit on compound operators.
The rough steps here are as follows; I'm retaining all these commits for now
in case we want to compare different designs for code complexity and
performance.
- Add a lot of unit tests to `mango_selector` that check every match
operator and its negation. This is necessary since various compound operators
like `$or` and `$allMatch` have surprising edge case behaviour on empty lists
and we need to make sure this is not broken. Mostly these tests check that if
selector `S` returns `true` on an input then `{ "$not": S }` returns `false`
and vice versa. There are some exceptions to this due to how `$or` works and
how `$and` and `$or` are normalised.
- Replace the existing implementation with one where every operator returns
a possibly-empty list of failures, instead of a boolean. `match_int` then
converts this to a bool on the way out.
- Replace the `Cmp` argument to everything with a `ctx` record that contains
`cmp`, as well as other things needed for failure generation, e.g. the path to
the current value, whether matching is currently negated, etc.
- Try to fix `$allMatch` and `$elemMatch` normalisation to avoid the
complexity that comes from not being able to apply DeMorgan to them, due to
`$allMatch` being defined to return false for empty lists. This is a breaking
change that is reverted later since we decided to retain existing behaviour
above all.
- Implement negation handling, where the presence of a `$not` has to be
communicated to nodes lower down the tree in order to produce good failure
messages. Not all negation can be normalised out of the tree and so all
operators need to handle being negated during evaluation.
- Finally, collapse all the complexity into an implementation that supports
_both_ the old and new behaviours.
The idea in the design I've ended up with that tries to minimise both
complexity and runtime cost is:
- Add `#ctx.verbose` which indicates whether a detailed failure description
is wanted.
- Keep the original implementations of all operators as the response when
passed `#ctx{verbose=false}`, i.e. when only a boolean result is needed.
- For all leaf operators, the `#ctx{verbose=true}` case can be implemented
by calling the `#ctx{verbose=false}` case, and creating a `#failure` record if
this returns false.
- For compound operators, special code is needed to gather up the failures
from internal selectors and deal with edge cases in a way that's consistent
with the original implementation.
- `#ctx.path` is only updated in `#ctx{verbose=true}` code paths so this
expense is avoided in non-verbose mode. Path items are added to the front of
this list as that's cheaper than doing `Path ++ [Item]`; we would reverse these
before returning to a client.
- `#failure` records retain a `#ctx` from where they can access the path and
negation state, in order to generate a good human-readable error message later
on.
- The tests are updated to make sure that both `verbose` modes return
consistent results, i.e. if `verbose=false` returns `true`, then `verbose=true`
returns `[]`, and if the former returns `false`, the latter gives a non-empty
list. These are all passing.
## Testing recommendations
We should benchmark this in its current version, and both `verbose` modes of
this version, against a substantial indexing workload to look for performance
regressions. Or, if performance is equivalent in both `verbose` modes, we can
remove a lot of redundancy by removing the `verbose` flag entirely.
## Related Issues or Pull Requests
- https://github.com/apache/couchdb/pull/5792
-
## Checklist
- [ ] Code is written and works correctly
- [ ] Changes are covered by tests
- [ ] Any new configurable parameters are documented in
`rel/overlay/etc/default.ini`
- [ ] Documentation changes were made in the `src/docs` folder
- [ ] Documentation changes were backported (separated PR) to affected
branches
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]