romseygeek opened a new pull request #620: Automatically rewrite disjunctions when internal gaps matter
URL: https://github.com/apache/lucene-solr/pull/620

We have a number of IntervalsSource implementations where automatic minimization of disjunctions can lead to surprising results:

* PHRASE queries can miss matches because a longer matching sub-source is minimized away, leaving a gap
* MAXGAPS queries can miss matches for the same reason
* CONTAINING, NOT_CONTAINING, CONTAINED_BY and NOT_CONTAINED_BY queries can miss matches if the 'big' interval gets minimized

The proper way to deal with this is to rewrite the queries by pulling disjunctions to the top of the query tree, so that `PHRASE("a", OR(PHRASE("b", "c"), "c"))` is rewritten to `OR(PHRASE("a", "b", "c"), PHRASE("a", "c"))`. To do this generally, we need to add a new `pullUpDisjunctions()` method to `IntervalsSource` that performs this rewriting for each source it applies to.

Because these rewritten queries will in general be less efficient due to the duplication of effort (e.g. the rewritten PHRASE query above pulls 5 term iterators rather than the 4 in the original), we also add an option to `Intervals.or()` that prevents the rewrite, so that consumers can choose speed over accuracy if it suits their use case.
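To make the PHRASE rewrite concrete, here is a minimal standalone sketch of the distribution step. It is a toy model, not the code in this pull request: `DisjunctionPullUp`, `Node`, `Term`, `Or` and `Phrase` are hypothetical names invented for illustration, only PHRASE and OR are modelled, and nothing here touches Lucene's `IntervalsSource`. Each node reports the disjunction-free term sequences it is equivalent to, and a PHRASE combines the alternatives of its parts in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DisjunctionPullUp {

    /** Toy stand-in for an intervals source (hypothetical, not Lucene's IntervalsSource). */
    abstract static class Node {
        /** Each inner list is one disjunction-free sequence of terms. */
        abstract List<List<String>> alternatives();
    }

    static final class Term extends Node {
        final String text;
        Term(String text) { this.text = text; }
        @Override List<List<String>> alternatives() {
            List<List<String>> out = new ArrayList<>();
            out.add(Arrays.asList(text));
            return out;
        }
    }

    static final class Or extends Node {
        final List<Node> clauses;
        Or(Node... clauses) { this.clauses = Arrays.asList(clauses); }
        @Override List<List<String>> alternatives() {
            // A disjunction contributes the alternatives of every clause.
            List<List<String>> out = new ArrayList<>();
            for (Node clause : clauses) out.addAll(clause.alternatives());
            return out;
        }
    }

    static final class Phrase extends Node {
        final List<Node> parts;
        Phrase(Node... parts) { this.parts = Arrays.asList(parts); }
        @Override List<List<String>> alternatives() {
            // Distribute the phrase over its parts: every prefix built so far
            // is extended with every alternative of the next part.
            List<List<String>> acc = new ArrayList<>();
            acc.add(new ArrayList<>());
            for (Node part : parts) {
                List<List<String>> next = new ArrayList<>();
                for (List<String> prefix : acc) {
                    for (List<String> alt : part.alternatives()) {
                        List<String> joined = new ArrayList<>(prefix);
                        joined.addAll(alt);
                        next.add(joined);
                    }
                }
                acc = next;
            }
            return acc;
        }
    }

    /** Renders the rewritten query with any disjunction at the top level. */
    static String pullUpDisjunctions(Node node) {
        List<String> rendered = node.alternatives().stream()
                .map(DisjunctionPullUp::render)
                .collect(Collectors.toList());
        return rendered.size() == 1
                ? rendered.get(0)
                : "OR(" + String.join(", ", rendered) + ")";
    }

    /** Number of term leaves in the rewritten query. */
    static int rewrittenTermCount(Node node) {
        return node.alternatives().stream().mapToInt(List::size).sum();
    }

    private static String render(List<String> terms) {
        String quoted = terms.stream()
                .map(t -> "\"" + t + "\"")
                .collect(Collectors.joining(", "));
        return terms.size() == 1 ? quoted : "PHRASE(" + quoted + ")";
    }

    public static void main(String[] args) {
        Node original = new Phrase(
                new Term("a"),
                new Or(new Phrase(new Term("b"), new Term("c")), new Term("c")));
        System.out.println(pullUpDisjunctions(original));
        System.out.println(rewrittenTermCount(original));
    }
}
```

For the example query from the description, this prints `OR(PHRASE("a", "b", "c"), PHRASE("a", "c"))` and a rewritten term count of 5, which is the duplication cost the opt-out on `Intervals.or()` is meant to avoid. The number of alternatives grows multiplicatively with each disjunction inside a phrase, so the rewritten tree can be considerably larger than the original.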
