romseygeek opened a new pull request #620: Automatically rewrite disjunctions when internal gaps matter
URL: https://github.com/apache/lucene-solr/pull/620
 
 
   We have a number of IntervalsSource implementations where automatic minimization of
   disjunctions can lead to surprising results:
   * PHRASE queries can miss matches because a longer matching sub-source is minimized
     away, leaving a gap
   * MAXGAPS queries can miss matches for the same reason
   * CONTAINING, NOT_CONTAINING, CONTAINED_BY and NOT_CONTAINED_BY queries
     can miss matches if the 'big' interval gets minimized
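
The PHRASE case above can be illustrated with a small worked example. This is a minimal sketch under stated assumptions, not Lucene's iterator code: `Interval`, `minimize` and `anyStartsAt` are hypothetical helpers, and minimization is modelled simply as "drop any interval that strictly contains another". For the document "a b c" (positions 0, 1, 2), `OR(PHRASE("b", "c"), "c")` matches at [1,2] and [2,2]. Minimization keeps only [2,2], so an enclosing `PHRASE("a", ...)` finds nothing starting at position 1 and misses the match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinimizationDemo {

    /** A closed interval of token positions, [start, end]. Hypothetical helper type. */
    static final class Interval {
        final int start, end;
        Interval(int start, int end) { this.start = start; this.end = end; }

        /** True if this interval contains the other and is not identical to it. */
        boolean strictlyContains(Interval o) {
            return start <= o.start && o.end <= end && (start != o.start || end != o.end);
        }

        @Override public String toString() { return "[" + start + "," + end + "]"; }
    }

    /** Simplified minimization: keep only intervals that contain no other interval. */
    static List<Interval> minimize(List<Interval> in) {
        List<Interval> out = new ArrayList<>();
        for (Interval a : in) {
            boolean containsOther = false;
            for (Interval b : in) {
                if (a.strictlyContains(b)) { containsOther = true; break; }
            }
            if (!containsOther) out.add(a);
        }
        return out;
    }

    /** A phrase needs its next part to start exactly at the given position. */
    static boolean anyStartsAt(List<Interval> in, int pos) {
        for (Interval i : in) {
            if (i.start == pos) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Document "a b c": a@0, b@1, c@2.
        // OR(PHRASE("b", "c"), "c") matches at [1,2] ("b c") and [2,2] ("c").
        List<Interval> all = Arrays.asList(new Interval(1, 2), new Interval(2, 2));
        List<Interval> minimized = minimize(all);

        // PHRASE("a", ...) needs the disjunction to start at position 1, right after "a"@0.
        System.out.println("all=" + all + " startsAt1=" + anyStartsAt(all, 1));
        System.out.println("minimized=" + minimized + " startsAt1=" + anyStartsAt(minimized, 1));
    }
}
```

With the full interval list the phrase can continue at position 1; after minimization only [2,2] remains, which leaves a one-position gap after "a".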
   
   The proper way to deal with this is to rewrite the queries by pulling disjunctions to
   the top of the query tree, so that `PHRASE("a", OR(PHRASE("b", "c"), "c"))` is rewritten
   to `OR(PHRASE("a", "b", "c"), PHRASE("a", "c"))`.  To be able to do this generally, we
   need to add a new `pullUpDisjunctions()` method to `IntervalsSource` that performs this
   rewriting for each source that it would apply to.
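
The rewrite described above amounts to distributing PHRASE over OR. The following is a rough sketch on a toy query tree, not the code in this pull request: the `Node` class and the `alternatives` helper are hypothetical, and the `pullUpDisjunctions` method here only borrows the name for illustration. It assumes that a nested PHRASE inside a PHRASE can be flattened into its parts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PullUpSketch {

    /** Toy query node (hypothetical): a term, a PHRASE of children, or an OR of children. */
    static final class Node {
        final String kind;          // "TERM", "PHRASE" or "OR"
        final String term;          // only set for TERM nodes
        final List<Node> children;  // empty for TERM nodes

        private Node(String kind, String term, List<Node> children) {
            this.kind = kind;
            this.term = term;
            this.children = children;
        }

        static Node term(String t) { return new Node("TERM", t, Collections.<Node>emptyList()); }
        static Node phrase(Node... parts) { return new Node("PHRASE", null, Arrays.asList(parts)); }
        static Node or(Node... alts) { return new Node("OR", null, Arrays.asList(alts)); }

        @Override public String toString() {
            if (kind.equals("TERM")) return "\"" + term + "\"";
            StringBuilder sb = new StringBuilder(kind).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Returns the disjunction-free alternatives that together are equivalent to n. */
    static List<Node> alternatives(Node n) {
        List<Node> out = new ArrayList<>();
        if (n.kind.equals("TERM")) {
            out.add(n);
        } else if (n.kind.equals("OR")) {
            // An OR contributes the alternatives of each of its children.
            for (Node c : n.children) out.addAll(alternatives(c));
        } else {
            // PHRASE: take the cartesian product of the children's alternatives.
            List<List<Node>> combos = new ArrayList<>();
            combos.add(new ArrayList<Node>());
            for (Node child : n.children) {
                List<List<Node>> next = new ArrayList<>();
                for (List<Node> prefix : combos) {
                    for (Node alt : alternatives(child)) {
                        List<Node> extended = new ArrayList<>(prefix);
                        // Flatten a nested phrase into the enclosing phrase.
                        if (alt.kind.equals("PHRASE")) extended.addAll(alt.children);
                        else extended.add(alt);
                        next.add(extended);
                    }
                }
                combos = next;
            }
            for (List<Node> parts : combos) out.add(new Node("PHRASE", null, parts));
        }
        return out;
    }

    /** Pulls every disjunction to the top of the tree. */
    static Node pullUpDisjunctions(Node n) {
        List<Node> alts = alternatives(n);
        return alts.size() == 1 ? alts.get(0) : new Node("OR", null, alts);
    }

    public static void main(String[] args) {
        Node q = Node.phrase(Node.term("a"),
                Node.or(Node.phrase(Node.term("b"), Node.term("c")), Node.term("c")));
        System.out.println(q + "  ->  " + pullUpDisjunctions(q));
    }
}
```

A real implementation would need an equivalent rule for each affected source type (MAXGAPS, CONTAINING and so on); only PHRASE is modelled here.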
   
   Because these rewritten queries will in general be less efficient due to the duplication
   of effort (e.g. the rewritten PHRASE query above pulls 5 term iterators rather than 4 in
   the original), we also add an option to `Intervals.or()` that disables this rewriting, so
   that consumers can choose speed over accuracy if it suits their use case.
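
The 5-versus-4 figure can be checked by counting leaf terms in the two query trees, on the assumption that each leaf term corresponds to one term iterator. The sketch below uses a hypothetical `Node` class and is not part of Lucene.

```java
import java.util.Arrays;
import java.util.List;

public class TermIteratorCount {

    /** Toy query node (hypothetical): a leaf term if it has no children, else an operator. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    static Node term(String t) { return new Node(t); }

    /** Counts leaf terms; each one is assumed to need its own term iterator. */
    static int countTerms(Node n) {
        if (n.children.isEmpty()) return 1;
        int total = 0;
        for (Node c : n.children) total += countTerms(c);
        return total;
    }

    /** PHRASE("a", OR(PHRASE("b", "c"), "c")) */
    static Node original() {
        return new Node("PHRASE", term("a"),
                new Node("OR", new Node("PHRASE", term("b"), term("c")), term("c")));
    }

    /** OR(PHRASE("a", "b", "c"), PHRASE("a", "c")) */
    static Node rewritten() {
        return new Node("OR",
                new Node("PHRASE", term("a"), term("b"), term("c")),
                new Node("PHRASE", term("a"), term("c")));
    }

    public static void main(String[] args) {
        System.out.println("original:  " + countTerms(original()) + " term iterators");
        System.out.println("rewritten: " + countTerms(rewritten()) + " term iterators");
    }
}
```

The extra iterator comes from "a", which appears once in the original tree and once per alternative after the rewrite.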

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
