[ 
https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503281#comment-13503281
 ] 

Mikhail Khludnev commented on LUCENE-4571:
------------------------------------------

Replying to JIRA's mail was a bad idea, so I'm moving the dialogue here:

[~mkhludnev]
{quote}
Robert, am I right that establishing the perf test is the first necessary step, 
rather than the implementation itself?
Also (not really important, but let me mention it), what I'm really looking for 
is a disjunction query with a user-supplied verification strategy, where 
minShouldMatch is just one of the ways to verify a match.
{quote}

[~rcmuir]
{quote}
Right, the best way to do this is to extend luceneutil 
(http://code.google.com/a/apache-extras.org/p/luceneutil) to test this case.

Keep in mind that I'd also be interested to see how BooleanScorer compares to 
BooleanScorer2 for this situation. I already mentioned on the Solr list (nobody 
replied) that Solr *never* gets BooleanScorer, but from time to time I hear 
Solr users complaining about BooleanScorer2's performance for min-should-match.

So when trying to improve the performance of min-should-match, I think a very 
early step should be to see if we already have a better-performing alternative 
that is just not being used: if that's the case, then the best solution is to fix 
Solr's collectors to be able to cope with BooleanScorer.

Intuitively, I think it's going to be like everything else: BS1 is better in some 
situations, BS2 in others.

>>> Also (not really important, but let me mention it), what I'm really looking 
>>> for is a disjunction query with a user-supplied verification strategy, 
>>> where minShouldMatch is just one of the ways to verify a match.

I don't think our concrete scorers should have such a hook: they should be as 
dead simple as possible.

If you want to do this, I recommend just extending the abstract 
DisjunctionScorer. (Currently DisjunctionSum and DisjunctionMax extend it. As 
I suggested, we should think about splitting out a MinShouldMatchScorer as well: 
it's confusing that pure disjunctions are all mixed up with min-should-match, and 
the algorithms should actually work differently.)
{quote}
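
To make the "user-supplied verification strategy" idea concrete, here is a minimal toy sketch in plain Java. It is not Lucene code and does not extend DisjunctionScorer: each clause is modeled as a sorted int[] of doc IDs, and the names VerifiedDisjunction, MatchVerifier and search are hypothetical, chosen only for illustration. It assumes doc IDs are strictly increasing within each array and smaller than Integer.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model (not Lucene): each clause is a sorted int[] of doc IDs. */
class VerifiedDisjunction {

  /** Hypothetical hook: decides whether a doc matches, given the clauses that hit it. */
  interface MatchVerifier {
    boolean verify(int doc, BitSet matchedClauses);
  }

  /** Enumerates the whole disjunction and keeps the docs the verifier accepts. */
  static List<Integer> search(int[][] postings, MatchVerifier verifier) {
    int[] pos = new int[postings.length]; // cursor into each posting list
    BitSet matched = new BitSet(postings.length);
    List<Integer> hits = new ArrayList<>();
    while (true) {
      // Find the smallest current doc across all clauses.
      int doc = Integer.MAX_VALUE;
      for (int i = 0; i < postings.length; i++) {
        if (pos[i] < postings[i].length) {
          doc = Math.min(doc, postings[i][pos[i]]);
        }
      }
      if (doc == Integer.MAX_VALUE) {
        return hits; // every clause is exhausted
      }
      // Record which clauses are positioned on that doc and move them forward.
      matched.clear();
      for (int i = 0; i < postings.length; i++) {
        if (pos[i] < postings[i].length && postings[i][pos[i]] == doc) {
          matched.set(i);
          pos[i]++;
        }
      }
      if (verifier.verify(doc, matched)) {
        hits.add(doc);
      }
    }
  }

  public static void main(String[] args) {
    int[][] postings = {{1, 3, 5}, {3, 4, 5}, {2, 3}};
    // minShouldMatch = 2 expressed as one possible verifier: prints [3, 5]
    System.out.println(search(postings, (doc, m) -> m.cardinality() >= 2));
    // A different rule, clause 2 must be among the matchers: prints [2, 3]
    System.out.println(search(postings, (doc, m) -> m.get(2)));
  }
}
```

In this sketch minShouldMatch is only one verifier among many, which is the point of the hook. The cost is that an arbitrary verifier gives the scorer nothing to skip on, so every doc of the disjunction is still visited.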


                
> speedup disjunction with minShouldMatch 
> ----------------------------------------
>
>                 Key: LUCENE-4571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>
> Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates the whole 
> disjunction and verifies the minShouldMatch condition [on every 
> doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     
>     return doc;
>   }
> {code}
> [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from the 
> heap first, and then push them back, advancing them behind that top doc. For me, 
> question no. 1 is whether there is a performance test for minShouldMatch-constrained 
> disjunction. 
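
The pop-and-advance proposal above can be sketched on a toy model. This is one reading of it, not the actual patch: the number of scorers to pop is taken to be minShouldMatch-1, and the popped scorers are advanced to the doc of the new heap top, since no smaller doc can be matched by minShouldMatch clauses. Posting lists are plain sorted int[] arrays with strictly increasing doc IDs; MinShouldMatchSketch, Cursor and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model (not Lucene) of skipping in a min-should-match disjunction. */
class MinShouldMatchSketch {

  /** Cursor over one sorted posting list; a stand-in for a sub-scorer. */
  static final class Cursor {
    final int[] docs;
    int pos = 0;
    Cursor(int[] docs) { this.docs = docs; }
    boolean exhausted() { return pos >= docs.length; }
    int docID() { return docs[pos]; }
    /** Moves to the first doc >= target; binary search stands in for a skip. */
    void advance(int target) {
      int i = Arrays.binarySearch(docs, pos, docs.length, target);
      pos = i >= 0 ? i : -i - 1;
    }
  }

  static List<Integer> search(int[][] postings, int minShouldMatch) {
    int mm = Math.max(1, minShouldMatch);
    // Heap ordered by current doc; a cursor is only moved while it is outside the heap.
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.docID(), b.docID()));
    for (int[] docs : postings) {
      if (docs.length > 0) heap.add(new Cursor(docs));
    }
    List<Integer> hits = new ArrayList<>();
    List<Cursor> popped = new ArrayList<>();
    while (heap.size() >= mm) {
      // Pop the mm-1 cursors with the smallest docs; docs below the new top
      // can be matched by at most mm-1 clauses, so they are safe to skip.
      popped.clear();
      for (int i = 1; i < mm; i++) popped.add(heap.poll());
      int candidate = heap.peek().docID();
      // Push the popped cursors back, advanced to the candidate.
      for (Cursor c : popped) {
        c.advance(candidate);
        if (!c.exhausted()) heap.add(c);
      }
      // Count the cursors sitting on the candidate and step them past it.
      int nrMatchers = 0;
      while (!heap.isEmpty() && heap.peek().docID() == candidate) {
        Cursor c = heap.poll();
        nrMatchers++;
        c.pos++;
        if (!c.exhausted()) heap.add(c);
      }
      if (nrMatchers >= mm) hits.add(candidate);
    }
    return hits;
  }

  public static void main(String[] args) {
    int[][] postings = {{1, 3, 5, 9}, {3, 4, 5}, {2, 3, 9}, {9}};
    System.out.println(search(postings, 2)); // prints [3, 5, 9]
    System.out.println(search(postings, 3)); // prints [3, 9]
  }
}
```

With minShouldMatch = 1 the loop degenerates to a plain disjunction. Whether the skipping pays off in practice depends on how cheap advance() is compared with nextDoc(), which is exactly what a luceneutil benchmark would have to show.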

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
