[ https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503281#comment-13503281 ]
Mikhail Khludnev commented on LUCENE-4571:
------------------------------------------

It was a bad idea to reply to JIRA's mail. Moving the dialogue here:

[~mkhludnev]
{quote}
Robert, am I right that establishing the perf test is the first necessary step, rather than the implementation itself?

Also (not really important, but let me mention it), what I'm really looking for is a disjunction query with a user-supplied verification strategy, where minShouldMatch is just one of the ways to verify a match.
{quote}

[~rcmuir]
{quote}
Right, the best way to do this is to extend luceneutil (http://code.google.com/a/apache-extras.org/p/luceneutil) to test this case.

Keep in mind that I'd also be interested to see how BooleanScorer compares to BooleanScorer2 for this situation. I already mentioned on the Solr list (nobody replied) that Solr *never* gets BooleanScorer, but from time to time I hear Solr users complaining about BooleanScorer2's performance for min-should-match.

So when trying to improve the performance of min-should-match, I think a very early step should be to see if we already have a better-performing alternative that is just not being used: if that's the case, then the best solution is to fix Solr's collectors to be able to cope with BooleanScorer.

Intuitively, I think it's going to be like everything else: BS1 is better in some situations, BS2 in others.

>>> Also (not really important, but let me mention it), what I'm really looking
>>> for is a disjunction query with a user-supplied verification strategy,
>>> where minShouldMatch is just one of the ways to verify a match.

I don't think our concrete scorers should have such a hook: they should be as dead simple as possible.
If you want to do this, I recommend just extending the abstract DisjunctionScorer. Currently DisjunctionSum and DisjunctionMax extend this; as I suggested, we should think about splitting out a MinShouldMatchScorer as well: it's confusing that pure disjunctions are all mixed up with min-should-match, and the algorithms should actually work differently.
{quote}

> speedup disjunction with minShouldMatch
> ----------------------------------------
>
>                 Key: LUCENE-4571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>
> Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates the whole
> disjunction and verifies the minShouldMatch condition [on every
> doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>
>     return doc;
>   }
> {code}
> [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from the
> heap first, and then push them back, advancing behind that top doc. For me,
> question no. 1 is: is there a performance test for a minShouldMatch-constrained
> disjunction?
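
For anyone following the thread, here is a minimal, self-contained Java sketch of the difference under discussion. It does not use Lucene: PostingsIterator, checkEveryDoc and leadTail are hypothetical names made up for illustration, and sorted int arrays stand in for sub-scorers. checkEveryDoc mirrors the behaviour described in the issue (enumerate the whole disjunction, test minShouldMatch on each doc). leadTail is only one possible reading of the idea of setting minShouldMatch-1 scorers aside, not necessarily what is being proposed: a doc that matches at least mm of n clauses must occur in at least one of any n-mm+1 clauses, so only those need to drive iteration, and the remaining mm-1 are merely advanced to each candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: none of these names are Lucene APIs.
final class MinShouldMatchSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for a sub-scorer: walks a sorted array of unique doc ids. */
  static final class PostingsIterator {
    private final int[] docs;
    private int pos = -1;

    PostingsIterator(int[] docs) { this.docs = docs; }

    int docID() {
      if (pos < 0) return -1;
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    int nextDoc() {
      if (pos < docs.length) pos++;
      return docID();
    }

    /** Moves to the first doc >= target; stays put if already there. */
    int advance(int target) {
      if (pos < 0) pos = 0;
      while (pos < docs.length && docs[pos] < target) pos++;
      return docID();
    }
  }

  /** Baseline: visit every doc of the whole disjunction and check the count on each one. */
  static List<Integer> checkEveryDoc(int[][] postings, int mm) {
    Map<Integer, Integer> counts = new TreeMap<>();
    for (int[] list : postings) {
      for (int doc : list) counts.merge(doc, 1, Integer::sum);
    }
    List<Integer> hits = new ArrayList<>();
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      if (e.getValue() >= mm) hits.add(e.getKey());
    }
    return hits;
  }

  /** Sketch: n-mm+1 "lead" iterators produce candidates; mm-1 "tail" iterators only advance. */
  static List<Integer> leadTail(int[][] postings, int mm) {
    int n = postings.length;
    int required = Math.max(mm, 1);          // mm <= 0 behaves like a plain disjunction
    List<Integer> hits = new ArrayList<>();
    if (required > n) return hits;           // cannot match more clauses than exist

    PostingsIterator[] its = new PostingsIterator[n];
    for (int i = 0; i < n; i++) its[i] = new PostingsIterator(postings[i]);

    int leadCount = n - required + 1;        // assumption: the first leadCount clauses lead
    for (int i = 0; i < leadCount; i++) its[i].nextDoc();

    while (true) {
      // The next candidate is the smallest doc any lead iterator is positioned on.
      int candidate = NO_MORE_DOCS;
      for (int i = 0; i < leadCount; i++) candidate = Math.min(candidate, its[i].docID());
      if (candidate == NO_MORE_DOCS) break;

      int matches = 0;
      for (int i = 0; i < leadCount; i++) {
        if (its[i].docID() == candidate) {
          matches++;
          its[i].nextDoc();
        }
      }
      // Tail iterators never enumerate their own docs; they just skip to the candidate.
      for (int i = leadCount; i < n; i++) {
        if (its[i].advance(candidate) == candidate) matches++;
      }
      if (matches >= required) hits.add(candidate);
    }
    return hits;
  }
}
```

With the postings {1, 3, 5, 7}, {3, 4, 5, 9}, {2, 3, 7, 9} and {5, 9, 11}, both methods return [3, 5, 9] for mm = 3. Which clauses end up in the tail is a tuning question this sketch ignores (it simply takes the last mm-1), and whether the approach is actually faster is exactly what a luceneutil benchmark would have to show.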