[ https://issues.apache.org/jira/browse/LUCENE-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847797#action_12847797 ]
Gary Yngve commented on LUCENE-2336: ------------------------------------ I don't have a specific use case. Long story short, I am toying with a BM25 implementation via subclassing a slightly refactored DisjunctionSumScorer, as the published BM25 research implementation doesn't work w/ phrases and is too smelly for my tastes. I'm attempting to understand how everything works deep in lucene by reading the code and playing with it (maybe I should try to read choice checkin logs too) and discovered this inconsistency in a unit test that I was playing with. The advance docs for DISI state that it behaves as written: int advance(int target) { int doc; while ((doc = nextDoc()) < target) { } return doc; } This to me implies that advance(0) should be equivalent to calling nextDoc() in all cases. (And this is how advance behaves in TermScorer and PhraseScorer.) However the real behavior for DSS is more like: while (docID() < target) { if (!next()) return -1; } return docID(); On the other hand, the thread you cite states: > > To me, calling skipTo or advance with the same target multiple times and > > get different result every time is weird. I'd like to change that semantic > But I thought we had just agreed that skipTo(doc) is well defined only for > doc>current? (see "bigger question" point above). > If we all agree that , then we don't need to worry about skipTo(10) being > called twice in a row. So it sounds like I'd be reasonably happy with a resolution of documenting advance in DISI with: "The behavior of calling advance with a target equal to docID() is undefined," and happier with a convincing argument of why this situation never arises within any Lucene scorers in real life. -Gary > off by one: DisjunctionSumScorer::advance > ----------------------------------------- > > Key: LUCENE-2336 > URL: https://issues.apache.org/jira/browse/LUCENE-2336 > Project: Lucene - Java > Issue Type: Bug > Components: Search > Reporter: Gary Yngve > Priority: Minor > Original Estimate: 4h > Remaining Estimate: 4h > > The bug is: > if (target <= currentDoc) { > should be > if (target < currentDoc) { > based on the comments for the method as well as the contract for > DocIdSetIterator: "Advances to the first beyond the current" > It can be demonstrated by: > assertEquals("advance(1) first match failed", 1, > scorer.advance(1)); > assertEquals("advance(1) second match failed", n, > scorer.advance(1)); > if docId: 1 is a hit and n is the next hit. (Tests all pass if this code > change is made.) > I'm not labeling it as major because the class is package-protected and > currently passes spec. > Relevant excerpt: > /** > * Advances to the first match beyond the current whose document number is > * greater than or equal to a given target. <br> > * When this method is used the {...@link #explain(int)} method should not > be > * used. <br> > * The implementation uses the skipTo() method on the subscorers. > * > * @param target > * The target document number. > * @return the document whose number is greater than or equal to the given > * target, or -1 if none exist. > */ > public int advance(int target) throws IOException { > if (scorerDocQueue.size() < minimumNrMatchers) { > return currentDoc = NO_MORE_DOCS; > } > if (target <= currentDoc) { > return currentDoc; > } > do { > if (scorerDocQueue.topDoc() >= target) { > boolean b = advanceAfterCurrent(); > return b ? currentDoc : (currentDoc = NO_MORE_DOCS); > } else if (!scorerDocQueue.topSkipToAndAdjustElsePop(target)) { > if (scorerDocQueue.size() < minimumNrMatchers) { > return currentDoc = NO_MORE_DOCS; > } > } > } while (true); > } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org