[ https://issues.apache.org/jira/browse/LUCENE-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125350#comment-13125350 ]
Doug Cutting commented on LUCENE-3510:
--------------------------------------

Using a single bit to track prohibited terms seems reasonable, plus a count for required terms.

I don't recall the exact history of the original implementation. I think it may have been in order to support more complex boolean expressions. Any boolean expression can be rewritten to disjunctive normal form, which can then be evaluated with a set of required/prohibited mask pairs, one per conjunctive clause. This is something I'd implemented previously and probably had in mind when implementing BooleanScorer.

A Lucene boolean query is effectively a single such conjunctive clause, since the optional terms can be ignored when evaluating the boolean expression, so would reduce to a single pair of masks. But, as you observe, this single clause DNF case can be further simplified to a boolean and a count of required terms.

Does that make sense?

> BooleanScorer should not limit number of prohibited clauses
> -----------------------------------------------------------
>
>                 Key: LUCENE-3510
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3510
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3510.patch
>
>
> Today it's limited to 32, because it uses a separate bit in the mask
> for each clause.
> But I don't understand why it does this; I think all prohibited
> clauses can share a single boolean/bit? Any match on a prohibited
> clause sets this bit and the doc is not collected; we don't need each
> prohibited clause to have a dedicated bit?
> We also use the mask for required clauses, but this code is now
> commented out (we always use BS2 if there are any required clauses);
> if we re-enable this code (and I think we should, at least in certain
> cases: I suspect it'd be faster than BS2 in many cases), I think we
> can cutover to an int count instead of bit masks, and then have no
> limit on the required clauses sent to BooleanScorer also.
> Separately I cleaned a few things up about BooleanScorer: all of the
> embedded scorer methods (nextDoc, docID, advance, score) now throw
> UOE; pre-allocate the buckets instead of doing it lazily
> per-sub-collect.
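
For readers following the thread, the two bookkeeping schemes discussed in the comment can be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene's BooleanScorer code: the names ClauseMatchSketch, MaskPair, matchesDnf and Bucket are hypothetical, each term is assumed to own one bit of an int, and each required clause is assumed to report a given document at most once.

```java
/** Hypothetical sketch of the two bookkeeping schemes; not Lucene's BooleanScorer. */
final class ClauseMatchSketch {

    /** One conjunctive clause of a DNF expression, with one bit per term. */
    static final class MaskPair {
        final int requiredMask;   // bits that must all be set
        final int prohibitedMask; // bits that must all be clear

        MaskPair(int requiredMask, int prohibitedMask) {
            this.requiredMask = requiredMask;
            this.prohibitedMask = prohibitedMask;
        }

        boolean accepts(int matchedBits) {
            return (matchedBits & requiredMask) == requiredMask
                && (matchedBits & prohibitedMask) == 0;
        }
    }

    /** DNF evaluation: the document matches if any conjunctive clause accepts it. */
    static boolean matchesDnf(int matchedBits, MaskPair... clauses) {
        for (MaskPair clause : clauses) {
            if (clause.accepts(matchedBits)) {
                return true;
            }
        }
        return false;
    }

    /** Single-clause simplification: one shared flag plus a counter, so no 32-bit limit. */
    static final class Bucket {
        private boolean prohibitedHit; // set by a match on any prohibited clause
        private int requiredHits;      // number of required clauses that matched

        void onProhibitedMatch() {
            prohibitedHit = true;
        }

        /** Assumes each required clause calls this at most once per document. */
        void onRequiredMatch() {
            requiredHits++;
        }

        boolean shouldCollect(int requiredClauseCount) {
            return !prohibitedHit && requiredHits == requiredClauseCount;
        }
    }
}
```

With a single MaskPair the mask test and the Bucket test accept the same documents, but the Bucket form does not need a dedicated bit per clause, which is the limit the issue proposes to remove.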