[ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1644:
---------------------------------------

    Attachment: LUCENE-1644.patch

Attached rough patch -- javadocs are missing/not updated, need to add new tests, need to fix QueryParser.jj, etc., but all tests pass.

Here's what I did:

  - Changed the MTQ.RewriteMethod class from a simple Parameter to its own abstract base class w/ a single method, rewrite, which MultiTermQuery.rewrite delegates to.

  - Switched over CONSTANT_SCORE_FILTER_REWRITE, SCORING_BOOLEAN_QUERY_REWRITE and CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE. These classes are private (they have no configuration), and I created final static singleton instances for them.

  - Created ConstantScoreAutoRewrite (and the default CONSTANT_SCORE_AUTO_REWRITE instance) that you can configure based on term count & doc count, as to when it cuts over to CONSTANT_SCORE_FILTER_REWRITE.

This approach also has the benefit of allowing customization entirely, if needed, of the "rewrite strategy", if none of the 4 choices work for you.

> Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1644.patch, LUCENE-1644.patch
>
>
> When MultiTermQuery is used (via one of its subclasses, eg
> WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> "constant score mode", which pre-builds a filter and then wraps that
> filter as a ConstantScoreQuery.
> If you don't set that, it instead builds a [potentially massive]
> BooleanQuery with one SHOULD clause per term.
> There are some limitations of this approach:
>   * The scores returned by the BooleanQuery are often quite
>     meaningless to the app, so, one should be able to use a
>     BooleanQuery yet get constant scores back.  (Though I vaguely
>     remember at least one example someone raised where the scores were
>     useful...).
>   * The resulting BooleanQuery can easily have too many clauses,
>     throwing an extremely confusing exception to newish users.
>   * It'd be better to have the freedom to pick "build filter up front"
>     vs "build massive BooleanQuery", when constant scoring is enabled,
>     because they have different performance tradeoffs.
>   * In constant score mode, an OpenBitSet is always used, yet for
>     sparse bit sets this does not give good performance.
> I think we could address these issues by giving BooleanQuery a
> constant score mode, then empower MultiTermQuery (when in constant
> score mode) to pick & choose whether to use BooleanQuery vs up-front
> filter, and finally empower MultiTermQuery to pick the best (sparse vs
> dense) bit set impl.
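
The rewrite-method design described in the comment above can be sketched roughly as follows. This is a self-contained toy model, not the code in the attached patch: TermStats, rewriteQuery, the String return values, and the cutoff numbers are all hypothetical stand-ins, and the fallback to the constant-score BooleanQuery rewrite below the cutoffs is an assumption about how the auto mode behaves. Only the names RewriteMethod, ConstantScoreAutoRewrite, and the three *_REWRITE constants come from the comment.

```java
/** Toy model of a pluggable rewrite strategy; these are not Lucene's real classes. */
final class RewriteSketch {

    /** Hypothetical stand-in for the statistics a rewrite decision could inspect. */
    static final class TermStats {
        final int termCount; // number of terms the multi-term query expands to
        final int docCount;  // number of documents matched by those terms

        TermStats(int termCount, int docCount) {
            this.termCount = termCount;
            this.docCount = docCount;
        }
    }

    /** Abstract base class with a single method, as the comment describes. */
    abstract static class RewriteMethod {
        /** Returns a label for the chosen strategy (a real version would return a query). */
        abstract String rewrite(TermStats stats);
    }

    // Configuration-free strategies, exposed only as final static singletons.
    static final RewriteMethod CONSTANT_SCORE_FILTER_REWRITE = new RewriteMethod() {
        @Override String rewrite(TermStats stats) { return "up-front filter"; }
    };
    static final RewriteMethod SCORING_BOOLEAN_QUERY_REWRITE = new RewriteMethod() {
        @Override String rewrite(TermStats stats) { return "scoring BooleanQuery"; }
    };
    static final RewriteMethod CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE = new RewriteMethod() {
        @Override String rewrite(TermStats stats) { return "constant-score BooleanQuery"; }
    };

    /** Configurable strategy that cuts over to the filter once either cutoff is exceeded. */
    static final class ConstantScoreAutoRewrite extends RewriteMethod {
        private final int termCountCutoff; // hypothetical parameter
        private final int docCountCutoff;  // hypothetical parameter

        ConstantScoreAutoRewrite(int termCountCutoff, int docCountCutoff) {
            this.termCountCutoff = termCountCutoff;
            this.docCountCutoff = docCountCutoff;
        }

        @Override String rewrite(TermStats stats) {
            if (stats.termCount > termCountCutoff || stats.docCount > docCountCutoff) {
                return CONSTANT_SCORE_FILTER_REWRITE.rewrite(stats);
            }
            // Assumption: small expansions use the constant-score BooleanQuery.
            return CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE.rewrite(stats);
        }
    }

    /** Stand-in for MultiTermQuery.rewrite, which only delegates to the strategy. */
    static String rewriteQuery(RewriteMethod method, TermStats stats) {
        return method.rewrite(stats);
    }

    public static void main(String[] args) {
        // Cutoffs are arbitrary illustration values, not defaults from the patch.
        RewriteMethod auto = new ConstantScoreAutoRewrite(50, 1000);
        System.out.println(rewriteQuery(auto, new TermStats(10, 100)));
        System.out.println(rewriteQuery(auto, new TermStats(500, 100)));
    }
}
```

Because the query only calls rewrite on whatever strategy it is handed, an application that is not served by any of the four built-in choices could, under this design, supply its own RewriteMethod subclass without touching the query classes.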