It would be great to get some repeatable tests for this type of thing into the benchmark contrib. I had started work on that some time back, but I don't think I have it around anymore.
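A repeatable benchmark would need to exercise both paths. As a rough starting point, here is a minimal sketch of the kind of choice the issue below proposes. It expands to a BooleanQuery when there are few, sparse terms, and it builds a filter up front when there are many terms or when even one term is dense, as in Robert's "th?" example. The class name, the cutoff values, and the inputs (a list of per-term docFreq values plus maxDoc) are all hypothetical. This is not Lucene's API.

```java
import java.util.List;

// Hypothetical heuristic, NOT Lucene's API: decide whether a constant-score
// MultiTermQuery should expand to a BooleanQuery or build a filter up front.
final class RewriteChooser {
    enum Strategy { BOOLEAN_QUERY, FILTER }

    // Assumed cutoffs, chosen only for illustration.
    static final int TERM_COUNT_CUTOFF = 350;
    static final double DOC_COUNT_PERCENT = 0.1; // percent of maxDoc

    // docFreqs holds the document frequency of each matching term.
    static Strategy choose(List<Integer> docFreqs, int maxDoc) {
        if (docFreqs.size() > TERM_COUNT_CUTOFF) {
            // Too many clauses for a BooleanQuery.
            return Strategy.FILTER;
        }
        long totalDocs = 0;
        for (int df : docFreqs) {
            totalDocs += df;
        }
        double docCutoff = maxDoc * DOC_COUNT_PERCENT / 100.0;
        // A single very common term (e.g. "th?" matching "the") exceeds the
        // doc cutoff on its own, so it still gets a filter.
        return totalDocs > docCutoff ? Strategy.FILTER : Strategy.BOOLEAN_QUERY;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        System.out.println(choose(List.of(600_000), maxDoc)); // FILTER
        System.out.println(choose(List.of(3, 5, 2), maxDoc)); // BOOLEAN_QUERY
    }
}
```

Summing docFreq over-counts documents that match several terms, so the total is only a cheap upper bound on how many bits the filter would set. A benchmark could vary both cutoffs to see where each path wins on a given index.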
On Tue, Jul 21, 2009 at 12:14 PM, Robert Muir (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733676#action_12733676 ]
>
> Robert Muir edited comment on LUCENE-1644 at 7/21/09 9:14 AM:
> --------------------------------------------------------------
>
> Mike, I am afraid that might hurt some people's performance.
> I'm a bit concerned my index/queries are maybe abnormal and don't want to
> break the general case.
>
> I'm not too familiar with trie [what it would do with a really general
> range query], but a simpler example would be no stopwords, wildcard query of
> "th?" (matching "the")
> maybe it only matches one term, but that term is very common / dense bitset
> and probably "hot".
>
> In this case the filter would be better, even though its 1 term.
>
> was (Author: rcmuir):
> Mike, I am afraid that might hurt some people's performance.
> I'm a bit concerned my index/queries are maybe abnormal and don't want to
> break the general case.
>
> I'm not too familiar with trie [what it would do with a really general
> range query], but a simpler example would be no stopwords, wildcard query of
> th*
> maybe it only matches one term, but that term is very common / dense bitset
> and probably "hot".
>
> In this case the filter would be better, even though its 1 term.
>
> > Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood
> > -----------------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1644
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
> >             Project: Lucene - Java
> >          Issue Type: Improvement
> >          Components: Search
> >            Reporter: Michael McCandless
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1644.patch
> >
> >
> > When MultiTermQuery is used (via one of its subclasses, eg
> > WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> > "constant score mode", which pre-builds a filter and then wraps that
> > filter as a ConstantScoreQuery.
> > If you don't set that, it instead builds a [potentially massive]
> > BooleanQuery with one SHOULD clause per term.
> > There are some limitations of this approach:
> >   * The scores returned by the BooleanQuery are often quite
> >     meaningless to the app, so, one should be able to use a
> >     BooleanQuery yet get constant scores back. (Though I vaguely
> >     remember at least one example someone raised where the scores were
> >     useful...).
> >   * The resulting BooleanQuery can easily have too many clauses,
> >     throwing an extremely confusing exception to newish users.
> >   * It'd be better to have the freedom to pick "build filter up front"
> >     vs "build massive BooleanQuery", when constant scoring is enabled,
> >     because they have different performance tradeoffs.
> >   * In constant score mode, an OpenBitSet is always used, yet for
> >     sparse bit sets this does not give good performance.
> > I think we could address these issues by giving BooleanQuery a
> > constant score mode, then empower MultiTermQuery (when in constant
> > score mode) to pick & choose whether to use BooleanQuery vs up-front
> > filter, and finally empower MultiTermQuery to pick the best (sparse vs
> > dense) bit set impl.

--
- Mark

http://www.lucidimagination.com
