[jira] Updated: (LUCENE-1644) Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood

Michael McCandless (JIRA) Thu, 23 Jul 2009 04:41:44 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael McCandless updated LUCENE-1644:
---------------------------------------

    Attachment: LUCENE-1644.patch

New patch attached.  I think it's ready to commit!

I ran a series of simple tests, using a 20.0 million doc Wikipedia
index. I tested w/ PrefixQuery, using different prefixes to tickle the
different number of matching terms vs matching docs.

I first pursued the "matches many terms but few docs" case, and found
at around 350 terms the filter method becomes faster.

Then, I fixed the number of terms at 350 (modified PrefixQuery to
pretend the enum stopped there) and tested different number of "doc
visit counts" (sum of docFreq of each term) and found ~ 0.1% (1/1000)
of the maxDoc() was the cutover.

I also switched NumericRange* to use CONSTANT_SCORE_AUTO by default.


> Enable MultiTermQuery's constant score mode to also use BooleanQuery under 
> the hood
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1644.patch, LUCENE-1644.patch, LUCENE-1644.patch, 
> LUCENE-1644.patch, LUCENE-1644.patch
>
>
> When MultiTermQuery is used (via one of its subclasses, eg
> WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> "constant score mode", which pre-builds a filter and then wraps that
> filter as a ConstantScoreQuery.
> If you don't set that, it instead builds a [potentially massive]
> BooleanQuery with one SHOULD clause per term.
> There are some limitations of this approach:
>   * The scores returned by the BooleanQuery are often quite
>     meaningless to the app, so, one should be able to use a
>     BooleanQuery yet get constant scores back.  (Though I vaguely
>     remember at least one example someone raised where the scores were
>     useful...).
>   * The resulting BooleanQuery can easily have too many clauses,
>     throwing an extremely confusing exception to newish users.
>   * It'd be better to have the freedom to pick "build filter up front"
>     vs "build massive BooleanQuery", when constant scoring is enabled,
>     because they have different performance tradeoffs.
>   * In constant score mode, an OpenBitSet is always used, yet for
>     sparse bit sets this does not give good performance.
> I think we could address these issues by giving BooleanQuery a
> constant score mode, then empower MultiTermQuery (when in constant
> score mode) to pick & choose whether to use BooleanQuery vs up-front
> filter, and finally empower MultiTermQuery to pick the best (sparse vs
> dense) bit set impl.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Updated: (LUCENE-1644) Enable MultiTermQuery's constant score mode to also use BooleanQuery under the hood

Reply via email to