[ 
https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733563#action_12733563
 ] 

Michael McCandless commented on LUCENE-1644:
--------------------------------------------

bq. couldn't you consider FILTER versus CONSTANT_BOOLEAN_QUERY an 
implementation detail? could lucene pick / switch over to the best one?

Yeah I struggled with this.  I completely agree it's an impl detail -- the user 
should just have to say "I want constant scoring" and Lucene finds the most 
performant way to achieve it.

But then I realized it's not obvious when one impl should be chosen over 
another.  Often FILTER is faster than CONSTANT_BOOLEAN_QUERY, but at some point 
once the index becomes large enough the underlying O(maxDoc) cost (w/ small 
constant in front) of FILTER will dominate, or if the number of terms/docs that 
match is small then CONSTANT_BOOLEAN_QUERY will win, etc.  And if the number of 
terms exceeds BooleanQuery's maxClauseCount, you must use FILTER.

And my intuitions weren't right (I had thought NumericRangeQuery, since in 
general it doesn't produce too many terms, would perform well with 
BOOLEAN_QUERY, but from Uwe's numbers that's not the case; though we should 
re-test now that, with this patch, no CPU is spent on scoring).

So, I was uncomfortable trying to make Lucene too smart under the hood, at 
least for this first go at it.

Maybe we could add an AUTO option that would try to decide what's best?  
This way, if we mess up its smarts, the user can still fall back and force one 
method over another.

(Though, since the maxClauseCount is so clearly a dead-end, maybe even in 
CONSTANT_BOOLEAN_QUERY mode we should forcefully fall back to FILTER on hitting 
too many terms.)
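
To make the AUTO idea and the maxClauseCount fallback concrete, here is a rough 
sketch of the decision logic in plain Java.  It is not Lucene code: the class 
RewriteChooser, the Mode enum, the choose() method and both cutoff constants 
are hypothetical names and placeholder values made up for illustration, and 
real cutoffs would have to come from benchmarks like Uwe's.

```java
/** Hypothetical sketch (not Lucene API): pick a constant-score rewrite strategy. */
final class RewriteChooser {

    /** The two implementations discussed above, plus the proposed AUTO option. */
    enum Mode { FILTER, CONSTANT_BOOLEAN_QUERY, AUTO }

    // Placeholder cutoffs, assumed for illustration only; not measured values.
    static final int SMALL_TERM_COUNT = 350;
    static final double SMALL_DOC_FRACTION = 0.001;

    static Mode choose(Mode requested, int termCount, long matchingDocs,
                       long maxDoc, int maxClauseCount) {
        // Hard limit: a BooleanQuery cannot hold more clauses than
        // maxClauseCount, so fall back to FILTER no matter what was requested.
        if (termCount > maxClauseCount) {
            return Mode.FILTER;
        }
        // An explicit choice by the user always overrides the heuristic.
        if (requested != Mode.AUTO) {
            return requested;
        }
        // AUTO: prefer the BooleanQuery only when few terms and few docs match;
        // otherwise pay the O(maxDoc) cost of building the filter up front.
        boolean fewTerms = termCount <= SMALL_TERM_COUNT;
        boolean fewDocs = matchingDocs <= maxDoc * SMALL_DOC_FRACTION;
        return (fewTerms && fewDocs) ? Mode.CONSTANT_BOOLEAN_QUERY : Mode.FILTER;
    }
}
```

Checking the hard limit before the user's explicit choice is what gives the 
forced fallback described in the parenthetical above; swapping the first two 
checks would instead let CONSTANT_BOOLEAN_QUERY hit the too-many-clauses 
exception.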

> Enable MultiTermQuery's constant score mode to also use BooleanQuery under 
> the hood
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1644.patch
>
>
> When MultiTermQuery is used (via one of its subclasses, eg
> WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> "constant score mode", which pre-builds a filter and then wraps that
> filter as a ConstantScoreQuery.
> If you don't set that, it instead builds a [potentially massive]
> BooleanQuery with one SHOULD clause per term.
> There are some limitations of this approach:
>   * The scores returned by the BooleanQuery are often quite
>     meaningless to the app, so, one should be able to use a
>     BooleanQuery yet get constant scores back.  (Though I vaguely
>     remember at least one example someone raised where the scores were
>     useful...).
>   * The resulting BooleanQuery can easily have too many clauses,
>     throwing an extremely confusing exception to newish users.
>   * It'd be better to have the freedom to pick "build filter up front"
>     vs "build massive BooleanQuery", when constant scoring is enabled,
>     because they have different performance tradeoffs.
>   * In constant score mode, an OpenBitSet is always used, yet for
>     sparse bit sets this does not give good performance.
> I think we could address these issues by giving BooleanQuery a
> constant score mode, then empower MultiTermQuery (when in constant
> score mode) to pick & choose whether to use BooleanQuery vs up-front
> filter, and finally empower MultiTermQuery to pick the best (sparse vs
> dense) bit set impl.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.