Runde, Kevin wrote:

Hi All,

How does Lucene handle multi term queries? Does it use short circuiting?
So if a user entered:
(a OR b) AND c
But my program knew testing for "c" is cheaper than testing for "(a OR
b)" and I rewrote the query as:
c AND (a OR b)
Would the query run faster?

Sorry if this has already be answered, but for some reason the Archive
search is not working for me today.

Thanks,
Kevin




Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all of the clauses of the BooleanQuery are required and none of the clauses are BooleanQueries a ConjunctionScorer is returned that offers the optimizations you seek. In the example you gave, there is a clause that is boolean ( a or b) that will have to be evaluated independently with a boolean scorer. This will be performed regardless of the ordering. (BooleanScorer doesn't preserve document order when it return results and hence it can't utilize the optimal algorithm provided by ConjuntionScorer). Others have been down this path as evidenced by the sigh in the javadoc.

If calculating (a or b) is expensive and the docFreq of a is much less than the union of a and b, you might consider rewriting it to (a and c) or (b and c) using DeMorgan's law. Expansion like this isn't always beneficial and can't be applied blindly. As far as I can tell there is no query planning/optimization aside from the merging of related clauses and attempts to rewrite to simpler queries.

Cheers,
Todd VanderVeen

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Reply via email to