FuzzyQuery using termDocs() to reduce count of Boolean Queries

Timo Nentwig Wed, 07 Nov 2007 01:57:52 -0800

Hi!

I already asked this on the user mailing list, but maybe it's more 
appropriate here:


As a simple example, imagine that every document in your index has the 
fields "language" and "country". A language+country tuple is what I call a 
context.

You want to search within a specific context, i.e. language+country is always 
part of the query (as a QueryFilter).

FuzzyTermEnum doesn't know about these contexts, so it builds a BooleanQuery 
from all similar terms in the index. E.g. "hello" is "hallo" in German, only 
one character different. But when searching in the context english+USA I don't 
care about German terms, so I don't want or need "hallo" in the BooleanQuery 
in this case.

So I came up with the idea of using reader.termDocs() instead of terms() in 
FuzzyTermEnum. By means of a QueryFilter (or rather its BitSet) for each 
context, I could determine whether a fuzzy term should be included in the 
BooleanQuery or not.
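
To make the idea concrete, here is a minimal sketch in plain Java. It does 
not use Lucene at all: the class ContextFuzzyExpansion, its methods and the 
in-memory Map of postings are all hypothetical stand-ins. In Lucene the 
postings would presumably come from reader.termDocs() and the BitSet from the 
QueryFilter. The sketch also uses a plain edit-distance threshold rather than 
FuzzyQuery's similarity score.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Lucene API.
public class ContextFuzzyExpansion {

    // Plain Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if at least one document of the term lies inside the context.
    static boolean occursInContext(int[] postings, BitSet contextDocs) {
        for (int doc : postings) {
            if (contextDocs.get(doc)) {
                return true;
            }
        }
        return false;
    }

    // Keep only similar terms that occur in at least one context document.
    static List<String> expand(String queryTerm, int maxEdits,
                               Map<String, int[]> postingsByTerm,
                               BitSet contextDocs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            if (editDistance(queryTerm, e.getKey()) > maxEdits) {
                continue; // not similar enough to the query term
            }
            if (occursInContext(e.getValue(), contextDocs)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed toy index: docs 0 and 1 are english+USA, doc 2 is German.
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("hello", new int[] {0, 1});
        postings.put("jello", new int[] {1});
        postings.put("hallo", new int[] {2});
        postings.put("help", new int[] {0});

        BitSet englishUsa = new BitSet();
        englishUsa.set(0);
        englishUsa.set(1);

        // "hallo" is similar but outside the context; "help" is too distant.
        System.out.println(expand("hello", 1, postings, englishUsa));
    }
}
```

In this toy setup only "hello" and "jello" would end up as clauses of the 
BooleanQuery. Note that the check stops at the first posting inside the 
context, but a candidate that never occurs in the context still has its whole 
postings list scanned.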

This (potentially) results in a smaller BooleanQuery, but I wonder whether this 
approach would bring any noticeable performance advantage (maybe reduced I/O?).

Thanks for feedback
Timo
