Re: FuzzyQuery using termDocs() to reduce count of Boolean Queries

Timo Nentwig Wed, 07 Nov 2007 08:40:47 -0800

On Wednesday 07 November 2007 10:51:32 Timo Nentwig wrote:
> Hi!
>
> I asked this one already on the user mailing list but maybe it's more
> appropriate here:
>
> As a simple example, imagine every document in your index has the
> fields "language" and "country". A tuple of language+country is what I call
> a context.
>
> You want to search within a specific context, i.e. language+country is
> always part of the query (QueryFilter).
>
> FuzzyTermEnum doesn't know about these contexts, so it builds a
> BooleanQuery of all similar terms. E.g. "hello" is "hallo" in German,
> only one character different. But when searching in the context English+USA
> I don't care about German terms, so I don't want/need "hallo" in the
> BooleanQuery in this case.
>
> So I came up with the idea to use reader.termDocs() instead of terms() in
> FuzzyTermEnum. By means of a QueryFilter (or rather its BitSet) for


Well... I didn't read too carefully: termDocs(Term) "returns an enumeration of
all the documents which contain term". So for each term from terms() I would
have to call termDocs() as well. That will probably cost more performance than
this optimization would gain :-\
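
To make the per-term cost concrete, here is a rough sketch of the check
described above. It deliberately uses plain java.util collections as a
stand-in for the index and is not Lucene code: the class name
ContextTermFilter, the method termsInContext and the sample postings are all
made up for illustration. In Lucene, each int[] would presumably correspond
to walking one termDocs(term) enumeration, and the BitSet to what the
QueryFilter yields for the context.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContextTermFilter {

    /**
     * Keeps only those candidate terms that occur in at least one document
     * of the context. The map is a hypothetical stand-in for the index:
     * term text -> ids of the documents containing that term.
     */
    static List<String> termsInContext(Map<String, int[]> postings, BitSet contextDocs) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            // One posting-list walk per candidate term: this is the extra
            // cost that comes on top of enumerating the terms themselves.
            for (int doc : entry.getValue()) {
                if (contextDocs.get(doc)) {
                    kept.add(entry.getKey());
                    break; // one hit is enough, stop reading this posting list
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up postings for three terms similar to "hello".
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("hello", new int[] {0, 2});
        postings.put("hallo", new int[] {1, 3});
        postings.put("hullo", new int[] {2});

        // Documents 0 and 2 are assumed to belong to the context English+USA.
        BitSet englishUsa = new BitSet();
        englishUsa.set(0);
        englishUsa.set(2);

        System.out.println(termsInContext(postings, englishUsa)); // prints [hello, hullo]
    }
}
```

The break limits the work for terms that do occur in the context, but a term
that does not occur there (like "hallo" here) has its whole posting list read
before it can be dropped.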

> each context I could determine whether a fuzzy term makes sense to be
> included in the BooleanQuery or not.
>
> This (potentially) results in a smaller BooleanQuery, but I wonder whether
> this approach will yield any noticeable performance advantage (maybe reduce
> IO?).
>
> Thanks for feedback
> Timo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]


