On Wednesday 07 November 2007 10:51:32 Timo Nentwig wrote:
> Hi!
>
> I asked this one already on the user mailing list but maybe it's more
> appropriate here:
>
> As a simple example, imagine every document in your index has the
> fields "language" and "country". A tuple of language+country is what I
> call a context.
>
> You want to search context-specifically, i.e. language+country is always
> part of the query (QueryFilter).
>
> FuzzyTermEnum doesn't know about these contexts and hence builds a
> BooleanQuery of all similar terms. E.g. "hello" is "hallo" in German,
> only one character difference. But when searching in the context
> English+USA I don't care about German terms. So I don't want/need "hallo"
> in the BooleanQuery in this case.
>
> So I came up with the idea to use reader.termDocs() instead of terms() in
> FuzzyTermEnum. By means of a QueryFilter (its BitSet, respectively) for

Well... I didn't read too carefully: termDocs(Term) "returns an enumeration of all the documents which contain term". So for each term returned by terms() I would have to call termDocs(). This will probably hurt performance more than the optimization would gain :-\

> each context I could determine whether a fuzzy term makes sense to be
> included in the BooleanQuery or not.
>
> This results (potentially) in a smaller BooleanQuery but I wonder whether
> this approach will gain any noticeable performance advantage (maybe reduce
> IO?).
>
> Thanks for feedback
> Timo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
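
To make the cost concern concrete, here is a minimal sketch of the per-term check in plain Java, without Lucene. Everything in it is an assumption for illustration: the class ContextTermFilter and its methods are hypothetical names, the Map stands in for what terms() and termDocs(Term) would enumerate, and the BitSet stands in for the bits a QueryFilter would produce for one context. It is not how FuzzyTermEnum works, only the shape of the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index: term -> ids of the documents containing it.
class ContextTermFilter {

    // Walks the postings of one term (the termDocs() step) and stops at the
    // first document that belongs to the context.
    static boolean occursInContext(int[] postings, BitSet contextDocs) {
        for (int doc : postings) {
            if (contextDocs.get(doc)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only those fuzzy candidates that occur in at least one context document.
    // Note the cost: one postings walk per candidate term.
    static List<String> termsInContext(Map<String, int[]> index,
                                       List<String> candidates,
                                       BitSet contextDocs) {
        List<String> kept = new ArrayList<>();
        for (String term : candidates) {
            int[] postings = index.get(term);
            if (postings != null && occursInContext(postings, contextDocs)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new LinkedHashMap<>();
        index.put("hello", new int[] {0, 2}); // English documents
        index.put("hallo", new int[] {1, 3}); // German documents
        index.put("hullo", new int[] {2});    // English document

        // Documents 0 and 2 form the English+USA context in this toy example.
        BitSet englishUsa = new BitSet();
        englishUsa.set(0);
        englishUsa.set(2);

        List<String> candidates = Arrays.asList("hello", "hallo", "hullo");
        System.out.println(termsInContext(index, candidates, englishUsa)); // prints [hello, hullo]
    }
}
```

The early return in occursInContext helps when a term is common inside the context, but a term that never occurs there still forces a walk over its whole postings list, which is exactly the case the filtering is meant to catch.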
