Hi all.

I am using lucene-1.3-final and am having performance problems with fuzzy queries.

If I understand correctly, to perform a fuzzy query Lucene enumerates all terms in the index and constructs a BooleanQuery consisting of simple TermQueries, one for each term that is similar enough to the query term.
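As a rough illustration of that expansion, here is a self-contained Java model (not Lucene code). The class and method names are hypothetical, and the similarity formula (1 minus the edit distance divided by the length of the shorter term, with a 0.5 cutoff) is an assumption about what FuzzyQuery does, not a quote of its source. The point it shows is that expand() visits every term in the dictionary, so each execution of the query pays for a full scan.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of fuzzy query expansion. Not Lucene code. */
public class FuzzyExpansionSketch {

    /** Classic Levenshtein distance: insert, delete and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    /** ASSUMED similarity: 1 - distance / length of the shorter term. */
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.equals(b) ? 1.0 : 0.0;
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    /** Scans ALL terms; each match would become one TermQuery clause. */
    static List<String> expand(String target, List<String> allTerms, double minSimilarity) {
        List<String> matches = new ArrayList<>();
        for (String term : allTerms) {
            if (similarity(target, term) >= minSimilarity) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("lucene", "lucent", "lucerne", "search", "license");
        System.out.println(expand("lucene", terms, 0.5)); // prints [lucene, lucent, lucerne, license]
    }
}
```

The cost of expand() grows with the size of the whole term dictionary, not with the number of matches, which is why repeating it is expensive.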

The main problem is that this expansion is performed several times during a single search, which significantly decreases performance.

Let me explain.

For example, my search returns 1000 documents. My application needs to retrieve all of these documents from the index for later processing, but Lucene re-reads all the terms in the index every 100 documents. By default only the first 100 hits are fetched (see the Hits class), so when I access the 101st document the search is executed again, including the completely unnecessary step of creating a FilteredTermEnum and enumerating the terms.
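To make the cost of these re-searches visible, here is a small hypothetical Java model of the lazy fetching in Hits. It assumes what the Hits source appears to do: the constructor calls getMoreDocs(50), which asks for twice that many hits (100), and reading a hit past the cached block re-runs the whole query asking for twice that hit's index. The class and method names are made up for illustration.

```java
/** Hypothetical model of Hits-style lazy fetching. Not Lucene code.
 *  ASSUMPTION: each (re)fetch re-runs the whole query for 2 * min hits. */
public class HitsRefillModel {

    /** Counts how many times the query (and so the full fuzzy term scan) runs
     *  when the caller reads hits 0..totalHits-1 in order. */
    static int countSearches(int totalHits, int initialMin) {
        // First fetch, done when the Hits-like object is created.
        int searches = 1;
        int cached = Math.min(initialMin * 2, totalHits);
        for (int n = 0; n < totalHits; n++) {
            if (n >= cached) {
                // Hit n is not cached: re-run the query for twice as many hits.
                searches++;
                cached = Math.min(n * 2, totalHits);
            }
        }
        return searches;
    }

    public static void main(String[] args) {
        System.out.println(countSearches(1000, 50));  // prints 5
        System.out.println(countSearches(1000, 500)); // prints 1
    }
}
```

Under this model, reading 1000 hits with the hard-coded value runs the query 5 times (refills at hits 100, 200, 400 and 800), while an initial fetch that already covers all 1000 hits runs it once; that is the difference a configurable value would make.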

Unfortunately, I can't tell the Hits class to fetch all 1000 documents up front, because the value 100 (actually 50 in the code) is hard-coded. I think this value should be configurable in Searcher.

In fact, I did some experiments and found that if I fetch all 1000 documents in the first pass, my fuzzy query runs 4 times faster!

Also, could you please give me some advice on improving the performance of fuzzy search?

One approach we already use in our application is a custom fuzzy query that first compares a prefix of each term (e.g. the first 3 characters) and, only if the prefixes are equal, compares the rest of the term using the same algorithm that FuzzyQuery uses.
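A minimal Java sketch of that prefix-gated matching, under stated assumptions: the class and method names are hypothetical, the similarity on the remainder reuses the same assumed formula (1 minus edit distance over the shorter length), and terms shorter than the prefix are simply required to match exactly. It is not the actual custom query, only an illustration of the idea.

```java
/** Hypothetical prefix-gated fuzzy match. Not Lucene code. */
public class PrefixFuzzyMatch {

    /** Classic Levenshtein distance: insert, delete and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean matches(String target, String term, int prefixLength, double minSimilarity) {
        if (target.length() < prefixLength || term.length() < prefixLength) {
            return target.equals(term); // ASSUMPTION: too short for the prefix rule
        }
        if (!target.regionMatches(0, term, 0, prefixLength)) {
            return false; // cheap rejection: no edit distance computed
        }
        // Compare only the rest of the terms, with the assumed similarity formula.
        String a = target.substring(prefixLength);
        String b = term.substring(prefixLength);
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.equals(b);
        return 1.0 - (double) editDistance(a, b) / shorter >= minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucent", 3, 0.5));  // prints true
        System.out.println(matches("lucene", "license", 3, 0.5)); // prints false
    }
}
```

Besides skipping most edit-distance computations, the prefix rule fits the term dictionary well: since terms are stored in sorted order, all terms sharing a prefix are contiguous, so a real implementation could presumably start the enumeration at the prefix and stop at the first term that no longer has it instead of scanning the whole index.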

Best regards,
Konstantin
