Doug Cutting wrote:
Does anyone have fuzzy-query benchmarks for, e.g., ~1M-document indexes, where each document contains a few KB of text? Ideally, even complex queries against such indexes should take less than a second, no? How long does a fuzzy query take? And how much does a prefix length of zero, one, or two change that? Queries that take much longer than a second are considerably less usable. I think the default should provide good usability for indexes of at least 1M documents.
I have an index containing about 800k documents, with a few KB of text per document. Every Lucene document in the index has about 12 fields. The overall index size is about 2.8 GB.
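The timing question above could be approached with a harness along the following lines. It is a minimal, hypothetical sketch that does not use Lucene at all: it scans a synthetic in-memory vocabulary of random words (an assumption standing in for a real term dictionary), keeps terms within a hypothetical maximum edit distance of 2, and counts how many edit-distance computations a required prefix of zero, one, or two characters avoids. The class and method names are invented for illustration; real numbers would have to come from running actual fuzzy queries against an index like the 800k-document one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, self-contained sketch (not Lucene code): brute-force fuzzy
// term expansion over an in-memory vocabulary, timed for prefix lengths 0..2.
final class FuzzyPrefixTiming {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns terms within maxEdits of query. Terms that do not share the first
    // prefixLength characters of query are skipped before any distance is
    // computed; counter[0] records how many distances were actually computed.
    static List<String> fuzzyMatches(List<String> terms, String query,
                                     int maxEdits, int prefixLength, long[] counter) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (!t.startsWith(prefix)) continue;
            counter[0]++;
            if (levenshtein(query, t) <= maxEdits) out.add(t);
        }
        return out;
    }

    // Builds a hypothetical vocabulary of random lowercase words (4 to 10 chars).
    static List<String> randomTerms(int n, long seed) {
        Random rnd = new Random(seed);
        List<String> terms = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int len = 4 + rnd.nextInt(7);
            StringBuilder sb = new StringBuilder(len);
            for (int k = 0; k < len; k++) sb.append((char) ('a' + rnd.nextInt(26)));
            terms.add(sb.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = randomTerms(200_000, 42L);
        String query = terms.get(0);
        for (int prefix = 0; prefix <= 2; prefix++) {
            long[] counter = {0};
            long start = System.nanoTime();
            List<String> hits = fuzzyMatches(terms, query, 2, prefix, counter);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("prefix=%d: %d matches, %d distances computed, %d ms%n",
                    prefix, hits.size(), counter[0], ms);
        }
    }
}
```

A real term dictionary is sorted, so an implementation can jump straight to the first term carrying the required prefix instead of testing every term as this linear scan does; the scan therefore only approximates the savings a prefix can bring.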
Another thing to examine is how different the generated terms are with different prefixes. One could randomly select some words from an index and compute the average amount that a prefix of one and two changes the end results. My guess is that the changes are small. Since fuzzy search is a heuristic, not an exact computation, good approximations are fair play.

If that fits your needs, I can create and run a query benchmark.
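The experiment suggested above (sample words, compare results with and without a prefix) could be sketched as follows. This is a hypothetical illustration, not Lucene code: it compares only the sets of expanded terms rather than the ranked hits a real fuzzy query returns, and it uses a plain maximum edit distance in place of a similarity threshold. Because requiring a prefix can only remove candidates, it reports the average fraction of prefix-free matches that survive a one- or two-character prefix. The class name, method names, and toy vocabulary are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the prefix-impact experiment; not Lucene code.
final class PrefixRecallExperiment {

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Vocabulary terms within maxEdits of query that also share its first
    // prefixLength characters.
    static List<String> fuzzyTerms(List<String> terms, String query, int maxEdits, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.startsWith(prefix) && levenshtein(query, t) <= maxEdits) out.add(t);
        }
        return out;
    }

    // Average, over randomly sampled query words, of
    // |matches with prefix| / |matches without prefix|.
    // Each sampled word comes from the vocabulary and matches itself,
    // so the denominator is always at least 1.
    static double averageRetained(List<String> terms, int samples, int maxEdits,
                                  int prefixLength, long seed) {
        Random rnd = new Random(seed);
        double sum = 0;
        for (int s = 0; s < samples; s++) {
            String q = terms.get(rnd.nextInt(terms.size()));
            int full = fuzzyTerms(terms, q, maxEdits, 0).size();
            int withPrefix = fuzzyTerms(terms, q, maxEdits, prefixLength).size();
            sum += (double) withPrefix / full;
        }
        return sum / samples;
    }

    public static void main(String[] args) {
        // Hypothetical toy vocabulary; a real run would enumerate an index's terms.
        List<String> terms = Arrays.asList("search", "starch", "serch", "research",
                "searches", "peach", "reach", "teach", "index", "indexes", "indices");
        for (int p = 1; p <= 2; p++) {
            double r = averageRetained(terms, 1000, 2, p, 7L);
            System.out.printf("prefix=%d: %.1f%% of prefix-free matches retained%n", p, 100 * r);
        }
    }
}
```

A retained fraction close to 100% would support the guess that a short prefix changes the results only slightly.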
Regards,
Bernhard
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]