I agree that the default should stay 0, even for Lucene 2.0.
It should certainly stay zero for 1.4.x releases.
However, 2.0 is our opportunity to make incompatible changes. What default would work well for the most applications?
Does anyone have fuzzy-query benchmarks for, e.g., ~1M-document indexes, where each document contains a few kilobytes of text? Ideally, with such indexes, even complex queries should take less than a second, no? How long does a fuzzy query take? And how much does a prefix of zero, one, or two change that? Queries that take much longer than a second are considerably less usable. I think the default should provide good usability for indexes of at least 1M documents.
Another thing to examine is how much the generated terms differ with different prefixes. One could randomly select some words from an index and compute, on average, how much a prefix of one or two changes the end results compared with a prefix of zero. My guess is that the changes are small. Since fuzzy search is a heuristic, not an exact computation, good approximations are fair play.
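To make that comparison concrete, here is a rough sketch of the measurement. It does not use Lucene itself: the class PrefixEffect, its methods, and the tiny vocabulary are all made up for illustration, and it uses a fixed edit-distance cutoff as a stand-in for whatever similarity threshold the real fuzzy query applies. For a real test, the vocabulary would be the term list of an actual index and the query words a random sample from it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixEffect {

    // Levenshtein distance via the standard two-row dynamic program.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Terms within maxEdits of the query that also share its first
    // prefixLength characters (assumed stand-in for fuzzy expansion).
    public static List<String> expand(String query, List<String> vocabulary,
                                      int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> matches = new ArrayList<>();
        for (String term : vocabulary) {
            if (term.startsWith(prefix) && editDistance(query, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    // Fraction of the prefix-0 expansion that survives a longer prefix.
    public static double retained(String query, List<String> vocabulary,
                                  int prefixLength, int maxEdits) {
        int baseline = expand(query, vocabulary, 0, maxEdits).size();
        if (baseline == 0) {
            return 1.0; // nothing to lose
        }
        int kept = expand(query, vocabulary, prefixLength, maxEdits).size();
        return (double) kept / baseline;
    }

    public static void main(String[] args) {
        // Toy vocabulary; a real experiment would read terms from an index.
        List<String> vocabulary = Arrays.asList(
            "lucene", "lucent", "lucerne", "luscene",
            "lacene", "eucene", "query", "index");
        for (int p = 0; p <= 2; p++) {
            System.out.printf("prefix=%d retained=%.2f terms=%s%n",
                p, retained("lucene", vocabulary, p, 1),
                expand("lucene", vocabulary, p, 1));
        }
    }
}
```

Averaging retained(...) over a random sample of index terms, for prefixes of one and two, would give the number I am asking about. Counting how many vocabulary terms pass the startsWith check at each prefix length would also give a rough idea of how much work each setting saves.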
Doug