I wonder if it would further help for the spell checked to make use of something like WordNet (for English only), where low-frequency words are "double-checked" against WordNet before considered correct.
Otis --- Tom White <[EMAIL PROTECTED]> wrote: > On 8/29/05, Chris Lu <[EMAIL PROTECTED]> wrote: > > > > > > Two approaches I can think of: > > * Use a word list(it may not be the word list you want, but it is > just > > a compromise). > > * Analyze your original index, listing out all words inside. > > > > > Using a word list suffers from two problems: > 1. (Coverage) No word list is complete, they need to be maintained as > new > words are coined, and word lists for different languages vary in > quality. > 2. (Useless suggestions) There is little point in making a suggestion > for > words that aren't in the original index (as they wouldn't produce any > hits). > > For these reasons it is better to use the original index as a source > of > words. It is true that the index will likely contain spelling errors, > > however Lucene Spell Checker provides a way to restrict suggestions > to words > that are more popular than the query term. As misspellings are > typically > rarer than correct spellings this should ensure that misspelled > suggestions > are almost never made. The article quoted above (which I wrote), > provides a > bit more discussion. > > Tom > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]