Instead of "overriding" the trigram approach, you may want to combine the two. That is, create trigrams from the dictionary's word list and weight those matches much higher than the ones coming from the index. Alternatively, do an exact dictionary lookup first, and fall back to a trigram/index-based lookup only if it fails.
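To make the idea concrete, here is a rough Python sketch of both variants folded into one function: an exact dictionary hit short-circuits, otherwise trigram matches are ranked with dictionary words weighted above index terms. Everything in it is an assumption for illustration only: the names `trigrams`, `similarity` and `suggest`, the `dict_weight` factor, the padding scheme and the Dice score are made up here and are not Lucene's SpellChecker API or its actual scoring.

```python
from collections import Counter


def trigrams(word):
    """Multiset of character trigrams, padded so word edges count too."""
    padded = f"__{word.lower()}_"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a, b):
    """Dice coefficient over trigram multisets, in the range [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    shared = sum((ta & tb).values())
    return 2.0 * shared / (sum(ta.values()) + sum(tb.values()))


def suggest(word, dictionary, index_terms, dict_weight=2.0, limit=3):
    """Exact dictionary lookup first, then weighted trigram matching.

    dict_weight is a hypothetical boost applied to dictionary words so
    that they outrank terms that only occur in the index.
    """
    word = word.lower()
    if word in dictionary:
        # The word is known to be correctly spelled: nothing to suggest.
        return [word]

    scores = {}
    for term in index_terms:
        scores[term] = similarity(word, term)
    for term in dictionary:
        # Scored last, so a dictionary word also found in the index
        # keeps its boosted score.
        scores[term] = dict_weight * similarity(word, term)

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:limit] if score > 0]


if __name__ == "__main__":
    dictionary = {"lucene", "search", "index"}
    index_terms = {"lucine", "searches"}  # the index may contain typos
    print(suggest("lucinne", dictionary, index_terms, dict_weight=1.0))
    print(suggest("lucinne", dictionary, index_terms, dict_weight=2.0))
```

With the boost, a misspelling that happens to be in the index no longer beats the correctly spelled dictionary word, which is the point of weighting the two sources differently.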
J.D.

2007/7/6, Mathieu Lecarme <[EMAIL PROTECTED]>:
Currently, SpellChecker uses the trigram algorithm to find similar words. It works well for keyboard fumbles, but not well enough for short words or for languages like French, where the same sound can be written in different ways.

Spellchecking is a classic computing task, and aspell provides some nice, free (it's GNU) sound dictionaries. Lots of dictionaries are available. I wrote a Python parser which generates translation code in different languages: Python, PHP and Java. A bit like the Snowball stuff. A little more work is needed to generate Lucene-compliant code. But is the Python generator good enough for Lucene, or must it be ported to Java to go into the Lucene source?

I'll soon start a PhonemeSpellChecker which overrides the trigram SpellChecker. The next step is to implement a word cutter, just like Google's suggestions.

Any suggestions?

M.