Thanks for clearing up spell checking for me. I had an idea of how it worked but no real specifics. This, along with the link to the article on the affix file format (http://lingucomponent.openoffice.org/affix.readme), were possibly the most important posts recently. I'd really like another article or tutorial on the subject. I know they exist, but I can't find them. Is there a site index for the lingucomponent page?

Kevin B. Hendricks wrote:

At a basic level, a spelling checker simply takes an unknown word and looks it up in a list of commonly used words (all correctly spelled). Unfortunately, for many languages the list of commonly used words is simply too large to be searched or accessed easily with a reasonable memory footprint and access speed. Luckily, many of those same languages use prefixes and/or suffixes (sometimes in combination) on a much smaller list of root words to create many of their commonly used words.
So all an .aff file is used for is to identify some of the most commonly used prefixes and suffixes, so that a much smaller set of root words with affix flags can effectively store a much longer list of commonly used words.
That is the whole concept behind ispell, which myspell has tried to adopt.
It does not matter what adding a prefix or a suffix does to the meaning of the root word (that would be the domain of a grammar checker), as long as a correctly spelled new word is made from a correctly spelled root word and its defined affixes.
The way to use munch is to take a long list of commonly used, correctly spelled words (call this the language's "working set") and then use the prefixes and suffixes identified in the .aff file to compress that "working set" into a new, shorter list of correctly spelled root words with affix flags (a .dic file). unmunch can then be used on that .dic file and .aff file to recreate the exact same "working set" of commonly used words, with NO additional words created.
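
To make the munch/unmunch round trip described above concrete, here is a rough Python sketch. It is not the real munch or unmunch code and does not read the real .aff format: the SUFFIXES table, the flag letters, the "ends with" conditions, and the function names are all made up for illustration, and prefixes are left out entirely. It only tries to show the idea that a flag is attached to a root when every word the flag would generate is already in the working set, so expanding the result gives back the same list and nothing more.

```python
from typing import NamedTuple


class Suffix(NamedTuple):
    strip: str  # characters removed from the end of the root
    add: str    # characters appended after stripping
    cond: str   # root must end with this string ("" means always)


# Hypothetical, heavily simplified stand-in for an .aff file (suffixes only).
SUFFIXES = {
    "S": [Suffix("", "s", "")],
    "D": [Suffix("", "d", "e"), Suffix("", "ed", "k")],
    "G": [Suffix("e", "ing", "e"), Suffix("", "ing", "k")],
}


def expand(root, flags):
    """Return the root plus every word its flags generate."""
    words = {root}
    for flag in flags:
        for rule in SUFFIXES.get(flag, []):
            if root.endswith(rule.cond) and root.endswith(rule.strip):
                stem = root[:len(root) - len(rule.strip)]
                words.add(stem + rule.add)
    return words


def unmunch(dic_lines):
    """Expand 'root/FLAGS' entries back into the full word list."""
    words = set()
    for line in dic_lines:
        root, _, flags = line.partition("/")
        words |= expand(root, flags)
    return words


def munch(working_set):
    """Greedy compression: shortest words first, skipping words a root already covers."""
    covered = set()
    dic = []
    for word in sorted(working_set, key=lambda w: (len(w), w)):
        if word in covered:
            continue
        flags = ""
        for flag in sorted(SUFFIXES):
            derived = expand(word, flag) - {word}
            # Only attach a flag if it creates no word outside the working set.
            if derived and derived <= working_set:
                flags += flag
                covered |= derived
        dic.append(f"{word}/{flags}" if flags else word)
    return dic


working_set = {
    "walk", "walks", "walked", "walking",
    "bake", "bakes", "baked", "baking",
    "cat", "cats", "the",
}
dic = munch(working_set)
print(dic)
print(unmunch(dic) == working_set)
```

In this toy example eleven words shrink to four entries, and expanding those four entries gives back exactly the original eleven. The real tools work from the actual .aff rules and will make different choices about which roots and flags to keep.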

