Thanks for clearing up spell checking for me. I had an idea of how it
worked but no real specifics. This post, along with the link to the article
on the affix file format (http://lingucomponent.openoffice.org/affix.readme),
was possibly among the most important posts recently. I'd really like to read
another article or tutorial on the subject; I know they exist, but I can't
find them. Is there a site index for the lingucomponent page?
Kevin B. Hendricks wrote:
At a basic level, a spelling checker simply takes an unknown word and
looks it up in a list of commonly used words (all correctly spelled).
Unfortunately, for many languages the list of commonly used words is
simply too large to be searched or accessed easily with a reasonable
memory footprint and access speed. Luckily, many of those same
languages use prefixes and/or suffixes (sometimes in combination) on a
much smaller list of root words to create many of their commonly used
words.
So all an .aff file is used for is to identify some of the most
commonly used prefixes and suffixes so that a much smaller set of root
words with affix flags can be used to effectively store a much longer
list of commonly used words.
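
To make the idea concrete, here is a minimal Python sketch of a lookup
against root words with affix flags. The flag letters, the affix tables,
and the is_correct() helper are all made up for illustration; real .aff
rules also carry strip strings and conditions, which are left out here,
and this is not how myspell itself is implemented.

```python
# Hypothetical affix tables: flag letter -> affix text.
# Real .aff rules also have strip strings and conditions (omitted here).
PREFIXES = {"U": "un", "R": "re"}
SUFFIXES = {"S": "s", "D": "ed", "G": "ing"}

# Root words with their affix flags, loosely modelled on "word/FLAGS".
ROOTS = {"walk": "SDG", "lock": "SDGUR", "cat": "S"}


def is_correct(word):
    """Return True if word is a root, or a root plus affixes it is flagged for."""
    # Each candidate is (remaining stem, flags the root must carry).
    candidates = [(word, set())]
    for flag, prefix in PREFIXES.items():
        if word.startswith(prefix):
            candidates.append((word[len(prefix):], {flag}))
    # Try removing one suffix from every candidate found so far,
    # which also covers a prefix and a suffix used in combination.
    for stem, needed in list(candidates):
        for flag, suffix in SUFFIXES.items():
            if stem.endswith(suffix):
                candidates.append((stem[:-len(suffix)], needed | {flag}))
    return any(
        stem in ROOTS and needed <= set(ROOTS[stem])
        for stem, needed in candidates
    )
```

With these three roots, is_correct("unlocked") succeeds because "lock"
carries both the U and D flags, while is_correct("cating") fails because
"cat" is not flagged for the "ing" suffix.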
That is the whole concept behind ispell, which myspell has tried to adopt.
It does not matter what adding a prefix or a suffix actually
does to the root word (that would be the domain of a grammar checker)
as long as a correctly spelled new word is made from a correctly
spelled root word and its defined affixes.
The way to use munch is to take a long, long list of commonly used but
correctly spelled words (call this the language's "working set") and
then use the identified prefixes and suffixes from the .aff file to
identify and properly compress that "working set" into a new, shorter
list of correctly spelled root words with affix flags (a .dic file).
unmunch can then be used on that .dic file and .aff file to recreate
the exact same "working set" of commonly used words with NO additional
words created.
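
The round trip described above can be sketched in a few lines of Python.
This is a toy version under loud assumptions: the munch() and unmunch()
functions below are illustrative stand-ins, not the real tools, and the
suffix table is hypothetical (plain appended suffixes only, no prefixes,
strip strings, or conditions).

```python
# Hypothetical suffix rules: flag letter -> text appended to the root.
SUFFIXES = {"S": "s", "D": "ed", "G": "ing"}

WORKING_SET = {
    "walk", "walks", "walked", "walking",
    "cat", "cats",
    "lock", "locks", "locked",
}


def munch(working_set):
    """Compress a word list into {root: flags} using SUFFIXES."""
    dic = {}
    covered = set()  # words already produced by some root
    # Shortest words first, so a root is always seen before its derivations.
    for word in sorted(working_set, key=lambda w: (len(w), w)):
        if word in covered:
            continue
        flags = ""
        for flag, suffix in SUFFIXES.items():
            derived = word + suffix
            # Only flag an affix if the result is in the working set,
            # so expansion never creates a word that was not there.
            if derived in working_set:
                flags += flag
                covered.add(derived)
        dic[word] = flags
    return dic


def unmunch(dic):
    """Expand {root: flags} back into the full word list."""
    words = set()
    for root, flags in dic.items():
        words.add(root)
        for flag in flags:
            words.add(root + SUFFIXES[flag])
    return words
```

Here munch(WORKING_SET) stores nine words as three flagged roots, and
unmunch() on that result gives back exactly the original nine. A flag is
only set when the derived word is already in the working set, which is
what keeps expansion from creating additional words.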