Marcin,

Below is a quote from the wiki on uncompounding.
This algorithm seems to allow lots of illegal compounds to be accepted.
Examples: autoonderdeel, Een-Afrika, mannetje-deur
At a minimum, checks on the split are needed.
That seems rather complex.


MorfologikSpeller uses a finite word list encoded as an automaton. To
support compounding, we could use the following algorithm:

    If the word is not found on the list, try to decompose it into
building blocks: prefixes, infixes, suffixes, and other parts.
This can be done by trying to find word parts in a similar way as
replaceRunOnWords, i.e., by moving a space (but with a predefined
maximum number of compound parts, probably > 2 but <= 4).
    We need to mark up incorrect compounds (commonly mistaken words) with
a FORBID tag after a standard separator.
    We need to mark up prefixes (words that cannot occur in any position
other than as a prefix) with a PREFIX tag.
    We need to mark up suffixes with a SUFFIX tag.
    We need to mark up infixes with an INFIX tag.

All non-FORBID words could be used to analyze a compound form. Words with
PREFIX, SUFFIX and INFIX tags would not be proposed by replaceRunOnWords.
To make them appear in suggestions, another instance of the same word
without any tag would be needed (but suppression would still work).
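
To make the discussion concrete, here is a minimal Python sketch of the
quoted algorithm. It is not the MorfologikSpeller implementation: the
names decompose, position_ok, LEXICON and MAX_PARTS are hypothetical, the
word list is a plain dict rather than an automaton, and the cap of 4 parts
is an assumption taken from the "<= 4" in the text. It is meant to be
called only for words that are not on the list themselves.

```python
from typing import Dict, FrozenSet, List, Optional

MAX_PARTS = 4  # assumed cap; the wiki text only says "probably > 2 but <= 4"

# Hypothetical toy word list: word -> tags. An empty tag set is an ordinary word.
LEXICON: Dict[str, FrozenSet[str]] = {
    "auto": frozenset(),
    "onderdeel": frozenset(),
    "deur": frozenset(),
    "s": frozenset({"INFIX"}),
}


def position_ok(tags: FrozenSet[str], index: int, is_last: bool) -> bool:
    """Check whether a part with these tags may stand at this position."""
    if "FORBID" in tags:
        return False
    if "PREFIX" in tags:
        return index == 0 and not is_last
    if "SUFFIX" in tags:
        return is_last and index > 0
    if "INFIX" in tags:
        return index > 0 and not is_last
    return True  # untagged words may occur anywhere


def decompose(word: str, lexicon: Dict[str, FrozenSet[str]],
              max_parts: int = MAX_PARTS) -> Optional[List[str]]:
    """Return the first split of `word` into 2..max_parts listed parts, or None."""
    # A compound explicitly marked FORBID is rejected before any splitting.
    if "FORBID" in lexicon.get(word, frozenset()):
        return None

    def search(start: int, parts: List[str]) -> Optional[List[str]]:
        if start == len(word):
            return parts if len(parts) >= 2 else None
        if len(parts) == max_parts:
            return None
        # Move the split point one character at a time.
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            tags = lexicon.get(piece)
            if tags is None:
                continue
            if not position_ok(tags, len(parts), end == len(word)):
                continue
            found = search(end, parts + [piece])
            if found is not None:
                return found
        return None

    return search(0, [])


print(decompose("autoonderdeel", LEXICON))  # ['auto', 'onderdeel']
```

With only positional tags, the sketch accepts "autoonderdeel" as soon as
"auto" and "onderdeel" are both on the list, which is the concern raised at
the top of this message. In this sketch the only way to reject it is an
explicit entry such as "autoonderdeel" tagged FORBID, or an additional
check on the split point.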


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
