Hi all,

I want to introduce compounding support for MorfologikSpeller for the 
next LT release. I looked at the hunspell dictionaries again, and it 
seems that hunspell categorizes words into prefixes, infixes, and 
suffixes. It also has flags to designate words that are allowed only in 
compounds, as well as a flag to designate an affix allowed anywhere 
(COMPOUNDPERMITFLAG; I have no idea what that means in practice). There 
is also a flag to prohibit certain compounds as incorrect.

There are also complications with upper and lower case (the KEEPCASE 
flag as well as CHECKCOMPOUNDCASE).

I don't fully understand the COMPOUNDMIN parameter.

Here's the old idea for us:

http://wiki.languagetool.org/compounding-support-in-morfologikspeller

Basically, I think it will be quite easy to parse the hunspell 
dictionary to get all the words with compounding flags in all their 
forms, so we would be able to convert hunspell dictionaries to FSA 
dictionaries with structured tags. But to do so, I need to understand 
the semantics of the flags in hunspell dictionaries, and the hunspell 
documentation is sparse at best. Could anyone please explain it better 
to me? Ruud?
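
To make the parsing step more concrete, here is a minimal sketch in 
Java of collecting the words that carry a given set of flags from the 
lines of a .dic file. It is only an illustration under several 
assumptions: flags are single characters (the hunspell default), the 
flag letters B, M and E are made up for the example (the real ones are 
declared in each language's .aff file), escaped slashes inside words 
are not handled, and affix rules are not expanded, so it does not yet 
produce "all their forms". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DicCompoundFlags {

    /**
     * Groups the words of a .dic file by flag, keeping only the flags
     * listed in flagsOfInterest. Assumes single-character flags.
     */
    static Map<Character, List<String>> wordsByFlag(List<String> dicLines,
                                                    String flagsOfInterest) {
        Map<Character, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < dicLines.size(); i++) {
            String entry = dicLines.get(i).trim();
            if (entry.isEmpty()) {
                continue;
            }
            // The first line of a .dic file holds the approximate word count.
            if (i == 0 && entry.matches("\\d+")) {
                continue;
            }
            // Drop optional morphological fields after the first whitespace.
            entry = entry.split("\\s+", 2)[0];
            int slash = entry.indexOf('/');
            if (slash < 0) {
                continue; // the word has no flags at all
            }
            String word = entry.substring(0, slash);
            String flags = entry.substring(slash + 1);
            for (char flag : flags.toCharArray()) {
                if (flagsOfInterest.indexOf(flag) >= 0) {
                    result.computeIfAbsent(flag, k -> new ArrayList<>()).add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical entries; B, M and E stand in for whatever flags the
        // .aff file assigns to compound-initial, -middle and -final words.
        List<String> lines = Arrays.asList(
                "4", "arbeit/BS", "s/M", "geber/E po:noun", "haus");
        System.out.println(wordsByFlag(lines, "BME"));
        // prints {B=[arbeit], M=[s], E=[geber]}
    }
}
```

The grouped words could then be written out with one structured tag per 
flag before building the FSA, once the exact meaning of each flag is 
clear.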

Alternatively, we could leave the speller dictionaries as they are and 
add the support for compounding directly in LanguageTool by using 
JWordSplitter to split words, but I'm afraid this won't work as nicely, 
as we don't have such a library for Dutch, for example.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel