There are two compounding mechanisms in Hunspell: one using compound rules,
and one using compound-start, -middle and -end flags.

Dutch uses both: rules for the simple system of numbers etc., and the other
one for regular compounds.
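
As a rough illustration of the rule-based mechanism for numbers, here is a
small Python sketch. It is not Hunspell's implementation: the class letters
(U, E, T), the tiny word list and the single rule are assumptions made up
for this example. The idea is only that every word carries a class, and a
rule is a pattern over the sequence of classes.

```python
import re

# Illustrative word classes (assumed for this sketch, not real dictionary data):
# U = unit, E = the linking word "en", T = tens.
WORD_CLASS = {"vier": "U", "vijf": "U", "en": "E", "twintig": "T", "dertig": "T"}

# One rule: unit + "en" + tens, e.g. vijf+en+twintig.
RULES = [re.compile(r"UET")]


def segmentations(word):
    """Yield every way to split word into known words."""
    if not word:
        yield []
        return
    for cut in range(1, len(word) + 1):
        head = word[:cut]
        if head in WORD_CLASS:
            for rest in segmentations(word[cut:]):
                yield [head] + rest


def matches_rule(word):
    """True if some split of word has a class sequence accepted by a rule."""
    for parts in segmentations(word):
        if len(parts) < 2:
            continue  # a single word is not a compound
        classes = "".join(WORD_CLASS[p] for p in parts)
        if any(rule.fullmatch(classes) for rule in RULES):
            return True
    return False


print(matches_rule("vijfentwintig"))  # True
print(matches_rule("twintigenvijf"))  # False
```

A closed set such as the numbers fits this style well, because a handful of
patterns covers every valid combination.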

One cannot mix those two methods.

Some languages don't compound at all; most of those that do only use
start-middle-end for compounding.
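
The start-middle-end mechanism can be sketched in a similar way. Again this
is only a toy model under stated assumptions: the lexicon, the position
names and the `min_len` parameter are hypothetical. `min_len` is meant to
play a role similar to COMPOUNDMIN, which as far as I know is the minimum
length of a word used in a compound.

```python
# Toy lexicon: each word lists the compound positions it may occupy.
# Entries and position names are illustrative assumptions.
LEXICON = {
    "auto": {"start", "middle"},
    "eigenaar": {"end"},
    "mannetjes": {"start"},
    "putter": {"end"},
}


def is_compound(word, lexicon=LEXICON, min_len=3):
    """True if word splits into two or more parts that respect their positions."""

    def walk(rest, index):
        for cut in range(min_len, len(rest) + 1):
            part, tail = rest[:cut], rest[cut:]
            positions = lexicon.get(part)
            if positions is None:
                continue
            if not tail:
                # Last part: must be allowed at the end and must not be the only part.
                if index > 0 and "end" in positions:
                    return True
                continue
            needed = "start" if index == 0 else "middle"
            if needed in positions and walk(tail, index + 1):
                return True
        return False

    return walk(word, 0)


print(is_compound("mannetjesputter"))  # True
print(is_compound("puttermannetjes"))  # False: putter may only end a compound
```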

Wordsplitter is a nice option, but much simpler (too simple) than the
compounding options in Hunspell.

I will try to clarify this with some Dutch examples.

auto+eigenaar (car owner) has to be written as auto-eigenaar, because oe
would otherwise be read as one sound. There are more of these combinations.

Man always gets the diminutive mannetje (via an affix), with the plural
diminutive mannetjes. This form can always be compounded, as in
mannetjes+putter.

Every compound of two words should always allow a - on the word border:
since Dutch creates long compounds (up to 40 characters in actual use), a
dash is always allowed on word borders so the writer can make clear how the
word has been composed.
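
The two border rules above (a dash is always allowed, and sometimes
required) could be sketched like this. The set of clashing letter pairs is
an incomplete, illustrative assumption, and `accepted_spellings` is a
hypothetical helper, not part of Hunspell or LanguageTool.

```python
# Letter pairs that would read as one sound across a word border.
# Illustrative and incomplete; a real list needs care.
CLASHING_PAIRS = {"oe", "ie", "ei", "eu", "ui", "ou", "au", "aa", "ee", "oo", "uu"}


def accepted_spellings(left, right):
    """Return the spellings accepted for the compound left+right."""
    hyphenated = f"{left}-{right}"
    if left[-1] + right[0] in CLASHING_PAIRS:
        # The dash is required, otherwise the border would be misread.
        return {hyphenated}
    # Otherwise both the closed form and the dashed form are fine.
    return {left + right, hyphenated}


print(accepted_spellings("auto", "eigenaar"))  # {'auto-eigenaar'}
print(sorted(accepted_spellings("mannetjes", "putter")))
# ['mannetjes-putter', 'mannetjesputter']
```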

But to be honest, it is nice that Hunspell offers these options, but they
are not essential. Instead of putting a continuation flag on the affix
-netjes, it is easy to add mannetjes to the word list and mark that entry
as compoundable.

I wonder whether a compounding mechanism with checking rules that forbid
('is this allowed?'), like Hunspell's general approach, is more or less
difficult than the 'uncompounding' approach of adding rules that 'allow'.

Ruud

> Hi all,
>
> I want to introduce compounding support for MorfologikSpeller for the
> next LT release. I looked at hunspell dictionaries again and it seems
> that it categorizes words into prefixes, infixes, and suffixes, and
> additionally it has flags to designate words that are allowed only in
> compounds, as well as a flag to designate an affix allowed anywhere (I
> have no idea what it means in practice, COMPOUNDPERMITFLAG).  There is a
> flag to prohibit certain compounds as incorrect.
>
> There are also problems with the lower and upper case (KEEPCASE flag as
> well as CHECKCOMPOUNDCASE).
>
> I don't exactly understand the parameter COMPOUNDMIN.
>
> Here's the old idea for us:
>
> http://wiki.languagetool.org/compounding-support-in-morfologikspeller
>
> Basically, I think it will be quite easy to parse the hunspell
> dictionary to get all the words with compounding flags in all their
> forms, so we would be able to convert hunspell dictionaries to FSA
> dictionaries with structured tags. But to do so, I need to understand
> the semantics of the flags in hunspell dictionaries, and hunspell
> documentation is scarce at best. Could anyone please explain it better
> to me? Ruud?
>
> Alternatively, we could leave the speller dictionaries as is and add the
> support for compounding directly in LanguageTool by using JWordSplitter
> to split words but I'm afraid this won't work so nicely as we don't have
> such a library for Dutch, for example.
>
> Regards,
> Marcin
>
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
> Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>


