On 07.05.2013 23:33, Marcin Miłkowski wrote:
> Well, for me it seems to be the same issue still, as I haven't been
> given any reason to believe that hunspell expansion would not give me
> a compounding mechanism for our speller (beyond the size of the word
> list).
I see no reason other than the size of the list. As every noun can
basically be combined with every other noun, you'll have 30,000^2
combinations if there are 30,000 nouns. And as there are not only
compounds made up of two words, you'd have another 30,000^3 words if you
consider all three-part compounds.
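To put numbers on that, here is a quick back-of-the-envelope calculation. It only assumes the figure of 30,000 nouns used above and that every combination is allowed, which overstates things a bit but shows the order of magnitude:

```python
# Rough size of a fully expanded compound list,
# assuming 30,000 nouns that combine freely (an upper bound).
nouns = 30_000

two_part = nouns ** 2    # noun + noun
three_part = nouns ** 3  # noun + noun + noun

print(f"two-part compounds:   {two_part:,}")
print(f"three-part compounds: {three_part:,}")
```

That is 900 million two-part and 27 trillion three-part entries before any filtering.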
But the way hunspell works can probably be mapped to an FSA. The
hunspell compound tags of the words say:
* this is only a compound beginning, not a stand-alone word ("Arbeits"
in German)
* this is only a compound part, never at the beginning (basically
any noun, but spelled lowercase, plus a lot of other words)
* this is a noun that can be used stand-alone and also as a
compound beginning (most nouns in German)
Actually the tags' meaning might be slightly different (I didn't look
them up just now), but I think all of this can be expressed by
interpreting an FSA that's built accordingly, without the need to
generate a word list. A blacklist of "invalid" words is needed anyway.
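Just to illustrate the idea, a minimal sketch of such an acceptor could look like the following. Everything here is an assumption: the tag names, the tiny lexicon and the blacklist entry are made up, the tags are not the real hunspell flags, and it walks a plain dictionary instead of a compiled FSA. It only follows the three rules listed above:

```python
from functools import lru_cache

# Hypothetical tag names, not the real hunspell flags.
BEGIN_ONLY = "begin_only"    # compound beginning only, e.g. "Arbeits"
NON_INITIAL = "non_initial"  # compound part, but never the first one
STANDALONE = "standalone"    # stand-alone noun that may also start a compound

# Made-up sample lexicon and blacklist, for illustration only.
LEXICON = {
    "Arbeits": {BEGIN_ONLY},
    "Haus": {STANDALONE},
    "Tür": {STANDALONE},
    "haus": {NON_INITIAL},
    "tür": {NON_INITIAL},
    "platz": {NON_INITIAL},
}
BLACKLIST = {"Haushaus"}


def accepts(word):
    """Accept stand-alone words and compounds without expanding a word list."""
    if word in BLACKLIST:
        return False
    if STANDALONE in LEXICON.get(word, ()):
        return True

    @lru_cache(maxsize=None)
    def tail(pos):
        # True if word[pos:] splits into one or more non-initial parts.
        for end in range(pos + 1, len(word) + 1):
            if NON_INITIAL in LEXICON.get(word[pos:end], ()):
                if end == len(word) or tail(end):
                    return True
        return False

    # The first part must be allowed at the beginning of a compound.
    for end in range(1, len(word)):
        tags = LEXICON.get(word[:end], ())
        if (BEGIN_ONLY in tags or STANDALONE in tags) and tail(end):
            return True
    return False


for candidate in ["Haus", "Arbeits", "Arbeitsplatz", "Haustürplatz", "Haushaus"]:
    print(candidate, accepts(candidate))
```

With this toy data "Arbeitsplatz" and "Haustürplatz" are accepted, while "Arbeits" alone and the blacklisted "Haushaus" are rejected. A real implementation would encode the same transitions in the automaton itself.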
I don't have time to dig into this now, but I could write test cases
etc. So let me know if I can help with that.
Regards
Daniel
--
http://www.danielnaber.de
_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel