Michel Weimerskirch wrote:
> Thank you Marcin and Janis for your comments.
> I have been rebuilding the wordlist for Luxembourgish from scratch for
> the last few months. It will be released in a few weeks. Most of the
> words are arranged in separate lists like "adjectives", "nouns", etc.
> and the affix rules have been created accordingly.
> Thus, adding PoS information *should* be straightforward and it would
> save the effort of managing PoS data in another location. I haven't
> tried it yet though, because I'm still trying to figure out the best
> way to do this.
If you have a good affix file, then give it a try. Note also that
Polish is quite an extreme case: it is a Slavonic language, so we
have lots of conjugations, declensions and exceptions. Nobody really
knows how many, and schoolbooks lie, saying it's 5 or something like that :D
> I could try to write a prototype that converts the morphological data
> generated by hunspell into a source file for this morph_fsa tool (have
> to look into this one too). Maybe something like this:
> - "unmunch" my wordlist to get a list of all "possible" words
> - generate morphological data using the "analyze" tool
> - apply a little awk/sed magic ;-)
This looks like a good idea. All you need for morph_fsa is a tabbed
file, with three tab-separated fields: a base form, an inflected form,
and a POS tag.
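
As a rough sketch of that last conversion step (in Python rather than
awk/sed), something like the following could work. It assumes you have
already reduced the analyzer output to pairs of an inflected form and a
string of hunspell-style morphological fields, where "st:" carries the
stem and "po:" the part of speech. That intermediate shape, the function
names and the sample Luxembourgish words are only illustrative
assumptions; the real output of the "analyze" tool will likely need its
own parsing first.

```python
def parse_fields(analysis: str) -> dict[str, str]:
    """Split a string such as 'st:Haus po:noun' into {'st': 'Haus', 'po': 'noun'}."""
    fields: dict[str, str] = {}
    for token in analysis.split():
        key, sep, value = token.partition(":")
        # Keep only well-formed key:value tokens; the first occurrence of a key wins.
        if sep and key not in fields:
            fields[key] = value
    return fields


def to_tabbed(records):
    """Turn (inflected_form, analysis) pairs into 'base<TAB>inflected<TAB>POS' lines.

    Records without a stem ('st:') or a part of speech ('po:') are skipped,
    since they cannot fill all three fields.
    """
    lines = []
    for inflected, analysis in records:
        fields = parse_fields(analysis)
        base = fields.get("st")
        pos = fields.get("po")
        if base is None or pos is None:
            continue
        lines.append(f"{base}\t{inflected}\t{pos}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, assumed to come from the earlier steps.
    sample = [
        ("Haiser", "st:Haus po:noun is:plural"),
        ("Haus", "st:Haus po:noun"),
        ("xyz", "st:xyz"),  # no POS tag, so it is skipped
    ]
    print("\n".join(to_tabbed(sample)))
```

The field order here (base form, inflected form, POS tag) simply follows
the description above, so check it against what your version of the tool
expects before building a dictionary from it.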
If you want to make a grammar checker for Luxembourgish using
LanguageTool as your framework, I can help you with first steps for
doing this.
Regards,
Marcin