Michel Weimerskirch writes:
> Thank you, Marcin and Janis, for your comments.
>
> I have been rebuilding the wordlist for Luxembourgish from scratch for
> the last few months. It will be released in a few weeks. Most of the
> words are arranged in separate lists like "adjectives", "nouns", etc.
> and the affix rules have been created accordingly.
>
> Thus, adding PoS information *should* be straightforward and it would
> save the effort of managing PoS data in another location. I haven't
> tried it yet though, because I'm still trying to figure out the best
> way to do this.
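For reference, Hunspell's own file formats (described in the hunspell(5) manual page) can already carry this kind of data. A .dic entry may be followed by morphological fields such as `po:` (part of speech) and `st:` (stem). An affix rule line may end with fields such as `is:` (inflectional suffix). The fragments below are only illustrative: they use English words, made-up flags and made-up tag values, not anything from the Luxembourgish list.

.dic fragment:

```
2
drink/S po:noun
drink/Q po:verb
```

.aff fragment:

```
SFX S Y 1
SFX S 0 s . is:plural
SFX Q Y 1
SFX Q 0 s . is:sg_3
```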

If you have a good affix file, then give it a try. Note also that Polish is quite an extreme case. It is a Slavonic language, so we have lots of conjugations, declensions and exceptions. Nobody really knows how many, and schoolbooks lie when they say there are 5 or so :D

> I could try to write a prototype that converts the morphological data
> generated by hunspell into a source file for this morph_fsa tool (I
> have to look into this one too). Maybe something like this:
> - "unmunch" my wordlist to get a list of all "possible" words
> - generate morphological data using the "analyze" tool
> - apply a little awk/sed magic ;-)
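The "awk/sed magic" step could look roughly like the Python sketch below. It assumes that each line of the analyzer output holds the word followed by whitespace-separated `key:value` fields, as in `hunspell -m` output, where `st:` is the stem and `po:` the part of speech. The exact output of the `analyze` tool may differ and has not been checked here. The function names, and the choice to join `po:` and `is:` values with ":" into a single tag, are hypothetical.

```python
# Sketch: turn Hunspell-style morphological analyses into
# (base form, inflected form, PoS tag) triples.
# Assumed input line shape: "drinks st:drink po:verb is:sg_3"
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_analysis(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (base, inflected, tag) for one analysis line, or None."""
    parts = line.split()
    if len(parts) < 2:
        return None  # blank line, or a word with no analysis fields
    inflected, fields = parts[0], parts[1:]
    base = inflected  # assumption: without st: the word is its own stem
    pos: List[str] = []
    inflection: List[str] = []
    for field in fields:
        key, sep, value = field.partition(":")
        if not sep or not value:
            continue  # skip tokens that are not key:value fields
        if key == "st":
            base = value
        elif key == "po":
            pos.append(value)
        elif key == "is":
            inflection.append(value)
    if not pos:
        return None  # no part of speech, nothing useful for a tagger
    # Hypothetical tag convention: PoS first, then inflectional info.
    return base, inflected, ":".join(pos + inflection)


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield a triple for every line that carries a po: field."""
    for line in lines:
        triple = parse_analysis(line)
        if triple is not None:
            yield triple
```

To use it, pass the analyzer output line by line to `parse_lines`, for example an open text file. Each resulting triple can then become one tab-separated line.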

This looks like a good idea. All you need for morph_fsa is a text file with three tab-separated fields per line: the base form, the inflected form, and the PoS tag.
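Writing triples out in that three-column layout is straightforward. The sketch below follows the column order given above (base form, inflected form, PoS tag). UTF-8 output, dropping duplicates and sorting the lines are assumptions made so that repeated runs produce identical files. They are not documented morph_fsa requirements. The function names are hypothetical.

```python
# Sketch: write (base form, inflected form, PoS tag) triples as a
# three-column, tab-separated UTF-8 text file.
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def validate(triple: Triple) -> None:
    """Reject triples that would break the tab-separated layout."""
    if len(triple) != 3:
        raise ValueError(f"expected 3 fields, got {len(triple)}: {triple!r}")
    for field in triple:
        if not field:
            raise ValueError(f"empty field in {triple!r}")
        if "\t" in field or "\n" in field or "\r" in field:
            raise ValueError(f"tab or line break inside a field: {triple!r}")


def to_lines(triples: Iterable[Triple]) -> List[str]:
    """Validate, de-duplicate and sort triples; return tab-joined lines."""
    unique = set()
    for triple in triples:
        validate(triple)
        unique.add(tuple(triple))
    return ["\t".join(t) for t in sorted(unique)]


def write_tsv(path: str, triples: Iterable[Triple]) -> int:
    """Write one line per unique triple and return the line count."""
    lines = to_lines(triples)
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for line in lines:
            out.write(line + "\n")
    return len(lines)
```

For example, `write_tsv("lb_morph.txt", [("drink", "drinks", "verb")])` writes a single `drink<TAB>drinks<TAB>verb` line. The file name here is hypothetical.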

If you want to make a grammar checker for Luxembourgish using LanguageTool as your framework, I can help you with first steps for doing this.

Regards,
Marcin
