Thank you, Marcin and Janis, for your comments.

I have been rebuilding the wordlist for Luxembourgish from scratch for
the last few months. It will be released in a few weeks. Most of the
words are arranged in separate lists like "adjectives", "nouns", etc.
and the affix rules have been created accordingly.

Thus, adding PoS information *should* be straightforward and it would
save the effort of managing PoS data in another location. I haven't
tried it yet though, because I'm still trying to figure out the best
way to do this.

I could try to write a prototype that converts the morphological data
generated by hunspell into a source file for the morph_fsa tool (which I
still have to look into). Maybe something like this:
- "unmunch" my wordlist to get a list of all "possible" words
- generate morphological data using the "analyze" tool
- apply a little awk/sed magic ;-)
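
The last step could just as well be a small script instead of awk/sed.
Below is a rough sketch in Python, not a tested converter. It assumes
that each analysis line looks like "analyze(word) = st:stem po:noun is:plural"
(the exact layout of the tool's output needs to be checked), and that
the target is a tab-separated "form, lemma, tag" file (also an assumption,
since I have not looked at the morph_fsa input format yet). The function
names and the "+"-joined tag are made up for illustration; only the field
prefixes st:, po: and is: come from the hunspell morphological notation.

```python
import re

# ASSUMPTION: one analysis per line in this layout; adjust the pattern
# to whatever the analysis tool really prints.
ANALYSIS_LINE = re.compile(
    r"^analyze\((?P<word>[^)]*)\)\s*=\s*(?P<fields>.*)$"
)


def parse_fields(fields):
    """Split 'st:foo po:noun is:plural' into (key, value) pairs."""
    pairs = []
    for token in fields.split():
        key, sep, value = token.partition(":")
        if sep and value:
            pairs.append((key, value))
    return pairs


def to_rows(lines):
    """Turn analysis lines into (form, lemma, tag) tuples.

    Lines that do not match the assumed layout, or that carry no
    po:/is: field, are skipped.
    """
    rows = []
    for line in lines:
        match = ANALYSIS_LINE.match(line.strip())
        if not match:
            continue
        word = match.group("word")
        pairs = parse_fields(match.group("fields"))
        # Fall back to the surface form when no st: field is present.
        stem = next((v for k, v in pairs if k == "st"), word)
        # Hypothetical tag scheme: PoS and inflection joined with "+".
        tags = [v for k, v in pairs if k in ("po", "is")]
        if tags:
            rows.append((word, stem, "+".join(tags)))
    return rows


def format_rows(rows):
    """Render rows as tab-separated lines (assumed target format)."""
    return "\n".join("\t".join(row) for row in rows)
```

Feeding it the saved output of the analysis step, e.g.
format_rows(to_rows(open("analysis.txt", encoding="utf-8"))), would then
give one line per word form ("analysis.txt" is just a placeholder name).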

Regards,
Michel
http://michel.weimerskirch.net/

On 5/25/08, Jancs <[EMAIL PROTECTED]> wrote:
> Quoting Marcin Miłkowski <[EMAIL PROTECTED]>:
>
>
> > I'm still planning to start a major rewrite of affix flag / tagging
> > rules as the Polish hunspell source has been significantly cleared up
> > (it contained many duplicates in terms of flags creating the same PoS
> > tag and the same affix) - the current dictionary is imperfect,
> > especially for accusative case.
> >
>
>  The same problem exists for Latvian, the hardest cases being changes
> of a word's part of speech (I hope that's the right term to describe a
> noun becoming a verb and vice versa in some cases) - it seems impossible to
> eradicate such duplicates.
>
>  Janis
>
