Thank you, Marcin and Janis, for your comments. I have spent the last few months rebuilding the Luxembourgish wordlist from scratch; it will be released in a few weeks. Most of the words are arranged in separate lists such as "adjectives", "nouns", etc., and the affix rules were created accordingly.
Thus, adding PoS information *should* be straightforward, and it would save the effort of managing PoS data in a separate location. I haven't tried it yet, though, because I'm still working out the best way to do it. I could try to write a prototype that converts the morphological data generated by hunspell into a source file for the morph_fsa tool (which I also still have to look into). Maybe something like this:

- "unmunch" my wordlist to get a list of all "possible" words
- generate morphological data using the "analyze" tool
- apply a little awk/sed magic ;-)

Regards,
Michel
http://michel.weimerskirch.net/

On 5/25/08, Jancs <[EMAIL PROTECTED]> wrote:
> Quoting Marcin Miłkowski <[EMAIL PROTECTED]>:
>
> > I'm still planning to start a major rewrite of affix flag / tagging
> > rules as the Polish hunspell source has been significantly cleaned up
> > (it contained many duplicates in terms of flags creating the same PoS
> > tag and the same affix) - the current dictionary is imperfect,
> > especially for the accusative case.
>
> The same problem exists for Latvian too, the hardest cases being changes
> of the word's part of speech (I hope that's the right name to describe a
> noun becoming a verb and vice versa in some cases) - it seems impossible
> to eradicate such duplicates.
>
> Janis
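A rough sketch of the last step of the pipeline above (the "awk/sed magic"), written in Python rather than awk/sed. It assumes the analyzer output has one word per line followed by hunspell-style morphological fields such as `st:` (stem), `po:` (part of speech) and `is:` (inflectional suffix), and it emits hypothetical `word<TAB>stem<TAB>tag` lines. Neither the exact layout of the `analyze` output nor the input format that morph_fsa expects has been checked here, and the function names are made up for illustration, so both ends would need adjusting to the real tools.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical converter: assumes input lines like "word st:stem po:noun is:pl"
# and produces "word<TAB>stem<TAB>tag". The output layout is an assumption,
# not the documented input format of morph_fsa.


def parse_analysis(line: str) -> Optional[Tuple[str, str, str]]:
    """Parse one analyzer line into (word, stem, tag), or None if unusable."""
    fields = line.split()
    if not fields:
        return None  # blank line
    word = fields[0]
    stem = word  # fall back to the surface form if no st: field is present
    tags: List[str] = []
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep:
            continue  # ignore tokens that are not key:value pairs
        if key == "st":
            stem = value
        elif key in ("po", "is"):
            tags.append(value)  # keep PoS first, then inflection, in input order
    if not tags:
        return None  # no PoS/inflection data: nothing useful to emit
    # Joining with ":" is an arbitrary, hypothetical tag notation.
    return word, stem, ":".join(tags)


def to_tab_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield TAB-separated lines, skipping exact duplicate analyses."""
    seen = set()
    for line in lines:
        parsed = parse_analysis(line)
        if parsed is None or parsed in seen:
            continue
        seen.add(parsed)
        yield "\t".join(parsed)
```

It could be run over a saved analyzer dump with something like `for row in to_tab_lines(open("analyzed.txt", encoding="utf-8")): print(row)` (the file name is hypothetical). Dropping exact duplicates only removes identical (word, stem, tag) triples; it does not solve the duplicate-flag problem Marcin and Janis describe, which has to be fixed in the affix rules themselves.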
