Jancs wrote:
> Quoting Marcin Miłkowski <[EMAIL PROTECTED]>:
> > I'm still planning to start a major rewrite of the affix flag / tagging
> > rules, as the Polish hunspell source has been significantly cleaned up
> > (it contained many duplicates, i.e. different flags creating the same
> > PoS tag and the same affix); the current dictionary is imperfect,
> > especially for the accusative case.
> Latvian has the same problem, the worst cases being changes in a
> word's part of speech (I hope that's the right term to describe a noun
> becoming a verb and vice versa in some cases); it seems impossible to
> eradicate such duplicates.
Duplicates are OK insofar as the tagger should display ambiguity in
individual words (which can then be disambiguated in some contexts).
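The duplicate problem described above (several flags producing the same
PoS tag and the same affix) can be checked mechanically. Below is a
minimal Python sketch, not hunspell code: it assumes the affix rules have
already been extracted into simple records (flag, strip, add, condition,
tag), with conditions simplified to literal stem endings rather than
hunspell's pattern syntax. The flags and tags are illustrative only. Each
reported group still needs a human look, since some overlaps may reflect
genuine ambiguity that the tagger should keep.

```python
from collections import defaultdict
from typing import NamedTuple


class AffixRule(NamedTuple):
    flag: str       # affix flag the rule belongs to
    strip: str      # characters removed from the end of the stem ("" = none)
    add: str        # ending appended after stripping
    condition: str  # required stem ending (simplified to a literal suffix)
    tag: str        # PoS tag assumed to be attached to the generated form


def find_duplicates(rules):
    """Return {(strip, add, condition, tag): flags} for every transformation
    that is produced, with the same tag, by more than one flag."""
    groups = defaultdict(set)
    for rule in rules:
        groups[(rule.strip, rule.add, rule.condition, rule.tag)].add(rule.flag)
    return {key: flags for key, flags in groups.items() if len(flags) > 1}


# Hypothetical rules: flags "a" and "b" duplicate each other.
rules = [
    AffixRule("a", "a", "ę", "a", "subst:sg:acc:f"),
    AffixRule("b", "a", "ę", "a", "subst:sg:acc:f"),
    AffixRule("c", "a", "y", "a", "subst:sg:gen:f"),
]

for (strip, add, cond, tag), flags in find_duplicates(rules).items():
    print(f"-{strip}+{add} /{cond}/ -> {tag}: flags {sorted(flags)}")
```

Grouping on the whole (strip, add, condition, tag) tuple means two flags
are only reported when they are interchangeable for that form.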
Anyway, the problem with Polish is that affix flags are currently used
inconsistently with respect to part-of-speech tags, but they become
unambiguous if you map the combination of all of the word's affix flags,
a given affix flag and the word ending to the PoS tag. Hunspell maps only
_one_ affix flag and the word ending to the PoS tag, so the word would
become feminine in the accusative and neuter in the genitive, etc., which
is pretty useless. The whole set of affix flags + the current affix flag
+ all word transformations (deletions, added endings, etc.) are usually
enough. If hunspell could support such a mapping, I could drop my scripts.
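To illustrate the difference between the two mappings, here is a minimal
Python sketch. It does not use hunspell (which, as described above, maps
only a single flag plus the ending); it simply builds two lookup tables
from hypothetical observations of (all flags of the word, current flag,
stripped characters, added ending, tag) and reports which keys still map
to more than one tag. The flag names, tags and data layout are assumptions
made up for the example.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# (word flags, current flag, stripped chars, added ending, tag): hypothetical layout
Observation = Tuple[FrozenSet[str], str, str, str, str]

SimpleKey = Tuple[str, str]                          # (current flag, added ending)
CompositeKey = Tuple[FrozenSet[str], str, str, str]  # (all flags, current flag, strip, add)


def build_tables(observations: Iterable[Observation]):
    """Collect the set of tags seen for each simple key and each composite key."""
    simple: Dict[SimpleKey, Set[str]] = {}
    composite: Dict[CompositeKey, Set[str]] = {}
    for word_flags, flag, strip, add, tag in observations:
        simple.setdefault((flag, add), set()).add(tag)
        composite.setdefault((word_flags, flag, strip, add), set()).add(tag)
    return simple, composite


def ambiguous(table):
    """Keys that still map to more than one tag."""
    return {key: tags for key, tags in table.items() if len(tags) > 1}


# Two hypothetical words share flag "x" with the same ending, but their other
# flags ("f" vs "n") show that "x" is being used for different forms.
observations = [
    (frozenset({"x", "f"}), "x", "a", "ę", "subst:sg:acc:f"),
    (frozenset({"x", "n"}), "x", "a", "ę", "subst:sg:gen:n"),
]

simple, composite = build_tables(observations)
print("single-flag key:", ambiguous(simple))
print("composite key:  ", ambiguous(composite))
```

With this data the single-flag key ("x", "ę") yields both the feminine
accusative and the neuter genitive tag, while each composite key resolves
to exactly one tag. Using a frozenset for the word's flags makes the key
independent of the order in which the flags are listed.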
Regards,
Marcin