Jancs writes:
Quoting Marcin Miłkowski <[EMAIL PROTECTED]>:

I'm still planning to start a major rewrite of affix flag / tagging
rules as the Polish hunspell source has been significantly cleaned up
(it contained many duplicates in terms of flags creating the same PoS
tag and the same affix) - the current dictionary is imperfect,
especially for the accusative case.

The same problem exists for Latvian, the hardest cases being changes in a word's part of speech (I hope that's the right term for a noun becoming a verb and vice versa in some cases) - it seems impossible to eradicate such duplicates.

Duplicates are OK as long as the tagger is meant to show ambiguity in individual words (which can then be disambiguated in some contexts). Anyway, the problem with Polish is that affix flags are currently used inconsistently with respect to part-of-speech tags - but they become unambiguous if you map the whole set of affix flags on the entry, the affix flag being applied, and the word ending to the PoS tag. Hunspell maps only _one_ affix flag and the word ending to the PoS tag, so the same word would come out feminine in the accusative and neuter in the genitive, etc., which is pretty useless. The whole set of affix flags + the current affix flag + all word transformations (deletions, added endings etc.) is usually enough. If hunspell could support such a mapping, I could dump my scripts.
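
To make the difference concrete, here is a minimal Python sketch of the two lookups. It is not taken from the actual scripts or from hunspell itself: the flags (A, B, C), the endings, the tag strings and the function names are all made up for illustration, and real flags and tags will look different.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# A rule is (affix flag that fired, characters stripped, ending added).
# All flags, endings and tags below are hypothetical illustration data.
Rule = Tuple[str, str, str]

# One-flag mapping: each affix rule is tagged in isolation, so a word
# carrying flags A and B ends up feminine in one case and neuter in another.
SINGLE_FLAG_TAGS: Dict[Rule, str] = {
    ("A", "a", "ę"): "noun:acc:fem",
    ("B", "a", "y"): "noun:gen:neut",
}

# Full-context mapping: the entry's whole flag set is part of the key,
# so the same rule can yield different tags for different entries.
FULL_CONTEXT_TAGS: Dict[Tuple[FrozenSet[str], Rule], str] = {
    (frozenset({"A", "B"}), ("A", "a", "ę")): "noun:acc:fem",
    (frozenset({"A", "B"}), ("B", "a", "y")): "noun:gen:fem",
    (frozenset({"B", "C"}), ("B", "a", "y")): "noun:gen:neut",
}


def tag_single(rule: Rule) -> Optional[str]:
    """Look up a tag from the fired flag and the transformation only."""
    return SINGLE_FLAG_TAGS.get(rule)


def tag_full(entry_flags: Iterable[str], rule: Rule) -> Optional[str]:
    """Look up a tag from the entry's full flag set plus the fired rule."""
    return FULL_CONTEXT_TAGS.get((frozenset(entry_flags), rule))


if __name__ == "__main__":
    genitive_rule = ("B", "a", "y")
    print(tag_single(genitive_rule))      # same answer for every entry
    print(tag_full("AB", genitive_rule))  # depends on the entry's flag set
    print(tag_full("BC", genitive_rule))
```

With the one-flag table, rule B always produces the same tag, whatever else is on the entry. Keying on the whole flag set lets an entry flagged AB and one flagged BC get different tags from the same rule, which is the kind of mapping described above.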

Regards,
Marcin

