Xavi Ivars <xavi.iv...@gmail.com> wrote:
> * In the trimming disadvantages number 1, we're stating that we're OK
> having crappy monodixes because we *fix* that later on with trimming. I'm
> sure that's where we are now, but as a project that focuses a lot on
> provided free (as in speech) language resources that are later used for
> many other use cases, I don't feel comfortable with that status. I think we
> should aim to have as correct as possible dictionaries. And if we did that,
> disadvantage number 1 would be smaller (even if not disappearing
> completely).

This point seems like a distraction. No one puts errors in a monodix on purpose. We do fix errors in monodixes (when we find them, and have time), and when we use a monodix for tasks other than MT, we find and fix even more. On the other hand, there's no point in manually going through every monodix and bloody well searching for errors just because some might show up if you stop trimming; please spend your time on something more useful.

But there may also be some confusion as to what counts as an error. There may be things in monodixes that don't belong in "regular" dictionaries but do belong in a monodix, because the goal is building MT systems, not dictionaries. And if your monodix is to be used for things other than MT, you're just going to get many more such "weird" entries that all the other use cases need to filter out.

E.g. Giellatekno's Northern Saami analyser (used for MT, spelling, grammar checking etc.) contains several non-normative analyses, "multiwords" and unusual taggings just for the grammar checker. These are not included in the FSTs built for other use cases; they are trimmed out, mostly using tags (but also bidix, in the case of MT).
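
To make the two kinds of trimming mentioned above concrete, here is a minimal sketch in plain Python. It is not how Giellatekno or Apertium actually do it (the real pipelines operate on compiled FSTs, not lists), and everything in it is an assumption for illustration: the `Entry` type, the helper names, the tag names `non_normative` and `gc_only`, and the placeholder words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One toy analyser entry: surface form, lemma, and a set of tags."""
    surface: str
    lemma: str
    tags: frozenset


def trim_by_tags(entries, excluded):
    """Keep only entries that carry none of the excluded tags."""
    excluded = frozenset(excluded)
    return [e for e in entries if not (e.tags & excluded)]


def trim_by_bidix(entries, bidix_lemmas):
    """Keep only entries whose lemma has an entry in the bilingual dictionary."""
    return [e for e in entries if e.lemma in bidix_lemmas]


# Placeholder data: the words and tag names are invented, not real lexicon content.
LEXICON = [
    Entry("alpha", "alpha", frozenset({"n", "sg"})),
    Entry("alfa", "alpha", frozenset({"n", "sg", "non_normative"})),
    Entry("beta", "beta", frozenset({"n", "sg"})),
    Entry("gamma delta", "gamma delta", frozenset({"adv", "gc_only"})),
]

# A speller wants only normative forms, and nothing grammar-checker-specific.
speller = trim_by_tags(LEXICON, {"non_normative", "gc_only"})

# MT may want to analyse non-normative input, but only lemmas the bidix covers.
mt = trim_by_bidix(trim_by_tags(LEXICON, {"gc_only"}), {"alpha"})

print([e.surface for e in speller])
print([e.surface for e in mt])
```

The point of the sketch is only that one source lexicon can carry entries that look "wrong" for a given use case, and each build filters out what it does not need.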
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff