Xavi Ivars <xavi.iv...@gmail.com> wrote:

> * In the trimming disadvantages number 1, we're stating that we're OK
> having crappy monodixes because we *fix* that later on with trimming. I'm
> sure that's where we are now, but as a project that focuses a lot on
> provided free (as in speech) language resources that are later used for
> many other use cases, I don't feel comfortable with that status. I think we
> should aim to have as correct as possible dictionaries. And if we did that,
> disadvantage number 1 would be smaller (even if not disappearing
> completely).

This point seems like a distraction. No one puts errors in monodix on
purpose. We do fix errors in monodix (when we find them, and have
time). When we use monodix for tasks other than MT, we find and fix even
more. On the other hand, there's no point in manually going through
every monodix and bloody well searching for errors just because some
might show up if you stop trimming – please spend your time on
something more useful.

But there may also be some confusion as to what is an error. There may
be things in monodixes that don't belong in "regular" dictionaries, but
do belong in monodix – because the goal is building MT systems, not
Dictionaries.

And if your monodix is to be used for other things than MT, you're just
gonna get many more such "weird" entries that all other use-cases need
to filter out. E.g. Giellatekno's Northern Saami analyser (used for MT,
spelling, grammar check etc.) contains several non-normative analyses,
"multiwords" and unusual taggings just for the grammar checker. These
are not included in the FSTs built for other use-cases, but are trimmed
out, mostly using tags (but also bidix, in the case of MT).
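To illustrate the tag-based filtering described above, here is a minimal
sketch. It works on plain analysis strings, whereas real trimming happens
when the FSTs are built, so treat it as an illustration of the idea only.
The tag names, the use-case names and the example analyses are all made up
for this sketch; they are not the actual Giellatekno tags.

```python
import re

# Hypothetical tags to drop per use-case (assumed names, not real ones).
EXCLUDED_TAGS = {
    "mt": {"<non_norm>"},
    "speller": {"<non_norm>", "<mwe>"},
    "grammar_checker": set(),  # the grammar checker keeps everything
}


def tags_of(analysis):
    """Return the set of tags in an analysis string like 'word<n><sg>'."""
    return set(re.findall(r"<[^<>]+>", analysis))


def trim_by_tags(analyses, use_case):
    """Keep only the analyses that carry no tag excluded for this use-case."""
    excluded = EXCLUDED_TAGS[use_case]
    return [a for a in analyses if not (tags_of(a) & excluded)]


# Made-up analyses: a plain one, a non-normative one, and a "multiword".
ANALYSES = [
    "word<n><sg>",
    "word<n><sg><non_norm>",
    "multi word<n><mwe>",
]

mt_analyses = trim_by_tags(ANALYSES, "mt")
speller_analyses = trim_by_tags(ANALYSES, "speller")
checker_analyses = trim_by_tags(ANALYSES, "grammar_checker")
```

The point of the design is that one source keeps all the entries, and each
use-case declares what it does not want, instead of anyone deleting the
"weird" entries from the shared source.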

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
