Hello Hèctor,

Which script are you using for testvoc? It looks like you are not trimming the Catalan monodix, so the script tests every possible analysis regardless of whether it is in the pair.

A couple of months ago I began working on a new testvoc script for apertium-eng-cat and apertium-ron-cat, based on an old script. My idea was to develop something portable to any pair without hardcoded values, so it stores pair-specific settings in a configuration file. It needs better error handling before it can be properly "released", but it mostly works and I am sure you will find it useful. You can find it in the dev/testvoc folder in both pairs.

The script checks for generation errors (including every possible translation of polysemous entries, using lexical selection) and for double generation (errors in the target monodix). By default, with no options, it runs a full testvoc and generates a summary. There are three options:

-e  ignore <prn><enc>; runs faster with Romance languages
-q  "quiet"; does not generate summaries
-u  "unknowns"; checks for bidix entries missing from the monodixes (uses an external script)

It will probably be more than enough for your needs and should solve both issues.

Regards,
Marc

On Sat, 29 Jun 2019 at 10:00 +0300, Hèctor Alòs i Font wrote:
> I'm having problems with testvoc, of two kinds. The main one is that
> testvoc generates all forms of the lemmas present in the monodix, not
> only those that exist in the bilingual dictionary. This is
> catastrophic when testing from Catalan, which has tens of thousands
> of lemmas that can't be added to the bidix (and often this is not
> really needed).
> For instance, for "taula" in apertium-cat-ita:
>
> ^taula# braser<n><f><pl>/@taula# braser<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula# braser<n><f><sg>/@taula# braser<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula# de la Llei<n><f><pl>/@taula# de la Llei<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula# de la Llei<n><f><sg>/@taula# de la Llei<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula# de multiplicar<n><f><pl>/@taula# de multiplicar<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula# de multiplicar<n><f><sg>/@taula# de multiplicar<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula# de salvació<n><f><pl>/@taula# de salvació<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula# de salvació<n><f><sg>/@taula# de salvació<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula# d'harmonia<n><f><pl>/@taula# d'harmonia<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula# d'harmonia<n><f><sg>/@taula# d'harmonia<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula<n><f><pl>/tavolo<n><m><pl>/tavola<n><f><pl>/tabella<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula<n><f><sg>/tavolo<n><m><sg>/tavola<n><f><sg>/tabella<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula numèrica<n><f><sg>/@taula numèrica<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula numèrica<n><f><pl>/@taula numèrica<n><f><pl>$ ^.<sent>/.<sent>$
> ^taula periòdica<n><f><sg>/@taula periòdica<n><f><sg>$ ^.<sent>/.<sent>$
> ^taula periòdica<n><f><pl>/@taula periòdica<n><f><pl>$ ^.<sent>/.<sent>$
>
> The second problem is that the script does not include a call to
> lexical selection, so the translations tested are not always the
> "real" ones, but sometimes ones forbidden by lexical selection.
>
> I'm solving the second issue (this seems to be trivial), but I'm not
> sure how to deal with the first one. Are there any suggestions?
>
> Best,
> Hèctor
>
> _______________________________________________
> Apertium-stuff mailing list
> apertium-st...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
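As a rough illustration of the first problem in the quoted output: in bidix output, a target that starts with "@" marks a source lemma that was not found in the bilingual dictionary. The Python sketch below separates those entries from the ones that do have translations. It is not part of the testvoc script mentioned in this thread; the function names `classify` and `missing_from_bidix` are hypothetical, and the parsing assumes that no lexical unit contains an escaped "/" or "$".

```python
import re

# One lexical unit in Apertium stream format: ^source/target1/target2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def classify(line):
    """Return a list of (source, targets, in_bidix) for each unit in a line.

    Assumption: units contain no escaped "/" or "$" characters.
    """
    results = []
    for unit in LEXICAL_UNIT.findall(line):
        source, *targets = unit.split("/")
        # A target starting with "@" means the bidix had no entry for it.
        in_bidix = bool(targets) and not any(t.startswith("@") for t in targets)
        results.append((source, targets, in_bidix))
    return results


def missing_from_bidix(lines):
    """Collect the source side of every unit the bidix could not translate."""
    missing = []
    for line in lines:
        for source, _targets, in_bidix in classify(line):
            if not in_bidix:
                missing.append(source)
    return missing


# Two lines taken from the quoted testvoc output.
sample = [
    "^taula# braser<n><f><pl>/@taula# braser<n><f><pl>$ ^.<sent>/.<sent>$",
    "^taula<n><f><pl>/tavolo<n><m><pl>/tavola<n><f><pl>/tabella<n><f><pl>$ ^.<sent>/.<sent>$",
]

if __name__ == "__main__":
    for entry in missing_from_bidix(sample):
        print(entry)
```

Filtering like this only hides the noise after the fact. Trimming the monodix against the bidix, as suggested in the reply, avoids generating those forms in the first place.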
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff