Hello, I'm working with the monolingual transfer rule learning code and have a few questions:
1. I see some language pairs used to have a multi mode (such as in this old version of eng-cat <https://github.com/apertium/apertium-eng-cat/blob/6a2f3e7082a9d44a478f76f6c60a526e02512e93/modes.xml#L74>). They also used to have "poly" dictionaries (such as this one <https://github.com/apertium/apertium-eng-cat/blob/6a2f3e7082a9d44a478f76f6c60a526e02512e93/apertium-en-ca.en-ca-poly.dix>). These files seem to be required by the latest monolingual rule learning script I've found <http://wiki.apertium.org/wiki/Generating_lexical-selection_rules_from_monolingual_corpora>. Why do language pairs no longer have a multi mode or poly dictionaries?

2. Is there a script that can generate a poly dictionary from a bilingual dictionary?

3. The third step in the monolingual rule learning script I linked above says this should be run:

    cat europarl.en-es.es.tagged | ~/source/apertium-lex-tools/multitrans ~/source/apertium-en-es/en-es.autobil -m -f -t -n > europarl.en-es.es.multi-trimmed

I was trying to do this step with the apertium-en-pt language pair using 10% of the English-Portuguese Europarl corpus. I stopped the program because the output file was getting really big (dozens of gigabytes). Is this expected behavior from ./multitrans with the -m option? If so, how are the English-Spanish Europarl examples run?

Thank you,
Danielle
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff